| 1 | MLLM-CTBench: A Comprehensive Benchmark for Continual Instruction Tuning of Multimodal LLMs with Chain-of-Thought Reasoning Analysis | Proposes MLLM-CTBench, a comprehensive benchmark for continual instruction tuning of multimodal LLMs, and analyzes their chain-of-thought reasoning ability. | large language model, multimodal, chain-of-thought | | |
| 2 | MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models | Proposes MPCC, a new benchmark for evaluating how well multimodal large language models plan under complex constraints. | large language model, multimodal | | |
| 3 | Enabling Few-Shot Alzheimer's Disease Diagnosis on Biomarker Data with Tabular LLMs | TAP-GPT: uses tabular LLMs to enable few-shot Alzheimer's disease diagnosis from biomarker data. | large language model, foundation model, multimodal | | |
| 4 | PhysicsEval: Inference-Time Techniques to Improve the Reasoning Proficiency of Large Language Models on Physics Problems | Proposes PhysicsEval and inference-time techniques to improve the reasoning of large language models on physics problems. | large language model | ✅ | |
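| 5 | Comparison of Large Language Models for Deployment Requirements | Compares large language models by their deployment requirements, giving researchers and companies a reference for model selection. | large language model | | |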
| 6 | DiffLoRA: Differential Low-Rank Adapters for Large Language Models | DiffLoRA: a differential low-rank adapter for large language models, aimed at improving Transformer performance. | large language model | | |
| 7 | Unveiling Super Experts in Mixture-of-Experts Large Language Models | Identifies "super experts" in MoE large language models and shows they play a critical role in model performance. | large language model | ✅ | |
| 8 | Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges | Survey of recent advances and open challenges in tabular data understanding with large language models. | large language model, multimodal | | |
| 9 | Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs | Rule2Text: uses large language models to generate natural-language explanations of logical rules in knowledge graphs. | large language model, chain-of-thought | ✅ | |
| 10 | From Image Captioning to Visual Storytelling | Proposes a framework that builds visual stories on top of image captioning, improving story coherence and training efficiency. | multimodal | | |
| 11 | SWE-Exp: Experience-Driven Software Issue Resolution | SWE-Exp: an experience-driven framework for software issue resolution that raises the success rate of code repair. | large language model | | |
| 12 | SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution | Proposes SWE-Debate, which resolves software issues through multi-agent debate for more effective code repair. | large language model | | |
| 13 | Evaluating LLMs' Multilingual Capabilities for Bengali: Benchmark Creation and Performance Analysis | Builds a Bengali LLM benchmark and uses it to evaluate and analyze the multilingual capabilities of existing models. | large language model | ✅ | |
| 14 | GanitBench: A bi-lingual benchmark for evaluating mathematical reasoning in Vision Language Models | GanitBench: a bilingual benchmark for evaluating the mathematical reasoning of vision-language models. | chain-of-thought | | |
| 15 | Role-Aware Language Models for Secure and Contextualized Access Control in Organizations | Proposes role-aware language models for secure, context-aware access control in enterprise settings. | large language model | | |
| 16 | Causal2Vec: Improving Decoder-only LLMs as Versatile Embedding Models | Causal2Vec: improves decoder-only LLMs as general-purpose embedding models. | large language model | | |
| 17 | Text-to-SQL Task-oriented Dialogue Ontology Construction | Proposes TeQoDO, which uses the SQL capabilities of LLMs to automatically construct task-oriented dialogue ontologies. | large language model | | |
| 18 | Semantic Compression for Word and Sentence Embeddings using Discrete Wavelet Transform | Proposes a semantic compression method for word and sentence embeddings based on the discrete wavelet transform. | large language model | | |
| 19 | Do LLMs produce texts with "human-like" lexical diversity? | Finds that text generated by large language models differs significantly from human writing in lexical diversity. | large language model | | |
| 20 | T-Detect: Tail-Aware Statistical Normalization for Robust Detection of Adversarial Machine-Generated Text | T-Detect: uses tail-aware statistical normalization to make detection of machine-generated text robust to adversarial attacks. | large language model | ✅ | |
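| 21 | MRGSEM-Sum: An Unsupervised Multi-document Summarization Framework based on Multi-Relational Graphs and Structural Entropy Minimization | Proposes the MRGSEM-Sum framework, which uses multi-relational graphs and structural entropy minimization for unsupervised multi-document summarization. | large language model | | |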
| 22 | What's Taboo for You? - An Empirical Evaluation of LLMs Behavior Toward Sensitive Content | Evaluates how LLMs implicitly moderate sensitive content, using GPT-4o-mini as a case study. | large language model | | |
| 23 | Failures Are the Stepping Stones to Success: Enhancing Few-Shot In-Context Learning by Leveraging Negative Samples | Uses negative samples to improve few-shot in-context learning (ICL) performance. | large language model | | |