| 1 |
MEDSYN: Benchmarking Multi-EviDence SYNthesis in Complex Clinical Cases for Multimodal Large Language Models |
MEDSYN: a benchmark for multi-evidence synthesis by multimodal large language models in complex clinical cases |
large language model multimodal |
|
|
| 2 |
FewMMBench: A Benchmark for Multimodal Few-Shot Learning |
Proposes FewMMBench to evaluate the few-shot learning ability of multimodal large language models |
large language model multimodal chain-of-thought |
✅ |
|
| 3 |
Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models |
Evaluates theory-of-mind ability in large language models through perturbed tasks and chain-of-thought reasoning |
large language model chain-of-thought |
|
|
| 4 |
Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion |
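Proposes a scalable multilingual multimodal machine translation framework based on speech-text fusion that significantly improves translation quality |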
large language model multimodal |
✅ |
|
| 5 |
Dynamic Personality Adaptation in Large Language Models via State Machines |
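Proposes a state-machine-based framework for dynamic personality adaptation in large language models to improve human-computer interaction |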
large language model |
|
|
| 6 |
Sparsity Induction for Accurate Post-Training Pruning of Large Language Models |
Proposes a sparsity induction method that improves the accuracy of post-training pruning for large language models |
large language model |
|
|
| 7 |
IndicIFEval: A Benchmark for Verifiable Instruction-Following Evaluation in 14 Indic Languages |
IndicIFEval: a benchmark for verifiable instruction-following evaluation in 14 Indic languages |
instruction following |
✅ |
|
| 8 |
Large Language Models are Algorithmically Blind |
Reveals a limitation of large language models in algorithmic reasoning: algorithmic blindness |
large language model |
|
|
| 9 |
Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text |
For sarcasm detection in code-mixed Hinglish text, a domain fine-tuned DistilBERT outperforms large language models |
large language model |
|
|
| 10 |
Personalized Graph-Empowered Large Language Model for Proactive Information Access |
Proposes a personalized graph-empowered large language model for proactive information access to aid memory |
large language model |
|
|
| 11 |
Evaluating the Usage of African-American Vernacular English in Large Language Models |
Evaluates the usage of African-American Vernacular English in large language models, revealing shortcomings and stereotypes |
large language model |
|
|
| 12 |
When More Is Less: A Systematic Analysis of Spatial and Commonsense Information for Visual Spatial Reasoning |
Analyzes the effectiveness of strategies for injecting spatial and commonsense information in visual spatial reasoning |
multimodal chain-of-thought |
|
|
| 13 |
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets |
Proposes the T-RANK framework to improve the translation quality of multilingual LLM evaluation benchmarks and address semantic drift |
large language model |
|
|
| 14 |
Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference |
Proposes a confidence-driven multi-scale model selection strategy for cost-efficient LLM inference |
large language model |
|
|
| 15 |
CxMP: A Linguistic Minimal-Pair Benchmark for Evaluating Constructional Understanding in Language Models |
Proposes the CxMP benchmark to evaluate the constructional understanding of language models |
large language model |
|
|
| 16 |
Improving Implicit Discourse Relation Recognition with Natural Language Explanations from LLMs |
Uses explanations from large language models to improve implicit discourse relation recognition |
large language model |
|
|