| 1 | Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs | Proposes the KnowRecall and VisRecall benchmarks to address cross-lingual consistency issues | large language model multimodal | | |
| 2 | Are LLMs reliable? An exploration of the reliability of large language models in clinical note generation | Evaluates the reliability of large language models in clinical note generation | large language model | | |
| 3 | Can Large Language Models Understand Internet Buzzwords Through User-Generated Content | Proposes the CHEER dataset and the RESS method to improve large language models' understanding of internet buzzwords | large language model | ✅ | |
| 4 | Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory | Proposes the PSN-IRT framework to improve the validity of large language model benchmark evaluation | large language model | | |
| 5 | Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models | Proposes DLISC to address information extraction on resource-constrained devices | large language model | | |
| 6 | Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective | Proposes diffusion language models to address the limitations of autoregressive models in text embedding | large language model instruction following | | |
| 7 | Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | Proposes the Spoken-MQA benchmark to evaluate the mathematical reasoning ability of speech models | large language model multimodal | | |
| 8 | MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision | Proposes MAS-ZERO to address the dependence on supervision in multi-agent system design | large language model | | |