| # | Title | Summary | Keywords | ✅ |
|---|-------|---------|----------|---|
| 1 | Multimodal Fact-Level Attribution for Verifiable Reasoning | Proposes the MuRGAt benchmark for evaluating fact-level attribution in multimodal LLMs during complex reasoning. | large language model, multimodal |  |
| 2 | Visual Reasoning Benchmark: Evaluating Multimodal LLMs on Classroom-Authentic Visual Problems from Primary Education | Proposes VRB, a visual reasoning benchmark evaluating multimodal LLMs on visual problem solving in primary-education settings. | large language model, multimodal |  |
| 3 | Disentangling Ambiguity from Instability in Large Language Models: A Clinical Text-to-SQL Case Study | Proposes the CLUES framework for disentangling ambiguity from instability in clinical Text-to-SQL, improving error prediction. | large language model |  |
| 4 | Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models | MiLMMT-46: improves multilingual machine translation by scaling model and data with open large language models. | large language model |  |
| 5 | Do Large Language Models Adapt to Language Variation across Socioeconomic Status? | Finds that large language models adapt poorly to language variation across socioeconomic groups. | large language model |  |
| 6 | Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm | Proposes the SPES framework, enabling distributed pretraining of MoE LLMs on low-memory GPUs. | large language model | ✅ |
| 7 | DeepSight: An All-in-One LM Safety Toolkit | DeepSight: an all-in-one LM safety toolkit integrating evaluation and diagnosis. | large language model, multimodal |  |
| 8 | Thinking with Drafting: Optical Decompression via Logical Reconstruction | Proposes an optical decompression method to address the accuracy paradox in complex reasoning tasks. | large language model, multimodal |  |
| 9 | DMAP: A Distribution Map for Text | Proposes DMAP, a distribution-map-based text analysis method addressing the insufficient handling of context in existing approaches. | large language model |  |
| 10 | dVoting: Fast Voting for dLLMs | dVoting: a fast voting technique that accelerates diffusion LLM inference, improving performance without any training. | large language model | ✅ |
| 11 | AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection | AdaptEvolve: improves the efficiency of evolutionary AI agents through adaptive model selection. | large language model | ✅ |
| 12 | Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems | Proposes the RouterXBench evaluation framework and the ProbeDirichlet routing method for fairer, more comprehensive evaluation of routers in collaborative LLM systems. | large language model |  |
| 13 | PatientHub: A Unified Framework for Patient Simulation | PatientHub: a unified framework for patient simulation, promoting methodological standardization and reproducibility. | large language model | ✅ |
| 14 | Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation | Proposes detecting overflow in compressed token representations to improve retrieval-augmented generation. | large language model |  |
| 15 | Query-focused and Memory-aware Reranker for Long Context Processing | Proposes a query-focused, memory-aware reranking framework for long-context processing. | large language model |  |
| 16 | DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling | DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling. | TAMP |  |
| 17 | Benchmark Illusion: Disagreement among LLMs and Its Scientific Consequences | Exposes the benchmark illusion in LLM evaluation: significant disagreement among models undermines scientific reproducibility. | large language model |  |
| 18 | More Haste, Less Speed: Weaker Single-Layer Watermark Improves Distortion-Free Watermark Ensembles | Proposes ensembling weaker single-layer watermarks, addressing the problem that strong watermarks reduce token-distribution entropy and weaken multi-layer watermarking. | large language model |  |
| 19 | Scene-Aware Memory Discrimination: Deciding Which Personal Knowledge Stays | Proposes SAMD, a scene-aware memory discrimination method addressing information filtering and computational cost in LLM personal knowledge management. | large language model |  |