| 1 |
MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems |
MedMASLab: a unified orchestration framework for benchmarking multimodal medical multi-agent systems |
multimodal visual grounding |
✅ |
|
| 2 |
Quantifying the Necessity of Chain of Thought through Opaque Serial Depth |
Quantifying the necessity of chain of thought through opaque serial depth |
large language model chain-of-thought |
|
|
| 3 |
World2Mind: Cognition Toolkit for Allocentric Spatial Reasoning in Foundation Models |
World2Mind: a toolkit for cognitive spatial reasoning in embodied agents |
foundation model multimodal |
|
|
| 4 |
GenePlan: Evolving Better Generalized PDDL Plans using Large Language Models |
GenePlan: using large language models to evolve better generalized PDDL planners |
large language model chain-of-thought |
|
|
| 5 |
EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages |
EsoLang-Bench: evaluating the genuine reasoning ability of large language models through esoteric programming languages |
large language model |
|
|
| 6 |
Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People |
Using a large language model to build an interactive virtual reality assistive tool for visually impaired people |
large language model |
|
|
| 7 |
PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs |
Proposes PathMem to address insufficient knowledge integration in pathology MLLMs |
large language model multimodal |
|
|
| 8 |
Think Before You Lie: How Reasoning Improves Honesty |
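Reasoning improves the honesty of large language models: revealing the relationship between representation-space geometry and moral decision-making |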
large language model |
|
|
| 9 |
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness |
Proposes the RAISE framework, revealing how improved logical reasoning drives the emergence of situational awareness in AI systems |
large language model |
|
|
| 10 |
MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations |
MITRA: an AI assistant for knowledge retrieval in physics collaborations, addressing the information-explosion problem |
large language model |
|
|
| 11 |
Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT |
Proposes CVS, a training-free data selection method for vision-language SFT that improves multimodal reasoning. |
multimodal |
|
|
| 12 |
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models |
MUGEN: evaluating and improving the multi-audio understanding of large audio-language models |
chain-of-thought |
|
|
| 13 |
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants |
Proposes MiniAppBench to evaluate the shift from text to interactive HTML responses in LLM-powered assistants |
large language model |
|
|
| 14 |
Curveball Steering: The Right Direction To Steer Isn't Always Linear |
Proposes Curveball Steering, which improves behavioral control of large language models through nonlinear interventions |
large language model |
|
|
| 15 |
Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness |
Proposes the BD-FDG framework for domain adaptation of LLMs to space situational awareness |
large language model |
|
|
| 16 |
Real-Time Trust Verification for Safe Agentic Actions using TrustBench |
TrustBench: a real-time trust verification framework for safe agent actions |
large language model |
|
|
| 17 |
Deep Tabular Research via Continual Experience-Driven Execution |
Proposes a deep tabular research framework based on continual experience-driven execution to tackle complex tabular reasoning |
large language model |
|
|