| 1 |
On Path to Multimodal Historical Reasoning: HistBench and HistAgent |
Proposes the HistBench historical reasoning benchmark and HistAgent to improve AI's multimodal understanding in the history domain. |
generalist agent large language model multimodal |
|
|
| 2 |
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows |
ScienceBoard: builds a scientific-workflow evaluation benchmark for multimodal autonomous agents |
large language model multimodal |
✅ |
|
| 3 |
Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting |
Project Riley: proposes a multimodal multi-agent LLM collaboration framework based on emotional reasoning and voting |
large language model multimodal |
|
|
| 4 |
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution |
Proposes Alita to address the limited adaptability of existing agents |
generalist agent large language model |
✅ |
|
| 5 |
Large Language Models as Autonomous Spacecraft Operators in Kerbal Space Program |
Uses large language models as autonomous spacecraft operators in Kerbal Space Program |
large language model |
✅ |
|
| 6 |
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data |
BALSa: uses synthetic data to bootstrap audio-language alignment, improving ALLM performance and mitigating hallucination |
large language model instruction following |
|
|
| 7 |
Ten Principles of AI Agent Economics |
Proposes ten principles of AI agent economics, aiming to integrate AI agents responsibly into human socioeconomic systems. |
multimodal |
|
|
| 8 |
Capability-Based Scaling Laws for LLM Red-Teaming |
Proposes capability-based scaling laws for LLM red-teaming attack and defense that predict attack success rate |
large language model |
|
|
| 9 |
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs |
StructEval: a benchmark that comprehensively evaluates LLMs' ability to generate structured outputs |
large language model |
|
|
| 10 |
DCG-SQL: Enhancing In-Context Learning for Text-to-SQL with Deep Contextual Schema Link Graph |
Proposes DCG-SQL, which enhances in-context learning for Text-to-SQL through a deep contextual schema link graph |
large language model |
✅ |
|
| 11 |
HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation |
Proposes HS-STaR, which uses hierarchical sampling to improve the learning efficiency of self-taught reasoners on math problems |
large language model |
|
|