| 1 |
On Path to Multimodal Historical Reasoning: HistBench and HistAgent |
Proposes HistBench and HistAgent to address multimodal challenges in historical reasoning |
generalist agent, large language model, multimodal |
|
|
| 2 |
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows |
Proposes ScienceBoard to evaluate multimodal autonomous agents in scientific workflows |
large language model, multimodal |
✅ |
|
| 3 |
Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting |
Proposes Project Riley to address multimodal dialogue with emotional reasoning |
large language model, multimodal |
|
|
| 4 |
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution |
Proposes Alita to address the limited adaptability of existing agents |
generalist agent, large language model |
✅ |
|
| 5 |
Large Language Models as Autonomous Spacecraft Operators in Kerbal Space Program |
Proposes an LLM-based autonomous spacecraft operation scheme to address autonomous satellite decision-making |
large language model |
✅ |
|
| 6 |
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data |
Proposes the BALSa framework to address audio-language alignment |
large language model, instruction following |
|
|
| 7 |
Ten Principles of AI Agent Economics |
Proposes ten principles to address questions in AI agent economics |
multimodal |
|
|
| 8 |
Capability-Based Scaling Laws for LLM Red-Teaming |
Proposes a capability-gap framework to optimize red-teaming of large language models |
large language model |
|
|
| 9 |
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs |
Proposes StructEval to evaluate the ability of large language models to generate structured outputs |
large language model |
|
|
| 10 |
DCG-SQL: Enhancing In-Context Learning for Text-to-SQL with Deep Contextual Schema Link Graph |
Proposes DCG-SQL to address insufficient Text-to-SQL performance |
large language model |
✅ |
|
| 11 |
HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation |
Proposes HS-STaR to optimize sample selection for self-taught reasoners |
large language model |
|
|