| 1 |
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models |
Proposes MonitorBench for comprehensively evaluating chain-of-thought monitorability in large language models |
large language model chain-of-thought |
|
|
| 2 |
PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision |
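Proposes PReD, the first multimodal large model for the electromagnetic domain, closing the loop across perception, recognition, and decision |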
large language model foundation model multimodal |
|
|
| 3 |
CARV: A Diagnostic Benchmark for Compositional Analogical Reasoning in Multimodal LLMs |
CARV: a diagnostic benchmark for compositional analogical reasoning in multimodal LLMs |
large language model multimodal |
|
|
| 4 |
The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation |
Proposes the scaffold (framing) effect to account for apparent multimodal gains in clinical VLM evaluation |
multimodal |
|
|
| 5 |
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome |
MiroEval: a benchmark that evaluates multimodal deep research agents on both process and outcome |
multimodal |
|
|
| 6 |
COvolve: Adversarial Co-Evolution of Large-Language-Model-Generated Policies and Environments via Two-Player Zero-Sum Game |
COvolve: adversarially co-evolves LLM-generated policies and environments through a zero-sum game to enable open-ended learning |
large language model |
|
|
| 7 |
Deep Research of Deep Research: From Transformer to Agent, From AI to AI for Science |
Builds a roadmap of the evolution from Transformer to Agent and explores applications of AI in scientific research |
large language model multimodal |
|
|
| 8 |
HeteroHub: An Applicable Data Management Framework for Heterogeneous Multi-Embodied Agent System |
HeteroHub: a data management framework for heterogeneous multi-embodied agent systems |
embodied AI multimodal |
|
|
| 9 |
A Multi-Agent Rhizomatic Pipeline for Non-Linear Literature Analysis |
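Proposes a non-linear literature analysis method built on a multi-agent rhizomatic pipeline, overcoming the limits of traditional linear reviews |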
large language model |
|
|
| 10 |
Beyond the Answer: Decoding the Behavior of LLMs as Scientific Reasoners |
Uses GEPA to optimize prompts, revealing behavioral patterns of LLMs in scientific reasoning |
large language model |
|
|
| 11 |
SAGAI-MID: A Generative AI-Driven Middleware for Dynamic Runtime Interoperability |
SAGAI-MID: a generative AI middleware for dynamic runtime interoperability |
large language model |
|
|
| 12 |
The Ultimate Tutorial for AI-driven Scale Development in Generative Psychometrics: Releasing AIGENIE from its Bottle |
Proposes the AIGENIE framework to automate the development of psychometric scales |
large language model |
|
|
| 13 |
Moving Beyond Review: Applying Language Models to Planning and Translation in Reflection |
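Pensée: uses language models to support the planning and translation stages of reflective writing, improving the depth and quality of reflection |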
large language model |
|
|
| 14 |
Coherent Without Grounding, Grounded Without Success: Observability and Epistemic Failure |
Reveals a mismatch between capability and explanation in large language models under differing levels of observability |
large language model |
|
|
| 15 |
Evaluating LLMs for Answering Student Questions in Introductory Programming Courses |
Evaluates how well LLMs answer student questions in introductory programming courses and proposes an evaluation framework |
large language model |
|
|
| 16 |
Reasoning as Energy Minimization over Structured Latent Trajectories |
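Proposes reasoning via energy minimization over structured latent trajectories, addressing the shortcomings of single-step decoding and chain-of-thought reasoning |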
chain-of-thought |
✅ |
|
| 17 |
EpiPersona: Persona Projection and Episode Coupling for Pluralistic Preference Modeling |
EpiPersona: models pluralistic preferences through persona projection and episode coupling |
large language model |
|
|
| 18 |
Designing AI for Real Users -- Accessibility Gaps in Retail AI Front-End |
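Accessibility design gaps in retail AI front-ends: the experience of users with disabilities is overlooked; proposes front-end safeguard mechanisms |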
multimodal |
|
|
| 19 |
CoT2-Meta: Budgeted Metacognitive Control for Test-Time Reasoning |
CoT2-Meta: a budgeted metacognitive control framework for test-time reasoning |
chain-of-thought |
|
|
| 20 |
ViviDoc: Generating Interactive Documents through Human-Agent Collaboration |
ViviDoc: a human-agent collaboration framework for generating interactive documents at lower authoring cost |
large language model |
|
|
| 21 |
GEAKG: Generative Executable Algorithm Knowledge Graphs |
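Proposes GEAKG, a generative executable algorithm knowledge graph for representing, learning, and transferring algorithmic knowledge across domains |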
large language model |
|
|