| 1 |
Predicting Power-System Dynamic Trajectories with Foundation Models |
Proposes LASS-ODE-Power, which uses large-scale pretraining to predict power-system dynamic trajectories. |
foundation model |
|
|
| 2 |
Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models |
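In single-cell foundation models, intermediate layers encode the best biological representations, outperforming conventional final-layer feature extraction. |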
foundation model |
|
|
| 3 |
Dissecting Failure Dynamics in Large Language Model Reasoning |
Proposes the GUARD framework, which uses uncertainty signals to detect and correct early errors in large language model reasoning. |
large language model |
|
|
| 4 |
Rethinking Patient Education as Multi-turn Multi-modal Interaction |
Proposes the MedImageEdu benchmark for evaluating multimodal interactive patient-education agents. |
multimodal visual grounding |
|
|
| 5 |
MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror |
MirrorBench: evaluates self-centric intelligence in multimodal large language models by introducing a mirror. |
large language model multimodal |
✅ |
|
| 6 |
DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation |
Proposes DR³-Eval, a realistic and reproducible benchmark for evaluating deep research agents. |
multimodal instruction following |
|
|
| 7 |
Context Over Content: Exposing Evaluation Faking in Automated Judges |
Reveals contextual bias in LLM evaluation: information about downstream consequences distorts evaluation results. |
chain-of-thought |
|
|
| 8 |
AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime |
AIPC: an agent-based framework for automated AI model deployment that speeds up deployment on Qualcomm AI Runtime. |
multimodal |
|
|
| 9 |
VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs |
Proposes the VeriGraphi framework to address the difficulty of generating Verilog code for large hierarchical hardware designs with LLMs. |
large language model |
|
|
| 10 |
Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines |
Scepsy: serves agentic workflows using aggregate LLM pipelines, raising throughput and lowering latency. |
large language model |
|
|
| 11 |
Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC |
Proposes a self-evolving logic synthesis framework to improve EDA tool performance. |
large language model |
|
|
| 12 |
From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench |
Proposes ProVoice-Bench for evaluating proactive voice agents, filling a gap in existing benchmarks. |
multimodal |
|
|
| 13 |
Dr.~RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement |
Dr.~RTL: continuously optimizes RTL designs through a tool-grounded autonomous agent. |
large language model |
|
|
| 14 |
Discovering Novel LLM Experts via Task-Capability Coevolution |
Proposes the AC/DC framework, which discovers more efficient LLMs with new skills through task-capability coevolution. |
large language model |
|
|
| 15 |
Governing Reflective Human-AI Collaboration: A Framework for Epistemic Scaffolding and Traceable Reasoning |
Proposes a human-AI collaboration framework that improves AI governance through epistemic scaffolding and traceable reasoning. |
large language model |
|
|
| 16 |
MemoSight: Unifying Context Compression and Multi Token Prediction for Reasoning Acceleration |
MemoSight: combines context compression with multi-token prediction to accelerate LLM reasoning. |
chain-of-thought |
|
|
| 17 |
The Missing Knowledge Layer in AI: A Framework for Stable Human-AI Reasoning |
Proposes a framework for stable human-AI reasoning that addresses reasoning drift in large language models. |
large language model |
|
|
| 18 |
The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows |
Reveals the LLM fallacy: the misattribution of capability in AI-assisted cognitive workflows. |
large language model |
|
|
| 19 |
Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution |
Proposes a Bounded Autonomy architecture for enterprise AI that ensures safe LLM execution and improves efficiency. |
large language model |
|
|
| 20 |
The Agentification of Scientific Research: A Physicist's Perspective |
AI agents empower research: from tool to collaborator, reshaping the paradigm of scientific research. |
large language model |
|
|
| 21 |
HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks |
HWE-Bench: the first large-scale LLM agent benchmark for real-world hardware bug repair tasks. |
large language model |
|
|
| 22 |
Learning to Draw ASCII Improves Spatial Reasoning in Language Models |
Learning to draw ASCII diagrams improves the spatial reasoning of language models. |
large language model |
|
|
| 23 |
El Agente Forjador: Task-Driven Agent Generation for Quantum Simulation |
El Agente Forjador: a task-driven agent generation framework for quantum simulation. |
large language model |
|
|
| 24 |
GDPR Auto-Formalization with AI Agents and Human Verification |
Proposes a GDPR auto-formalization framework based on AI agents and human verification, improving the quality of legal text processing. |
large language model |
|
|
| 25 |
Quantifying Cross-Query Contradictions in Multi-Query LLM Reasoning |
Proposes a solver-augmented multi-query LLM reasoning method that resolves logical contradictions across queries. |
large language model |
|
|