| 1 |
Predicting Power-System Dynamic Trajectories with Foundation Models |
Proposes LASS-ODE-Power, which uses large-scale pretraining to predict power-system dynamic trajectories. |
foundation model |
|
|
| 2 |
Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models |
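In single-cell foundation models, intermediate layers encode the optimal biological representations, outperforming conventional final-layer feature extraction. |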
foundation model |
|
|
| 3 |
Dissecting Failure Dynamics in Large Language Model Reasoning |
Proposes the GUARD framework, which uses uncertainty signals to detect and correct early errors in large language model reasoning. |
large language model |
|
|
| 4 |
Rethinking Patient Education as Multi-turn Multi-modal Interaction |
Proposes the MedImageEdu benchmark for evaluating multimodal interactive patient-education agents |
multimodal visual grounding |
|
|
| 5 |
MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror |
MirrorBench: evaluates self-centric intelligence in multimodal large language models by introducing a mirror |
large language model multimodal |
✅ |
|
| 6 |
DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation |
Proposes DR³-Eval, a realistic and reproducible benchmark for evaluating deep research agents |
multimodal instruction following |
|
|
| 7 |
Context Over Content: Exposing Evaluation Faking in Automated Judges |
Reveals a context bias in LLM-based evaluation: information about downstream consequences distorts evaluation results |
chain-of-thought |
|
|
| 8 |
AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime |
AIPC: an agent-based framework for automated AI model deployment that accelerates deployment on Qualcomm AI Runtime. |
multimodal |
|
|
| 9 |
VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs |
Proposes the VeriGraphi framework to address the difficulty of generating Verilog code for large hierarchical hardware designs with LLMs |
large language model |
|
|
| 10 |
Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines |
Scepsy: serves agentic workflows using aggregate LLM pipelines, improving throughput and reducing latency |
large language model |
|
|
| 11 |
Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC |
Proposes a self-evolving logic synthesis framework to improve EDA tool performance |
large language model |
|
|
| 12 |
From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench |
Proposes ProVoice-Bench for evaluating proactive voice agents, filling a gap in existing benchmarks. |
multimodal |
|
|
| 13 |
Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement |
Dr. RTL: continuously optimizes RTL designs through a tool-grounded autonomous agent |
large language model |
|
|
| 14 |
Discovering Novel LLM Experts via Task-Capability Coevolution |
Proposes the AC/DC framework, which discovers more efficient LLMs with novel skills through task-capability coevolution. |
large language model |
|
|
| 15 |
Governing Reflective Human-AI Collaboration: A Framework for Epistemic Scaffolding and Traceable Reasoning |
Proposes a human-AI collaboration framework that improves AI governance through epistemic scaffolding and traceable reasoning |
large language model |
|
|
| 16 |
MemoSight: Unifying Context Compression and Multi Token Prediction for Reasoning Acceleration |
MemoSight: unifies context compression and multi-token prediction to accelerate LLM reasoning |
chain-of-thought |
|
|
| 17 |
The Missing Knowledge Layer in AI: A Framework for Stable Human-AI Reasoning |
Proposes a framework for stable human-AI reasoning that addresses reasoning drift in large language models |
large language model |
|
|
| 18 |
The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows |
Reveals the LLM fallacy: the misattribution of capability in AI-assisted cognitive workflows |
large language model |
|
|
| 19 |
Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution |
Proposes a Bounded Autonomy architecture for enterprise AI that ensures safe LLM execution and improves efficiency. |
large language model |
|
|
| 20 |
The Agentification of Scientific Research: A Physicist's Perspective |
AI agents empowering research: from tool to collaborator, reshaping the paradigm of scientific research |
large language model |
|
|
| 21 |
HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks |
HWE-Bench: the first large-scale LLM agent benchmark for real-world hardware bug repair tasks |
large language model |
|
|
| 22 |
Learning to Draw ASCII Improves Spatial Reasoning in Language Models |
Improves the spatial reasoning of language models by training them to draw ASCII diagrams |
large language model |
|
|
| 23 |
El Agente Forjador: Task-Driven Agent Generation for Quantum Simulation |
El Agente Forjador: a task-driven agent generation framework for quantum simulation |
large language model |
|
|
| 24 |
GDPR Auto-Formalization with AI Agents and Human Verification |
Proposes a GDPR auto-formalization framework based on AI agents and human verification, improving the quality of legal text processing |
large language model |
|
|
| 25 |
Quantifying Cross-Query Contradictions in Multi-Query LLM Reasoning |
Proposes a solver-augmented multi-query LLM reasoning method that resolves logical contradictions across queries |
large language model |
|
|
| 26 |
Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks |
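Proposes an LLM code deobfuscation method based on chain-of-thought (CoT) prompting that improves control-flow recovery and semantic preservation. |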
large language model chain-of-thought |
|
|
| 27 |
LLMbench: A Comparative Close Reading Workbench for Large Language Models |
LLMbench: a browser-based workbench for comparative close reading of large language models |
large language model |
|
|
| 28 |
Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU |
Proposes Ragged Paged Attention, a high-performance and flexible kernel for LLM inference on TPU. |
large language model |
|
|
| 29 |
LACE: Lattice Attention for Cross-thread Exploration |
Proposes the LACE framework to address the problem of isolated reasoning in large language models |
large language model |
|
|
| 30 |
The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE |
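Proposes the semi-executable stack model to address the challenge of software engineering's scope expanding, under AI, to semi-executable artifacts. |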
foundation model |
|
|
| 31 |
The Crutch or the Ceiling? How Different Generations of LLMs Shape EFL Student Writings |
Studies how different generations of LLMs affect EFL student writing: a crutch or a ceiling? |
large language model |
|
|
| 32 |
HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents? |
Proposes HarmfulSkillBench, which evaluates the safety of LLM agents in environments containing harmful skills |
large language model |
✅ |
|
| 33 |
Exploring LLM-based Verilog Code Generation with Data-Efficient Fine-Tuning and Testbench Automation |
Proposes an LLM-based Verilog code generation method that improves performance through data-efficient fine-tuning and testbench automation. |
large language model |
|
|