| 1 |
RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild |
Proposes the RW-Post benchmark and the AgentFact framework for auditable multimodal fact-checking in real-world settings |
multimodal visual grounding |
✅ |
|
| 2 |
AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks |
AgentRx: evaluates the performance of LLM agents on multimodal clinical prediction tasks |
large language model multimodal |
|
|
| 3 |
Active Testing of Large Language Models via Approximate Neyman Allocation |
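Proposes an active testing algorithm based on approximate Neyman allocation that significantly reduces the evaluation cost of LLM generation tasks |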
large language model multimodal |
|
|
| 4 |
LLM4Branch: Large Language Model for Discovering Efficient Branching Policies of Integer Programs |
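Proposes the LLM4Branch framework, which uses large language models to automatically discover efficient branching policies for mixed-integer linear programs |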
large language model |
✅ |
|
| 5 |
CORTEG: Foundation Models Enable Cross-Modality Representation Transfer from Scalp to Intracranial Brain Recordings |
Proposes the CORTEG framework, which uses pretrained EEG foundation models for cross-modality brain-signal transfer and efficient decoding |
foundation model |
|
|
| 6 |
Hypothesis-Driven Deep Research with Large Language Models: A Structured Methodology for Automated Knowledge Discovery |
Proposes a hypothesis-driven deep research methodology (HDRI) and the INFOMINER system for automated knowledge discovery |
large language model |
|
|
| 7 |
GuardAD: Safeguarding Autonomous Driving MLLMs via Markovian Safety Logic |
Proposes GuardAD to address safety issues in autonomous driving systems |
large language model multimodal |
|
|
| 8 |
MaD Physics: Evaluating information seeking under constraints in physical environments |
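Proposes the MaD Physics benchmark to evaluate agents' information-seeking and scientific-discovery abilities under constraints in physical environments |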
multimodal |
|
|
| 9 |
Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient? |
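Proposes the Pi-Serini search agent framework and argues that lexical retrieval (BM25), paired with a strong reasoning LLM, is effective for deep research tasks |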
large language model |
✅ |
|
| 10 |
The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning |
Reveals the "ink-drop effect" in long-context reasoning: quantifies the nonlinear impact of misleading information on LLM performance |
large language model |
|
|
| 11 |
Probing Cross-modal Information Hubs in Audio-Visual LLMs |
Reveals cross-modal information hubs in audio-visual LLMs and proposes a training-free hallucination mitigation strategy |
large language model |
✅ |
|
| 12 |
Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights |
Systematically evaluates the effectiveness and limitations of domain-adapted language models for structured 5G threat modelling |
large language model |
|
|
| 13 |
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge |
Proposes the RACER routing framework, which uses distributionally robust optimization to balance cost and effectiveness for LLM-as-a-Judge |
large language model |
|
|
| 14 |
Can You Keep a Secret? Involuntary Information Leakage in Language Model Writing |
Reveals involuntary information leakage in language model writing |
chain-of-thought |
|
|
| 15 |
The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions |
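Reveals the "bystander effect" in multi-agent reasoning: quantifies cognitive loafing and loss of ownership in collaborative interactions |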
large language model |
|
|
| 16 |
Re-Triggering Safeguards within LLMs for Jailbreak Detection |
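Proposes a re-triggering mechanism based on embedding perturbation that detects jailbreak prompts by activating the LLM's built-in safeguards |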
large language model |
|
|
| 17 |
Budget-Efficient Automatic Algorithm Design via Code Graph |
Proposes a code-graph-based framework for automatic algorithm design that achieves budget-efficient algorithm search through incremental correction |
large language model |
|
|
| 18 |
An agentic framework for gravitational-wave counterpart association in the multi-messenger era |
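Proposes the GW-Eyes agentic framework, which uses large language models to automate the association of gravitational waves with electromagnetic counterparts |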
large language model |
|
|
| 19 |
Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing |
Proposes the Disrupt-and-Rectify Smoothing defense framework, which provides a theoretically guaranteed defense against LLM jailbreak attacks |
large language model |
|
|
| 20 |
LLM Jaggedness Unlocks Scientific Creativity |
Proposes the SciAidanBench benchmark, reveals the "jagged" capability profile of LLMs, and improves scientific creativity through model ensembling |
large language model |
|
|
| 21 |
A Reflective Storytelling Agent for Older Adults: Integrating Argumentation Schemes and Argument Mining in LLM-Based Personalised Narratives |
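Proposes a reflective storytelling agent based on argument mining and knowledge graphs to improve the narrative quality and trustworthiness of digital companionship for older adults |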
large language model |
|
|
| 22 |
ASIA: an Autonomous System Identification Agent |
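Proposes ASIA, an autonomous system identification agent that uses large language models to automate the closed loop of dynamical model construction |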
large language model |
|
|
| 23 |
Beyond Autonomy: A Dynamic Tiered AgentRunner Framework for Governable and Resilient Enterprise AI Execution |
Proposes a dynamic tiered AgentRunner framework for governable and resilient enterprise AI execution |
large language model |
|
|
| 24 |
Benchmarking Safety Risks of Knowledge-Intensive Reasoning under Malicious Knowledge Editing |
Proposes the EditRisk-Bench benchmark to systematically evaluate LLM reasoning safety risks under malicious knowledge editing |
large language model |
|
|