| 1 |
Learning to Trust the Crowd: A Multi-Model Consensus Reasoning Engine for Large Language Models |
Proposes a multi-model consensus reasoning engine that improves the instance-level reliability of large language models. |
large language model |
|
|
| 2 |
Safe-FedLLM: Delving into the Safety of Federated Large Language Models |
Safe-FedLLM: proposes a probe-based defense framework for federated LLMs that improves security against malicious-client attacks. |
large language model |
✅ |
|
| 3 |
IFDNS: An Iterative Feedback-Driven Neuro-Symbolic Method for Faithful Logical Reasoning |
Proposes IFDNS, an iterative feedback-driven neuro-symbolic method for faithful logical reasoning. |
large language model chain-of-thought |
|
|
| 4 |
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent |
OS-Symphony: a holistic framework for improving the robustness and generalization of computer-using agents. |
multimodal |
|
|
| 5 |
ARM: Role-Conditioned Neuron Transplantation for Training-Free Generalist LLM Agent Merging |
Proposes ARM, a training-free, role-conditioned neuron transplantation method for merging generalist LLM agents. |
large language model |
|
|
| 6 |
SALT-KG: A Benchmark for Semantics-Aware Learning on Enterprise Tables |
SALT-KG: a benchmark dataset for semantics-aware learning on enterprise tables. |
foundation model |
|
|
| 7 |
Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents |
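Proposes the Task-Decoupled Planning (TDP) framework, which improves the robustness and efficiency of task execution for long-horizon agents. |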
large language model |
|
|
| 8 |
Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents |
Proposes the Temporal Semantic Memory (TSM) framework to address temporally inaccurate and fragmented memory in LLM agents. |
large language model |
|
|
| 9 |
RLPO: Residual Listwise Preference Optimization for Long-Context Review Ranking |
Proposes RLPO, a residual listwise preference optimization method for long-context review ranking. |
large language model |
|
|
| 10 |
Agentic Diagnostic Reasoning over Telecom and Datacenter Infrastructure |
Proposes an LLM-agent-based diagnostic framework for root-cause analysis of faults in telecom and datacenter infrastructure. |
large language model |
|
|
| 11 |
When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent |
Proposes the AgentBait attack and the SUPERVISOR defense to improve the security of web automation agents. |
large language model |
|
|
| 12 |
Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artifical Cognition |
Argues against deterministic LLM inference: proposes Stochastic CHAOS to improve uncertainty modeling and safety. |
large language model |
|
|
| 13 |
From "Thinking" to "Justifying": Aligning High-Stakes Explainability with Professional Communication Standards |
Proposes a structured explanation framework to improve explainability in high-stakes domains. |
chain-of-thought |
|
|
| 14 |
DiSCo: Making Absence Visible in Intelligent Summarization Interfaces |
DiSCo makes missing information visible in intelligent summarization interfaces by comparing against domain knowledge. |
large language model |
|
|
| 15 |
LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing |
LLMRouterBench: a large-scale benchmark and unified framework for LLM routing. |
large language model |
✅ |
|
| 16 |
Active Context Compression: Autonomous Memory Management in LLM Agents |
Focus: active context compression for LLM agents, addressing context bloat in long-horizon tasks. |
large language model |
|
|
| 17 |
Defenses Against Prompt Attacks Learn Surface Heuristics |
Proposes a new defense against prompt attacks to address safety problems in existing models. |
large language model |
|
|
| 18 |
A Large-Scale Study on the Development and Issues of Multi-Agent AI Systems |
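A large-scale analysis of the evolution and issues of multi-agent AI systems, revealing development challenges and maintenance needs. |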
large language model |
|
|