| 1 |
XSkill: Continual Learning from Experience and Skills in Multimodal Agents |
XSkill: proposes a continual learning framework based on experience and skills that improves tool-use efficiency in multimodal agents. |
multimodal |
|
|
| 2 |
TopoBench: Benchmarking LLMs on Hard Topological Reasoning |
TopoBench: a benchmark for evaluating LLMs on complex topological reasoning. |
large language model, chain-of-thought |
|
|
| 3 |
Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework |
Proposes the SSGM framework to address risks, semantic drift, and privacy issues in the dynamic memory of LLM agents. |
large language model, multimodal |
|
|
| 4 |
Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks |
Proposes an explicit logic channel to validate and enhance MLLM performance on zero-shot tasks. |
large language model, multimodal |
|
|
| 5 |
LLMs can construct powerful representations and streamline sample-efficient supervised learning |
Uses LLMs to construct powerful representations and streamline sample-efficient supervised learning. |
foundation model, multimodal |
|
|
| 6 |
Prototype-Based Knowledge Guidance for Fine-Grained Structured Radiology Reporting |
ProtoSR: uses prototype knowledge to guide fine-grained structured radiology report generation, improving image understanding. |
multimodal |
|
|
| 7 |
Human-Centred LLM Privacy Audits: Findings and Frictions |
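Proposes the LMP2 tool to study how LLMs associate personal information and how users perceive privacy, revealing the difficulties of evaluating generative AI. |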
large language model |
|
|
| 8 |
Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems |
Cascade: builds software-hardware attack gadgets to amplify adversarial threats in compound AI systems. |
large language model |
|
|
| 9 |
Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-agent AI |
Proposes the NormCoRe framework for studying norms in multi-agent AI by translating human experimental designs. |
foundation model |
|
|
| 10 |
Understanding LLM Behavior When Encountering User-Supplied Harmful Content in Harmless Tasks |
Reveals how large language models behave when handling user-supplied harmful content in harmless tasks. |
large language model |
|
|
| 11 |
AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization |
AdaFuse: accelerates dynamic adapter inference through token-level pre-gating and fused kernel optimization. |
large language model |
|
|
| 12 |
You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents |
Reveals the risk of private data leakage induced by instructional text in LLM agents and proposes the ReadSecBench benchmark. |
instruction following |
|
|
| 13 |
Automating Skill Acquisition through Large-Scale Mining of Open-Source Agentic Repositories: A Framework for Multi-Agent Procedural Knowledge Extraction |
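Proposes a framework that acquires skills automatically by mining open-source agentic repositories at scale, for multi-agent procedural knowledge extraction. |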
large language model |
|
|
| 14 |
Gender Bias in Generative AI-assisted Recruitment Processes |
Reveals gender bias in GenAI-assisted recruitment: an analysis of gendered language patterns in GPT-5's career advice for Italian graduates. |
large language model |
|
|
| 15 |
Scaling Laws for Educational AI Agents |
Proposes an Agent Scaling Law that improves the capabilities of educational AI agents through a structured AgentProfile. |
large language model |
|
|
| 16 |
From Control to Foresight: Simulation as a New Paradigm for Human-Agent Collaboration |
Proposes Simulation-in-the-loop to improve agents' decision foresight in human-agent collaboration. |
large language model |
|
|
| 17 |
Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats |
Proposes a lifecycle security framework for analyzing and mitigating potential threats in autonomous LLM agents such as OpenClaw. |
large language model |
|
|
| 18 |
See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay |
Uses spatial representations to enhance VLM performance in interactive games. |
symbolic grounding |
|
|
| 19 |
Multi-Agent Collaboration for Automated Design Exploration on High Performance Computing Systems |
Proposes MADA, an LLM-based multi-agent framework for automated design exploration on high-performance computing systems. |
large language model |
|
|
| 20 |
Speak or Stay Silent: Context-Aware Turn-Taking in Multi-Party Dialogue |
Proposes a context-aware turn-taking method that improves voice assistant performance in multi-party dialogue. |
large language model |
|
|
| 21 |
Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment |
Proposes a safety alignment method based on deactivating refusal triggers to mitigate overrefusal in large language models. |
large language model |
|
|
| 22 |
XSkill: Continual Learning from Experience and Skills in Multimodal Agents |
XSkill: a continual learning framework based on experience and skills for multimodal agents. |
multimodal |
|
|