| 1 |
To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model |
MMGuard: protecting multimodal data from unauthorized LVLM fine-tuning via adversarial perturbations |
multimodal |
|
|
| 2 |
SemaTune: Semantic-Aware Online OS Tuning with Large Language Models |
SemaTune: a semantic-aware online operating system tuning framework based on large language models |
large language model |
|
|
| 3 |
KGPFN: Unlocking the Potential of Knowledge Graph Foundation Model via In-Context Learning |
Proposes KGPFN, which unlocks the potential of knowledge graph foundation models through in-context learning |
foundation model |
✅ |
|
| 4 |
MediaClaw: Multimodal Intelligent-Agent Platform Technical Report |
MediaClaw: a multimodal intelligent-agent platform that addresses fragmentation and disconnected pipelines in AIGC deployment |
multimodal |
|
|
| 5 |
APWA: A Distributed Architecture for Parallelizable Agentic Workflows |
APWA: a distributed architecture for parallelizing agent workflows |
large language model |
|
|
| 6 |
Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling |
Proposes the Dual-Dimensional Consistency (DDC) framework, which balances budget and quality in LLM inference acceleration. |
large language model |
|
|
| 7 |
SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning |
Proposes SpeakerLLM: a speaker-specialized audio LLM for speaker understanding and verification reasoning |
large language model |
|
|
| 8 |
Small, Private Language Models as Teammates for Educational Assessment Design |
Uses small, private language models as teammates to assist educational assessment design |
large language model |
|
|
| 9 |
A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions |
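Proposes a deterministic agent workflow that tackles multi-dimensional rule reasoning over HS codes and yields interpretable tariff classification. |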
large language model |
|
|
| 10 |
AI Outperforms Humans in Personalized Image Aesthetics Assessment via LLM-Based Interviews and Semantic Feature Extraction |
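Proposes an AI system for personalized image aesthetics assessment, based on LLM interviews and semantic feature extraction, that outperforms humans |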
large language model |
|
|
| 11 |
SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades |
SWE-Chain: a benchmark for evaluating coding agents on chained release-level package upgrade tasks |
large language model |
|
|
| 12 |
Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems |
Proposes HEAR: an enterprise agent based on hierarchical hypergraphs that addresses multi-hop reasoning across complex business systems |
large language model |
|
|
| 13 |
TeachAnything: A Multimodal Crowdsourcing Platform for Training Embodied AI Agents in Symmetrical Reality |
Proposes the TeachAnything platform for training embodied agents in symmetrical reality |
embodied AI multimodal |
|
|
| 14 |
Zero-Shot Goal Recognition with Large Language Models |
Uses large language models for zero-shot goal recognition and probes their planning knowledge |
large language model |
|
|
| 15 |
Teaching Large Language Models When Not to Know: Learning Temporal Critique for Ex-Ante Reasoning |
Proposes the TCFT framework, which improves large language models' awareness of temporal cutoff points in temporal reasoning |
large language model |
|
|
| 16 |
SliceGraph: Mapping Process Isomers in Multi-Run Chain-of-Thought Reasoning |
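Proposes SliceGraph to analyze process isomers in multi-run CoT reasoning, revealing patterns in how intermediate computations are shared, split, and recombined. |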
chain-of-thought |
|
|
| 17 |
Complacent, Not Sycophantic: Reframing Large Language Models and Designing AI Literacy for Complacent Machines |
Reframes large language models as complacent rather than sycophantic, and designs AI literacy education for complacent machines |
large language model |
|
|
| 18 |
Contestable Multi-Agent Debate with Arena-based Argumentative Computation for Multimedia Verification |
Proposes a multi-agent debate framework built on arena-based argumentative computation for multimedia verification |
large language model multimodal |
✅ |
|
| 19 |
OmniDrop: Layer-wise Token Pruning for Omni-modal LLMs via Query-Guidance |
OmniDrop: proposes a query-guided, layer-wise token pruning method for optimizing omni-modal LLMs. |
large language model multimodal |
|
|
| 20 |
Stateful Reasoning via Insight Replay |
Proposes InsightReplay, which addresses the forgetting of key information in long-chain CoT reasoning |
large language model chain-of-thought |
|
|
| 21 |
DVMap: Fine-Grained Pluralistic Value Alignment via High-Consensus Demographic-Value Mapping |
DVMap: fine-grained pluralistic value alignment via high-consensus demographic-value mapping |
large language model chain-of-thought |
✅ |
|
| 22 |
One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries |
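Exposes the fragility of defenses against malicious fine-tuning: proposes adaptive attacks that break existing defense mechanisms |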
foundation model |
|
|
| 23 |
Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces |
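Uncovers the representation geometry of minimal cores in overcomplete reasoning traces, enabling compression and purification of the reasoning process. |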
chain-of-thought |
|
|
| 24 |
Runtime-Structured Task Decomposition for Agentic Coding Systems |
Proposes runtime-structured task decomposition to improve the efficiency and reliability of agentic coding systems |
large language model |
|
|
| 25 |
Amortized Energy-Based Bayesian Inference |
Proposes an amortized energy-based Bayesian inference method that accelerates the solution of nonlinear inverse problems. |
multimodal |
|
|
| 26 |
Hidden in Memory: Sleeper Memory Poisoning in LLM Agents |
Proposes the "sleeper memory poisoning" attack, exposing security risks in the long-term memory of LLM agents |
large language model |
|
|
| 27 |
$π$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows |
Proposes $π$-Bench for evaluating the proactivity of personal assistant agents in long-horizon workflows. |
large language model |
|
|
| 28 |
Agentic Design of Compositional Descriptors via Autoresearch for Materials Science Applications |
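Proposes the Automat framework, which uses autonomous research to design chemical descriptors and improve the accuracy of material property prediction. |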
large language model |
|
|
| 29 |
Prompt Segmentation and Annotation Optimisation: Controlling LLM Behaviour via Optimised Segment-Level Annotations |
Proposes the PSAO framework, which improves control over LLMs and efficiency by optimizing segment-level prompt annotations |
large language model |
|
|
| 30 |
Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining |
Proposes the Cattle Trade multi-agent benchmark for evaluating LLMs on strategic reasoning, game-theoretic play, and bargaining. |
large language model |
|
|
| 31 |
Deepchecks: Evaluating Retrieval-Augmented Generation (RAG) |
Deepchecks: a comprehensive framework for evaluating retrieval-augmented generation (RAG) systems |
large language model |
|
|
| 32 |
BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE |
Proposes BEAM: a binary expert activation masking method for dynamic routing in MoE that improves inference efficiency. |
large language model |
|
|
| 33 |
Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints |
Proposes a correctness-aware repository filtering framework for LLM-based code tools under effective context window constraints. |
large language model |
|
|
| 34 |
Hydra: Efficient, Correct Code Generation via Checkpoint-and-Rollback Support |
Hydra: efficient, correct code generation via checkpoint-and-rollback support |
large language model |
|
|
| 35 |
Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games |
Proposes a watermarking method for game-playing strategies to detect AI cheating in perfect-information extensive-form games |
large language model |
|
|
| 36 |
Agentic AI Ecosystems in Higher Education: A Perspective on AI Agents to Emerging Inclusive, Agentic Multi-Agent AI Framework for Learning, Teaching and Institutional Intelligence |
Proposes an agentic multi-agent AI framework for higher education that supports inclusive learning and institutional intelligence |
multimodal |
|
|
| 37 |
Good to Go: The LOOP Skill Engine That Hits 99% Success and Slashes Token Usage by 99% via One-Shot Recording and Deterministic Replay |
LOOP Skill Engine: achieves a 99% success rate and cuts token consumption by 99% via one-shot recording and deterministic replay |
large language model |
|
|