cs.AI (2026-05-28)

📊 79 papers in total | 🔗 15 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (55 🔗8) | Pillar 2: RL & Architecture (21 🔗7) | Pillar 4: Generative Motion (1) | Pillar 3: Perception & Semantics (1) | Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (55 papers)

# | Title | One-line summary | Tags | 🔗
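1 | Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage | Reveals a fraud risk in per-token LLM billing: providers can maliciously overreport token counts | large language model
2 | MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization | Proposes the MuPHIRM framework, which improves VLMs' implicit multimodal harm reasoning through semantically grounded reward optimization | multimodal
3 | COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings | COMET: dissects the modality gap in audio-text multimodal contrastive embeddings through concept space | multimodal
4 | Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence | Proposes HetMedAgent, a heterogeneous multi-agent framework that combines general-purpose LLMs with specialist models to improve medical decision-making performance | large language model, foundation model
5 | SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | SchGen: a PCB schematic generation model based on semantic code representations | large language model
6 | Demystifying Data Organization for Enhanced LLM Training | Explores data organization strategies to improve the efficiency and stability of LLM training | large language model
7 | Teaching Values to Machines: Simulating Human-Like Behavior in LLMs | Uses value-based guidance to make LLMs simulate behavior that is more consistent with humans | large language model
8 | PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing | Proposes the PRAIB benchmark to evaluate LLM-assisted reviewing behavior and reveal how it differs from human reviewing | large language model
9 | Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems | Evaluates how well the token-optimized formats TOON and TRON reduce token overhead in agentic AI systems | large language model
10 | VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing | VLA-Trace: diagnoses vision-language-action models through representation and behavior tracing | vision-language-action, VLA, multimodal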
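11 | Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability | Proposes a method based on preference-based maximum satisfiability to address LLM optimization problems | large language model, chain-of-thought
12 | CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models | Proposes CodeGolf Bench for evaluating LLMs' ability to generate concise code across 60 programming languages | large language model
13 | OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields | Proposes OmniMatBench to address the shortfall in multimodal reasoning for materials science | multimodal
14 | Social Reasoning in Machines: Investigating Collective Truth-Seeking Dynamics in Large Language Model Debate | Uses LLM multi-agent debate to simulate the argumentative theory of reasoning, improving truth-seeking performance on question-answering tasks | large language model
15 | Harnessing non-adversarial robustness in large language models | Proposes a debiasing fine-tuning method for improving the non-adversarial robustness of LLMs | large language model
16 | Benchmarking Positional Encoding Strategies for Transformer-Based EEG Foundation Models | Comparatively evaluates multiple positional encoding strategies for Transformer-based EEG foundation models on brain-computer interface tasks | foundation model
17 | Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models | Proposes an LLM-based multi-agent framework that improves the quality of collaborative storytelling for children | large language model
18 | HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering | HiKEY: a hierarchical multimodal retrieval framework that addresses routing failures and evidence fragmentation in open-domain document question answering | multimodal
19 | FinVerBench: Benchmark Validity and Calibration in Large Language Model Financial Statement Verification | FinVerBench: builds a financial statement verification benchmark to assess the validity and calibration of LLMs in judging financial consistency | large language model
20 | DenseSteer: Steering Small Language Models towards Dense Math Reasoning | DenseSteer: steers small language models toward dense math reasoning | large language model, chain-of-thought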
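21 | Inferring Code Correctness from Specification | TRAILS: infers code correctness from input-output-aligned specifications, improving the accuracy of LLM code verification | large language model, chain-of-thought
22 | When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs | Finds that persona prompting mainly reshapes the characteristics of LLM responses rather than improving capability, so evaluation needs multiple metrics | large language model
23 | When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop | Studies the negative effects of human intervention and the preference alignment problem in multi-model self-consuming loops | foundation model
24 | Automatically Attacking Software Reverse Engineering AI Agents | Proposes a genetic-algorithm-based prompt generation method for attacking software reverse engineering AI agents | large language model
25 | Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents | Reveals that harness updating and harness benefit are decoupled in self-evolving LLM agents, and optimizes agent training strategies | instruction following
26 | Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection | Proposes VisAnomReasoner, an efficient vision-language reasoning model for time-series anomaly detection | multimodal
27 | ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure | ProjectionBench: an evaluation framework for LLM scientific hypothesis generation under progressive information disclosure | large language model
28 | Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization | Proposes a temporal and structural credit assignment method that optimizes prompts for LLM-based multi-agent systems and improves performance on complex reasoning tasks | large language model
29 | Double-Edged Sword or Sharp Tool? Designing and Evaluating Triadic LLM-Teacher Collaboration for K-12 Writing at Scale | Designs and evaluates a triadic LLM-teacher collaboration system for K-12 writing instruction at scale | large language model
30 | When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems | Explores hybrid multi-agent systems: the design space of collaborative reasoning between cloud-side and device-side agents | large language model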
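31 | PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers | Proposes the PokerSkill framework for playing poker without training | large language model
32 | Projectional Decoding: Towards Semantic-Aware LLM Generation | Proposes projectional decoding, which integrates domain semantics to improve the semantic validity of LLM-generated software artifacts | large language model
33 | RAISE: RAG Design as an Architecture Search Problem | Proposes the RAISE framework, which casts RAG design as an architecture search problem to optimize RAG hyperparameters | multimodal
34 | From GPS Points to Travel Patterns: Flexible and Semantic Trajectory Generation with LLMs | HTP: uses LLMs to generate urban trajectories hierarchically, addressing the shortage of trajectory data under privacy constraints | large language model
35 | Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent | Compass: navigates global marine lead data integration with an expert-guided LLM agent | large language model
36 | Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction | Proposes MemPoison, which stealthily hijacks LLM agent memory through conversational interaction to mount Trojan attacks | large language model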
37 | HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding | Proposes HoliTok, a robust continuous holistic speech tokenization model for speech generation and understanding | foundation model
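38 | Make LLM Learn to Synthesize from Streaming Experiences through Feedback | Proposes StreamSynth and SynLearner, which let LLMs learn continually and transfer experience in streaming synthesis tasks | large language model
39 | Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation | Proposes Moment-KV, a momentum-based decode-time KV cache compression method for improving the quality of long-text generation | large language model
40 | Mitigating Stethoscope-Induced Shortcuts in Respiratory Sound Classification under Federated Domain Generalization with Causality-Inspired Interventions | Proposes a federated domain generalization method with causal interventions to address stethoscope-induced spurious correlations in respiratory sound classification | multimodal
41 | LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs | LFQ: logit-aware final-block quantization that improves the generation quality of low-bit quantized LLMs | large language model
42 | NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs | Proposes noise-aware LoRA (NaRA) for efficient fine-tuning of diffusion language models | large language model
43 | BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices | BitTP: a lightweight trajectory prediction model for edge devices that uses BitLLM for efficient inference | large language model
44 | Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation | Proposes the Think Fast, Talk Smart framework for generating high-quality health text from structured health data | large language model
45 | LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning | Uses LLMs to evolve domain-independent heuristics that surpass human-designed ones in symbolic AI planning | large language model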
46 | ParaTool: Shifting Tool Representations from Context to Parameters | ParaTool: shifts tool representations from context to parameters, improving large models' tool-calling ability | large language model
47 | Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation | Proposes Battery-Sim-Agent for battery parameter estimation | large language model
48 | Opt-Verifier: Unleashing the Power of LLMs for Optimization Modeling via Dual-Side Verification | Opt-Verifier: uses dual-side verification to unlock the potential of LLMs in optimization modeling | large language model
49 | MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs | MINDGAMES: a live arena for evaluating social and strategic reasoning in multi-agent LLMs | large language model
50 | Xetrieval: Mechanistically Explaining Dense Retrieval | Xetrieval: an interpretable dense retrieval framework that reveals embedding-level reasoning mechanisms | chain-of-thought
51 | SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing | SciIntBench: an adversarial benchmark that evaluates LLM compliance with research integrity norms | large language model
52 | CrystalXRD-Bench: Benchmarking Vision-Language Models for XRD Peak Indexing Across Diverse Crystalline Materials | CrystalXRD-Bench: evaluates vision-language models on XRD peak indexing for crystalline materials | multimodal
53 | Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet | Uses sparse autoencoders to extract interpretable monosemantic features from Claude 3 Sonnet | multimodal
54 | Provably Secure Agent Guardrail | Proposes an agent guardrail based on logical reasoning constraints to address the safety problem of AI loss of control | large language model
55 | ReasonOps: Operator Segmentation for LLM Reasoning Traces | ReasonOps: an unsupervised operator segmentation method for LLM reasoning traces, used to analyze and understand LLM reasoning processes | chain-of-thought

🔬 Pillar 2: RL & Architecture (21 papers)

# | Title | One-line summary | Tags | 🔗
56 | When Should Models Change Their Minds? Contextual Belief Management in Large Language Models | Proposes a contextual belief management framework for maintaining LLM belief states over long-horizon interactions | reinforcement learning, large language model
57 | LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback | LLUMI: uses online community feedback to improve LLM writing assistance for mental health support | DPO, direct preference optimization, large language model
58 | SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search | Proposes SAAS, which mitigates over-search in agentic search through self-aware reinforcement learning | reinforcement learning
59 | ReasonLight: A Multimodal Foundation Model-Enhanced Reinforcement Learning Framework for Zero-Shot Traffic Signal Control | ReasonLight: a reinforcement learning framework for zero-shot traffic signal control built on multimodal large models | reinforcement learning, foundation model, multimodal
60 | Physically Viable World Models: A Case for Query-Conditioned Embodied AI | Proposes query-conditioned world models for embodied AI to address physical viability | world model, world models, embodied AI
61 | PassNet: Scaling Large Language Models for Graph Compiler Pass Generation | PassNet: scales LLMs to generate graph compiler passes, improving performance on long-tail workloads | world model, world models, large language model
62 | MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models | MiraBench: evaluates action-conditioned reliability in robotic world models | world model, world models, physically plausible
63 | KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning | Proposes KairosAgent, an agentic time series forecasting framework with fused semantic reasoning | reinforcement learning, large language model, foundation model
64 | Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models | Proposes EKSFT, an entropy-KL-divergence-based token masking method for selective fine-tuning of LLMs | reinforcement learning, large language model
65 | Toward AI Systems That Understand Self and Others: A Multi-Phase Inference Framework for Human Cognitive Diversity and World-Model Alignment | Proposes MIM, a multi-phase inference framework aimed at helping AI understand human cognitive diversity and align world models | world model, world models
66 | OptSkills: Learning Generalizable Optimization Skills from Problem Archetypes via Cluster-Based Distillation | OptSkills: learns generalizable optimization skills from problem archetypes via cluster-based distillation | distillation, large language model
67 | Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning | Proposes the DOMINO framework for domain-specific data synthesis | representation learning, large language model
68 | TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation | TRACE: a Toulmin-argumentation-based framework for assessing LLM reasoning that improves CoT evaluation quality | reinforcement learning, large language model, chain-of-thought
69 | The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF | Proposes the DistractionIF benchmark, revealing inverse scaling in LLMs' instruction-following robustness on text containing distractor instructions | reinforcement learning, large language model, instruction following
70 | Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility | Proposes a data-model compatibility metric to guide data selection for reasoning distillation into student models | distillation, large language model
71 | Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving | Proposes uncertainty-aware, temporally regulated expert-guided reinforcement learning for safe exploration in autonomous driving | reinforcement learning, policy learning
72 | Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling | RACE-Sched: an asynchronous agentic framework that reconciles real-time constraints with long-horizon reasoning for dynamic scheduling | reinforcement learning, deep reinforcement learning, large language model
73 | SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search | SAAS: self-aware reinforcement learning for mitigating over-search in agentic search | reinforcement learning
74 | Learning to Choose: An Empowerment-Guided Multi-Agent System with semantic communication for Adaptive Method Selection | Proposes an empowerment-guided multi-agent system with semantic communication for adaptive method selection | policy learning, large language model
75 | DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning | DeepTool: scales interleaved deliberation in tool-integrated reasoning via process-supervised reinforcement learning | reinforcement learning
76 | DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework | DELOS: uses a contrastive learning framework to detect shallow transits in Kepler light curves | contrastive learning

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags | 🔗
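77 | How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency | Large-scale experiments evaluate the consistency of LLM penetration testing, revealing reliability differences in attack behavior across models | penetration, large language model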

🔬 Pillar 3: Perception & Semantics (1 paper)

# | Title | One-line summary | Tags | 🔗
78 | PhyDrawGen: Physically Grounded Diagram Generation from Natural Language | PhyDrawGen: a neuro-symbolic method for generating physically consistent diagrams from natural language | scene understanding, large language model

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
79 | Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization | Proposes a hybrid framework based on zeroth-order optimization to improve the robustness of LLM safety alignment | manipulation, large language model
