cs.AI (2026-05-07)

📊 94 papers in total | 🔗 9 with code

🎯 Interest area navigation

Pillar 9: Embodied Foundation Models (64 🔗7) | Pillar 2: RL & Architecture (27 🔗1) | Pillar 8: Physics-based Animation (1 🔗1) | Pillar 1: Robot Control (1) | Pillar 3: Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (64 papers)

# | Title | One-sentence summary | Tags | 🔗
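1 | Data Language Models: A New Foundation Model Class for Tabular Data | Proposes Data Language Models (DLMs) that natively understand tabular data without preprocessing. | large language model, foundation model
2 | Multimodal Deep Generative Model for Semi-Supervised Learning under Class Imbalance | Proposes a multimodal deep generative model for semi-supervised learning under class imbalance. | multimodal
3 | Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models | Proposes a knowledge graph foundation model built on a graphlet-based structural vocabulary, improving zero-shot transfer. | foundation model
4 | NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research | NeuroAgent: an LLM-based agent framework for multimodal neuroimaging analysis. | multimodal
5 | Debiased Multimodal Personality Understanding through Dual Causal Intervention | Proposes DCAN, a dual causal intervention network, to address bias in multimodal personality understanding. | multimodal
6 | Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models | Compares the distributions of real and synthetic tabular data priors and evaluates their effect on the performance of pretrained tabular models. | foundation model
7 | CoupleEvo: Evolving Heuristics for Coupled Optimization Problems Using Large Language Models | CoupleEvo: uses large language models to evolve heuristics for coupled optimization problems. | large language model
8 | GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation | GlazyBench: a benchmark dataset for ceramic glaze property prediction and image generation. | large language model, multimodal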
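9 | Super-Level-Set Regression: Conditional Quantiles via Volume Minimization | Proposes Super-Level-Set regression (SLS), which learns conditional quantiles directly by minimizing volume, for multivariate regression. | multimodal
10 | Rethinking Adapter Placement: A Dominant Adaptation Module Perspective | Proposes DomLoRA, which achieves parameter-efficient fine-tuning through single-adapter placement and outperforms conventional LoRA. | instruction following
11 | MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems | MASPO: a joint prompt optimization framework for LLM-based multi-agent systems. | large language model
12 | Process Matters more than Output for Distinguishing Humans from Machines | Proposes the CogCAPTCHA30 cognitive task suite, which distinguishes humans from machines by process features rather than outputs. | large language model
13 | PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors | PrefixGuard: from LLM-agent traces to online failure-warning monitors. | large language model
14 | Constraint Decay: The Fragility of LLM Agents in Backend Code Generation | Reveals the fragility of LLM agents under structural constraints in backend code generation and identifies the "constraint decay" phenomenon. | large language model
15 | SCRuB: Social Concept Reasoning under Rubric-Based Evaluation | Proposes the SCRuB framework for evaluating the social concept reasoning of large language models. | large language model
16 | Knowledge Graphs, the Missing Link in Agentic AI-based Formal Verification | Proposes a knowledge-graph-based agentic AI approach to formal verification that improves the quality of generated SystemVerilog assertions. | large language model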
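17 | From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work | Proposes execution lineage, which uses deterministic graphs to address the reproducibility of AI-native workflows. | large language model
18 | Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Systems Perspective | Proposes a dynamical systems model of human-AI co-evolution, showing that reliance on AI may carry a risk of cognitive degradation. | large language model
19 | Fine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysis | Fine-tunes small language models for solution-oriented Windows event log analysis. | large language model
20 | Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs | The LATTE framework uses adaptive task graphs to make language agent teams more efficient and reduce resource consumption. | large language model
21 | Measuring Black-Box Confidence via Reasoning Trajectories: Geometry, Coverage, and Verbalization | Proposes a black-box confidence estimation method based on reasoning-trajectory geometry, coverage, and verbalized confidence. | chain-of-thought
22 | Addressing Labelled Data Scarcity: Taxonomy-Agnostic Annotation of PII Values in HTTP Traffic using LLMs | Proposes an LLM-based method for annotating PII values in HTTP traffic, addressing scarce labelled data and fixed taxonomies. | large language model
23 | Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions | A large-scale study reveals security vulnerabilities and compatibility risks in the library versions chosen in LLM-generated code. | large language model
24 | OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning | Proposes OmicsLM, a multimodal large language model that deeply integrates quantitative transcriptomic data with natural-language biological reasoning. | large language model, multimodal, instruction following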
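25 | ICU-Bench: Benchmarking Continual Unlearning in Multimodal Large Language Models | Proposes the ICU-Bench benchmark to evaluate privacy unlearning in multimodal large models under continual learning scenarios. | large language model, multimodal
26 | Causal Probing for Internal Visual Representations in Multimodal Large Language Models | Proposes a causal-intervention-based probing framework that reveals the encoding mechanisms and scaling laws of internal visual representations in multimodal large models. | large language model, multimodal
27 | AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification | Proposes the AstroAlertBench benchmark, which evaluates the accuracy, reasoning, and honesty of multimodal large models in classifying astronomical transient events. | large language model, multimodal
28 | An Interpretable and Scalable Framework for Evaluating Large Language Models | Proposes an IRT evaluation framework based on Majorization-Minimization for interpretable and efficiently scalable assessment of large-model capabilities. | large language model
29 | Quantum-enhanced Large Language Models on Quantum Hardware via Cayley Unitary Adapters | Proposes quantum-enhanced large language models based on Cayley unitary adapters, with performance gains on real quantum hardware. | large language model
30 | Saliency-Aware Regularized Quantization Calibration for Large Language Models | Proposes Saliency-Aware Regularized Quantization Calibration (SARQC) to improve the post-quantization performance of large language models. | large language model
31 | Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction | Proposes a two-stage prompting framework and systematically evaluates large language models on post-discharge clinical action extraction. | large language model
32 | LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution | Proposes the LCC-LLM framework and the LCCD dataset, achieving accurate malware attribution through code-centric retrieval augmentation and multi-task reasoning. | large language model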
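33 | DataDignity: Training Data Attribution for Large Language Models | Proposes the DataDignity framework and the FakeWiki benchmark, attributing large language model training data via supervised contrastive learning. | large language model
34 | Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning | Extracts search trees from LLM reasoning traces, revealing the myopic nature of their planning. | large language model, chain-of-thought
35 | Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models | Proposes the Dynamic Boundary Evaluation (DBE) framework, which uses adaptive search to address saturation and bias in static benchmarks for large models. | large language model, instruction following
36 | CrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMs | Proposes the CrossCult-KIBench benchmark and the MCKI method to address cross-cultural knowledge insertion and alignment in multimodal large models. | large language model, multimodal
37 | LLM-Driven Design Space Exploration of FPGA-based Accelerators | Proposes the SECDA-DSE framework, which uses large language models to drive automated design space exploration of FPGA accelerators. | large language model, chain-of-thought
38 | Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning | Proposes a null-space-constrained contrastive visual forgetting method for efficient knowledge removal in multimodal large models. | large language model, multimodal
39 | LeakDojo: Decoding the Leakage Threats of RAG Systems | Proposes the LeakDojo evaluation framework, which systematically exposes knowledge leakage risks in retrieval-augmented generation (RAG) systems. | large language model, instruction following
40 | Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs | Proposes an MLLM jailbreak framework based on the reconstruction-concealment tradeoff, raising attack success rates through character removal and keyword distraction. | large language model, multimodal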
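41 | SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety | Proposes the SafeHarbor framework, which uses hierarchical memory augmentation and a self-evolving mechanism to address over-refusal in LLM agent safety defenses. | large language model, foundation model
42 | CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency | Proposes the CITE algorithm for anytime-valid statistical inference and error control in self-consistency sampling for large models. | large language model
43 | Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems | Proposes an active learning framework based on ensemble Kalman inversion to optimize the communication structure of LLM-based multi-agent systems. | large language model
44 | From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle | Proposes a retrieval-augmented generation (RAG) based AI teaching assistant for Moodle that provides precise source grounding of educational content and Socratic interaction. | large language model
45 | How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem | Empirically evaluates large language models on long-chain reasoning tasks through the equivalence class problem (ECP). | large language model
46 | LLM-Guided Open Hypothesis Learning from Autonomous Scanning Probe Microscopy Experiments | Proposes an LLM-guided open hypothesis learning framework for autonomous scientific discovery with scanning probe microscopy. | large language model
47 | When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic--Actor Loop for Agentic Reasoning | Proposes the SCALAR framework, which improves AI reasoning in theoretical physics research through a structured critic-actor loop. | large language model
48 | A Self-Healing Framework for Reliable LLM-Based Autonomous Agents | Proposes a self-healing framework for LLM-based autonomous agents that improves system robustness through fault detection and dynamic replanning. | large language model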
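49 | Towards Annotation-Free Validation of MLLMs: A Vision-Language Logical Consistency Metric | Proposes a vision-language logical consistency metric (VL-LCM) for annotation-free MLLM evaluation. | large language model
50 | The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models | Reveals a "granularity axis" in large models' representations of social roles: a micro-to-macro latent causal direction. | large language model
51 | Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios | Proposes the Event-Causal RAG framework, which enables causal reasoning over very long videos through event-causal graphs and a dual storage mechanism. | foundation model
52 | Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost | Proposes Post-Reasoning, which improves the performance of non-chain-of-thought models through a post-reasoning mechanism at zero inference cost. | large language model
53 | Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs | Proposes a knowledge-first automatic heuristic design framework that uses LLMs to tightly integrate code and knowledge in combinatorial optimization. | large language model
54 | Visual Fingerprints for LLM Generation Comparison | Proposes a visual-fingerprint-based method for comparing LLM output tendencies under different generation conditions. | large language model
55 | Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks | Proposes Safety Bottleneck Regularization (SBR), which defends large models against harmful fine-tuning attacks through geometric anchors. | large language model
56 | MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System | Proposes the MAS-Algorithm multi-agent workflow, which improves AI solving of algorithmic programming problems through modular collaboration. | chain-of-thought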
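57 | Taklif.AI: LLM-Powered Platform for Interest-Based Personalized College Assignments | Proposes the Taklif.AI platform, which uses large language models to generate personalized assignments based on students' interests and cultural backgrounds. | large language model
58 | CircuitFormer: A Circuit Language Model for Analog Topology Design from Natural Language Prompt | Proposes CircuitFormer and the circuit-specific tokenizer CKT for automatic analog circuit topology design from natural language. | large language model
59 | ReFlect: An Effective Harness System for Complex Long-Horizon LLM Reasoning | Proposes the ReFlect reasoning framework, which uses a deterministic harness for error detection and automatic recovery in long-horizon tasks. | chain-of-thought
60 | An Empirical Study of Proactive Coding Assistants in Real-World Software Development | Reveals the gap between simulation and reality for proactive coding assistants; proposes the ProCodeBench benchmark and a real-world behavior dataset. | large language model
61 | Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering | Proposes the Adaptive Multi-Principle Steering (AMPS) framework to address safety hazards in the reasoning chains of large reasoning models (LRMs). | chain-of-thought
62 | Text-Graph Synergy: A Bidirectional Verification and Completion Framework for RAG | Proposes the TGS-RAG framework, which uses bidirectional synergy between text and knowledge graphs to address information silos and lost reasoning paths in RAG. | large language model
63 | From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms | Proposes an evolutionary framework for LLM agent memory: a three-stage paradigm from storage through reflection to experience. | large language model
64 | Prober.ai: Gated Inquiry-Based Feedback via LLM-Constrained Personas for Argumentative Writing Development | Proposes Prober.ai, an argumentative writing assistance system based on gated inquiry-based feedback and LLM-constrained personas, aimed at mitigating the cognitive debt caused by AI-assisted writing. | large language model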

🔬 Pillar 2: RL & Architecture (27 papers)

# | Title | One-sentence summary | Tags | 🔗
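65 | Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning | Proposes a coordination-aware evaluation method for cooperative multi-agent reinforcement learning, addressing the limitations of traditional metrics. | reinforcement learning
66 | Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key | Proposes the ScaleLogic framework to study how logical expressiveness affects RL training of long-horizon reasoning in LLMs. | reinforcement learning, large language model
67 | Learning to Cut: Reinforcement Learning for Benders Decomposition | Proposes a reinforcement-learning-based Benders decomposition method that speeds up solving two-stage stochastic programs. | reinforcement learning
68 | Safactory: A Scalable Agent Factory for Trustworthy Autonomous Intelligence | Safactory: a scalable agent factory for trustworthy autonomous intelligence. | reinforcement learning, distillation
69 | AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites | Proposes AGWM, an affordance-grounded world model for modeling complex environments with compositional prerequisites. | world model, world models, affordance
70 | HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning | Proposes the HaM-World model, which improves long-horizon planning stability through soft-Hamiltonian dynamics and a selective memory mechanism. | world model, world models, latent dynamics
71 | Multi-Objective Constraint Inference using Inverse reinforcement learning | Proposes the Multi-Objective Constraint Inference (MOCI) framework for jointly learning constraints and preferences from heterogeneous expert demonstrations. | reinforcement learning, inverse reinforcement learning, preference learning
72 | BehaviorGuard: Online Backdoor Defense for Deep Reinforcement Learning | Proposes the BehaviorGuard framework, which provides online backdoor defense for deep reinforcement learning by monitoring shifts in the action distribution. | reinforcement learning, deep reinforcement learning, DRL
73 | Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning | Proposes policy-guided stepwise model routing to balance effectiveness and cost in large language model reasoning. | reinforcement learning, large language model, chain-of-thought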
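74 | Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine | Proposes sMMD, a method based on stochastic causal representation learning, to resolve the bias-precision paradox in personalized medicine. | representation learning, large language model
75 | Mitigating Cognitive Bias in RLHF by Altering Rationality | Proposes an RLHF method based on dynamically adjusting a rationality parameter to mitigate cognitive bias in human feedback. | reinforcement learning, RLHF
76 | Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning | Skill1: unifies the evolution of skill-augmented agents via reinforcement learning, jointly optimizing skill selection, utilization, and distillation. | reinforcement learning, distillation
77 | PREFER: Personalized Review Summarization with Online Preference Learning | Proposes PREFER, an online preference learning framework for personalized review summarization that adapts to users' changing needs. | preference learning
78 | Transformers Provably Implement In-Context Reinforcement Learning with Policy Improvement | Proves that Transformers can implement in-context reinforcement learning through a parameter construction, with convergence guarantees. | reinforcement learning
79 | Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration | Proposes the LoPE framework, which uses prompt-space perturbation to address the zero-advantage problem in reinforcement learning for large models. | reinforcement learning, large language model
80 | Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight | Proposes the Behavior Cue Reasoning (BCR) framework, which uses explicit markers to make large-model reasoning more monitorable and safer. | reinforcement learning, large language model
81 | Agentick: A Unified Benchmark for General Sequential Decision-Making Agents | Proposes the Agentick benchmark framework for unified evaluation of reinforcement learning and large-model agents on sequential decision-making tasks. | PPO, foundation model
82 | Randomness is sometimes necessary for coordination | Proposes the Diamond Attention mechanism, which introduces structured randomness to solve the role differentiation problem in multi-agent reinforcement learning. | reinforcement learning, zero-shot transfer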
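83 | Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs | Proposes the ASTOR framework, which improves code LLMs through utility-guided multi-task reinforcement learning. | reinforcement learning
84 | Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning | Proposes a novelty-based Tree-of-Thought search method to make large language model reasoning and planning more efficient. | reinforcement learning, chain-of-thought
85 | AGPO: Asymmetric Group Policy Optimization for Verifiable Reasoning and Search Ads Relevance at JD | Proposes the Asymmetric Group Policy Optimization (AGPO) algorithm to address reasoning-boundary shrinkage in reinforcement learning for large models. | reinforcement learning, large language model
86 | SDFlow: Similarity-Driven Flow Matching for Time Series Generation | Proposes the SDFlow framework, which uses similarity-driven flow matching for efficient long-sequence time series generation. | flow matching
87 | SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs | Proposes the SPARK framework, which uses knowledge graphs for self-play with asymmetric rewards, improving multi-hop reasoning over scientific literature. | reinforcement learning, multimodal
88 | P-Guide: Parameter-Efficient Prior Steering for Single-Pass CFG Inference | Proposes the P-Guide framework, which achieves classifier-free guidance (CFG) in a single inference pass by modulating the initial latent space. | flow matching, classifier-free guidance
89 | X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning | Proposes X-Voice, a 0.4B-parameter multilingual zero-shot voice cloning model trained with two-stage flow matching. | flow matching, classifier-free guidance
90 | Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence | Proposes the Safactory framework, building a scalable agent factory for closed-loop evolution of trustworthy autonomous intelligence. | reinforcement learning, distillation
91 | OPSD Compresses What RLVR Teaches: A Post-RL Compaction Stage for Reasoning Models | Proposes OPSD, a post-training compaction stage that distills reasoning models after RLVR to shorten response length. | reinforcement learning, distillation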

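🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-sentence summary | Tags | 🔗
92 | SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting | SpatialEpiBench: builds a spatial epidemic forecasting benchmark and reveals the limitations of existing methods in practical use. | spatiotemporal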

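🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-sentence summary | Tags | 🔗
93 | Uneven Evolution of Cognition Across Generations of Generative AI Models | Proposes AIQ, a psychometrics-based benchmark, revealing the uneven evolution and architectural biases of cognitive abilities in generative AI models. | manipulation, multimodal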

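🔬 Pillar 3: Perception & Semantics (1 paper)

# | Title | One-sentence summary | Tags | 🔗
94 | Narrow Secret Loyalty Dodges Black-Box Audits | Proposes a narrow secret loyalty attack model, revealing a threat in large language models that stays hidden under black-box audits. | affordance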
