cs.AI(2026-01-29)

📊 55 papers in total | 🔗 7 with code

🎯 Navigation by Area of Interest

Pillar 9: Embodied Foundation Models (37 🔗3) · Pillar 2: RL & Architecture (15 🔗3) · Pillar 8: Physics-based Animation (1 🔗1) · Pillar 3: Perception & Semantics (1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (37 papers)

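# | Title | Key point | Tags | 🔗
1 | SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding | Proposes SONIC-O1, a real-world benchmark for evaluating the audio-video understanding of multimodal large language models. | large language model, multimodal
2 | Chain Of Thought Compression: A Theoritical Analysis | Proposes the ALiCoT framework, which aligns implicit reasoning states to make LLM reasoning more efficient while preserving performance. | large language model, chain-of-thought
3 | Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization | Proposes PLaT, a latent chain-of-thought planning framework that decouples reasoning from verbalization. | large language model, chain-of-thought
4 | ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models | ToolWeaver weaves collaborative semantics to enable scalable tool use in large language models. | large language model
5 | Looking Beyond Accuracy: A Holistic Benchmark of ECG Foundation Models | Proposes a holistic benchmarking framework for electrocardiogram (ECG) foundation models that goes beyond conventional accuracy-based evaluation. | foundation model
6 | CORE:Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of Large Language Model Agents Over Hierarchical Edge | CORE collaboratively orchestrates LLM agents over a hierarchical edge to deliver ubiquitous 6G intelligence. | large language model
7 | Moral Outrage Shapes Commitments Beyond Attention: Multimodal Moral Emotions on YouTube in Korea and the US | Proposes a multimodal moral-emotion classifier and shows how moral outrage in YouTube news drives user engagement. | multimodal
8 | Assessing the Business Process Modeling Competences of Large Language Models | Proposes the BEF4LLM framework for evaluating the business process modeling capabilities of large language models. | large language model
9 | The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation | A human evaluation study of steering large language models with style vectors. | large language model
10 | LION: A Clifford Neural Paradigm for Multimodal-Attributed Graph Learning | Proposes LION, a Clifford-algebra-based model for alignment and fusion in multimodal-attributed graph learning. | multimodal
11 | TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models | Proposes TeachBench, a syllabus-grounded framework for evaluating the teaching ability of large language models. | large language model
12 | EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation | EHR-RAG uses enhanced retrieval-augmented generation to bridge the gap between long-horizon structured electronic health records and large language models. | large language model
13 | Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks | Analyzes how the choice of prompt and the choice of model each contribute to the variance of LLM outputs on creative tasks. | large language model
14 | Industrialized Deception: The Collateral Effects of LLM-Generated Misinformation on Digital Ecosystems | Proposes the JudgeGPT and RogueGPT platforms to study how LLM-generated misinformation affects digital ecosystems and how to counter it. | large language model, multimodal
15 | Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities | Proposes the DeR2 benchmark, which decouples retrieval from reasoning to evaluate how well large language models reason over scientific information. | large language model, foundation model
16 | TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning | Proposes TCAP for unsupervised detection of backdoor attacks during fine-tuning of multimodal large language models. | large language model, multimodal
17 | Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores | Proposes Ostrakon-VL, a domain-expert multimodal large language model for food-service and retail settings. | large language model, multimodal
18 | Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning | Proposes the Cognitive Complexity Benchmark (CCB) and the Financial-PoT framework to make LLM quantitative financial reasoning more robust. | large language model, chain-of-thought
19 | Learning to Communicate Across Modalities: Perceptual Heterogeneity in Multi-Agent Systems | Studies cross-modal communication in heterogeneous multi-agent systems, addressing how agents exchange information when their perceptions differ. | multimodal
20 | SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents | SWE-Replay provides an efficient test-time scaling method for software engineering agents. | large language model
21 | RedSage: A Cybersecurity Generalist LLM | RedSage is a general-purpose cybersecurity LLM that achieves strong performance through domain-adaptive pretraining and agentic augmentation. | instruction following
22 | CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty | CAR-bench evaluates the reliability and capability boundaries of LLM agents under real-world uncertainty. | large language model
23 | AgenticSimLaw: A Juvenile Courtroom Multi-Agent Debate Simulation for Explainable High-Stakes Tabular Decision Making | AgenticSimLaw is a juvenile-courtroom multi-agent debate simulation for explainable, high-stakes tabular decision-making. | chain-of-thought
24 | astra-langchain4j: Experiences Combining LLMs and Agent Programming | Explores combining LLMs with agent programming through an integration of Langchain4j with the ASTRA language. | large language model
25 | KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement | KnowBias mitigates social bias in large language models by enhancing bias-knowledge neurons. | large language model
26 | EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference | EWSJF is an adaptive scheduler with hybrid partitioning for mixed-workload LLM inference. | large language model
27 | E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory | Proposes E-mem, which strengthens LLM agent memory through multi-agent episodic context reconstruction, improving performance on complex reasoning. | large language model
28 | FBS: Modeling Native Parallel Reading inside a Transformer | Proposes the FBS Transformer, which models human reading mechanisms to improve LLM inference efficiency. | large language model
29 | CORE: Collaborative Reasoning via Cross Teaching | Proposes CORE, which achieves collaborative reasoning through cross teaching to improve the problem-solving ability of large language models. | large language model
30 | Meta Context Engineering via Agentic Skill Evolution | Proposes Meta Context Engineering, which optimizes context engineering for large language models through agentic skill evolution. | large language model
31 | ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory | Proposes ShardMemo to address memory bottlenecks in large language models. | large language model
32 | LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI | LLaMEA-SAGE uses structural feedback from explainable AI to guide automated algorithm design. | large language model
33 | The Path of Least Resistance: Guiding LLM Reasining Trajectories with Prefix Consensus | PoLR uses prefix consensus to guide LLM reasoning, improving efficiency while maintaining accuracy. | large language model
34 | Adaptive Confidence Gating in Multi-Agent Collaboration for Efficient and Optimized Code Generation | Proposes DebateCoder, which uses multi-agent collaboration and adaptive confidence gating to improve code generation by small models. | large language model
35 | ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design | ChipBench is a new benchmark for evaluating LLM performance in AI-aided chip design. | large language model
36 | NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents | NEMO performs execution-aware optimization modeling through autonomous coding agents. | large language model
37 | More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests | Finds that AI-generated pull requests have lower code quality, yet reviewers respond to them with more positive sentiment. | large language model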

🔬 Pillar 2: RL & Architecture (15 papers)

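# | Title | Key point | Tags | 🔗
38 | Thinking Broad, Acting Fast: Latent Reasoning Distillation from Multi-Perspective Chain-of-Thought for E-Commerce Relevance | Proposes an e-commerce search relevance modeling method based on distilling multi-perspective chain-of-thought, improving results on long-tail queries. | DPO, direct preference optimization, distillation
39 | MAR: Efficient Large Language Models via Module-aware Architecture Refinement | Proposes the MAR framework, which makes large language models more efficient through module-aware architecture refinement. | SSM, state space model, distillation
40 | Self-Compression of Chain-of-Thought via Multi-Agent Reinforcement Learning | Proposes a multi-agent reinforcement learning method for chain-of-thought self-compression, improving both the efficiency and the accuracy of LLM reasoning. | reinforcement learning, chain-of-thought
41 | MEIDNet: Multimodal generative AI framework for inverse materials design | MEIDNet is a multimodal generative AI framework for inverse materials design. | contrastive learning, curriculum learning, multimodal
42 | SIA: Symbolic Interpretability for Anticipatory Deep Reinforcement Learning in Network Control | SIA is a symbolic interpretability framework for anticipatory deep reinforcement learning in network control. | reinforcement learning, deep reinforcement learning, DRL
43 | The Patient is not a Moving Document: A World Model Training Paradigm for Longitudinal EHR | Proposes SMB-Structure, which simulates patient dynamics in longitudinal electronic health records to improve disease-trajectory prediction. | world model, large language model, foundation model
44 | SymbXRL: Symbolic Explainable Deep Reinforcement Learning for Mobile Networks | SymbXRL provides symbolic, explainable deep reinforcement learning for mobile networks. | reinforcement learning, deep reinforcement learning, DRL
45 | Beyond Imitation: Reinforcement Learning for Active Latent Planning | Proposes ATP-Latent, which uses reinforcement learning to optimize latent-space reasoning in LLMs and make chain-of-thought reasoning more efficient. | reinforcement learning, large language model, chain-of-thought
46 | EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots | EmboCoach-Bench is a benchmark for evaluating how well LLMs autonomously design policies for embodied robots. | diffusion policy, reward design, reward shaping
47 | World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems | WoW is a benchmark of enterprise workflow environments for evaluating world models in complex systems. | world model, large language model
48 | ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation | ProRAG is a process-supervised reinforcement learning framework for retrieval-augmented generation. | reinforcement learning, preference learning, reward shaping
49 | Do Reasoning Models Enhance Embedding Models? | Finds that reasoning models trained with RLVR do not significantly improve the performance of embedding models. | reinforcement learning, contrastive learning, large language model
50 | Language-based Trial and Error Falls Behind in the Era of Experience | Proposes the SCOUT framework, which decouples exploration from exploitation to improve LLM trial-and-error in non-linguistic environments. | reinforcement learning, large language model
51 | WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents | WebArbiter is a principle-guided, reasoning-based process reward model for web agents. | reinforcement learning, distillation
52 | Bridging Forecast Accuracy and Inventory KPIs: A Simulation-Based Software Framework | Proposes a simulation-based software framework that links forecast accuracy to inventory KPIs in order to optimize automotive aftermarket spare-parts management. | predictive model, MAE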

🔬 Pillar 8: Physics-based Animation (1 paper)

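# | Title | Key point | Tags | 🔗
53 | White-Box Op-Amp Design via Human-Mimicking Reasoning | Proposes White-Op, which mimics human reasoning to produce interpretable operational-amplifier parameter designs. | AMP, large language model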

🔬 Pillar 3: Perception & Semantics (1 paper)

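# | Title | Key point | Tags | 🔗
54 | From Particles to Agents: Hallucination as a Metric for Cognitive Friction in Spatial Simulation | Proposes agentic environment simulation that uses AI hallucination as a measure of cognitive friction in spatial simulation. | affordance, multimodal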

🔬 Pillar 1: Robot Control (1 paper)

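# | Title | Key point | Tags | 🔗
55 | The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making | Reveals a "robustness paradox" in LLM decision-making by decoupling rule-based logic from affective interference. | manipulation, large language model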
