cs.AI (2025-09-29)
📊 38 papers | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (24 🔗2)
Pillar 2: RL Algorithms & Architecture (13)
Pillar 3: Spatial Perception & Semantics (1)
🔬 Pillar 9: Embodied Foundation Models (24 papers)
🔬 Pillar 2: RL Algorithms & Architecture (13 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning | Uni-NTFM: a unified neural topological foundation model for EEG signal representation learning | representation learning, foundation model | | |
| 26 | RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment | RE-PO: a general framework for LLM alignment that addresses label noise via robust enhanced policy optimization | reinforcement learning, RLHF, DPO | | |
| 27 | Training Agents Inside of Scalable World Models | Dreamer 4: achieves offline diamond acquisition in Minecraft via a scalable world model | reinforcement learning, world model, dreamer | | |
| 28 | Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models | Reveals the competing mechanisms of reasoning and retrieval in LLMs and proposes FARL to strengthen reasoning ability | reinforcement learning, distillation, chain-of-thought | | |
| 29 | RL in the Wild: Characterizing RLVR Training in LLM Deployment | Characterizes the systems challenges of RLVR training in LLM deployment and proposes the PolyTrace benchmark suite | reinforcement learning, large language model | | |
| 30 | Hybrid Reward Normalization for Process-supervised Non-verifiable Agentic Tasks | Proposes Principle Process Rewards to address feedback sparsity in long-trajectory tasks | reinforcement learning, large language model | | |
| 31 | Modeling Others' Minds as Code | ROTE: uses program synthesis to efficiently predict human behavior and improve human-AI collaboration | behavior cloning, large language model | | |
| 32 | DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search | DeepSearch: overcomes the reinforcement learning bottleneck via Monte Carlo tree search with verifiable rewards | reinforcement learning | | |
| 33 | The Era of Real-World Human Interaction: RL from User Conversations | Proposes reinforcement learning from user conversations (RLHI), enabling continual model improvement and multi-faceted alignment | reinforcement learning, instruction following | | |
| 34 | Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity | Proposes the Data Reasoning Intensity (DRI) metric to optimize training data and improve LLM logical reasoning | reinforcement learning, large language model | | |
| 35 | Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention | Proposes Intervened Preference Optimization to improve the safety of large reasoning models | preference learning, chain-of-thought | | |
| 36 | Humanline: Online Alignment as Perceptual Loss | Proposes Humanline, online alignment via a perceptual loss, improving model consistency with human preferences | PPO, DPO | | |
| 37 | Unifying Agent Interaction and World Information for Multi-agent Coordination | Proposes the IWoL framework, unifying interaction and world information to promote multi-agent coordination | reinforcement learning, representation learning | | |
🔬 Pillar 3: Spatial Perception & Semantics (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 38 | Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs | Proposes an LLM-based vision-and-language navigation method using analogical textual descriptions, improving scene understanding and spatial reasoning | scene understanding, embodied AI, VLN | | |