cs.CL (2026-05-21)
📊 18 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (11 🔗3)
Pillar 2: RL & Architecture (6)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (11 papers)
🔬 Pillar 2: RL & Architecture (6 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention | Proposes the Faithful-MR1 framework, which improves the faithfulness of multimodal reasoning by anchoring and reinforcing visual attention. | reinforcement learning, large language model, multimodal |  |  |
| 13 | DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA | Proposes DeferMem, which uses reinforcement learning for query-time evidence distillation to address long-term memory QA. | reinforcement learning, distillation, large language model |  |  |
| 14 | LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance | The LANG framework improves reinforcement learning for multilingual reasoning through language-adaptive hint guidance. | reinforcement learning, large language model, language conditioned |  |  |
| 15 | Token-weighted Direct Preference Optimization with Attention | Proposes AttentionPO, a token-weighted DPO method that uses LLM attention to improve preference optimization. | DPO, direct preference optimization, large language model |  |  |
| 16 | Self-Policy Distillation via Capability-Selective Subspace Projection | Proposes a self-policy distillation method based on capability-selective subspace projection to improve LLM generalization. | distillation, large language model |  |  |
| 17 | Unified Data Selection for LLM Reasoning | Proposes a training-free data selection method based on High-Entropy Sum (HES) to improve LLM reasoning. | reinforcement learning, large language model |  |  |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Reducing Political Manipulation with Consistency Training | Proposes political consistency training to reduce political manipulation in large language models. | manipulation, metric depth, large language model |  |  |