cs.CL (2026-02-03)
📊 34 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (22 papers, 🔗 3)
Pillar 2: RL & Architecture (11 papers, 🔗 2)
Pillar 1: Robot Control (1 paper)
🔬 Pillar 9: Embodied Foundation Models (22 papers)
🔬 Pillar 2: RL & Architecture (11 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective | Proposes an information-theoretic approach to distillation-resistant LLMs, defending against logit-distillation attacks. | distillation, large language model | | |
| 24 | CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning | Proposes CPMobius, an iterative coach-player reasoning framework for data-free reinforcement learning. | reinforcement learning, large language model | | |
| 25 | TRE: Encouraging Exploration in the Trust Region | Proposes Trust Region Entropy (TRE) to counter exploration collapse in LLMs, improving math reasoning, combinatorial search, and preference alignment. | reinforcement learning, PPO, large language model | ✅ | |
| 26 | Learning to Reason Faithfully through Step-Level Faithfulness Maximization | Proposes the FaithRL framework, which improves LLM reasoning and reduces hallucination by maximizing step-level faithfulness. | reinforcement learning, reward design, large language model | ✅ | |
| 27 | Verified Critical Step Optimization for LLM Agents | Proposes Critical Step Optimization (CSO) to improve LLM agent performance on complex tasks. | preference learning, DPO, large language model | | |
| 28 | $V_0$: A Generalist Value Model for Any Policy at State Zero | Proposes V0, a generalist value model that evaluates any policy at the initial state without parameter updates, for LLM training and deployment. | PPO, large language model | | |
| 29 | ForesightKV: Optimizing KV Cache Eviction for Reasoning Models by Learning Long-Term Contribution | ForesightKV: optimizes KV-cache eviction for reasoning models by learning long-term contribution. | reinforcement learning, large language model | | |
| 30 | ACL: Aligned Contrastive Learning Improves BERT and Multi-exit BERT Fine-tuning | Proposes the Aligned Contrastive Learning (ACL) framework, improving fine-tuning of BERT and multi-exit BERT. | contrastive learning | | |
| 31 | One Model, All Roles: Multi-Turn, Multi-Agent Self-Play Reinforcement Learning for Conversational Social Intelligence | Proposes OMAR: a general conversational social-intelligence model trained via multi-agent self-play reinforcement learning. | reinforcement learning | | |
| 32 | Test-time Recursive Thinking: Self-Improvement without External Feedback | Proposes Test-time Recursive Thinking (TRT), enabling LLM self-improvement without additional training. | reinforcement learning, large language model | | |
| 33 | ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution | ReMiT: RL-guided mid-training of LLMs, enabling iterative model evolution. | reinforcement learning, large language model | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 34 | Controlling Output Rankings in Generative Engines for LLM-based Search | Proposes CORE, which controls output rankings in LLM-based search by optimizing retrieved content, boosting exposure for small merchants' products. | manipulation, large language model | | |