cs.LG (2026-04-20)
📊 35 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (19 🔗4)
Pillar 2: RL Algorithms & Architecture (13 🔗1)
Pillar 1: Robot Control (3)
🔬 Pillar 9: Embodied Foundation Models (19 papers)
🔬 Pillar 2: RL Algorithms & Architecture (13 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought | Proposes CAL-GRPO, a calibrated reinforcement learning algorithm that addresses gradient bias in multi-step CoT reasoning. | reinforcement learning, chain-of-thought | | |
| 21 | Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity | Sonata: a hybrid world model for inertial kinematics under clinical data scarcity. | world model, world models, representation learning | | |
| 22 | Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning | Proposes a dynamic abstention framework based on regularized reinforcement learning, improving LLM reasoning efficiency and accuracy. | reinforcement learning, large language model, chain-of-thought | | |
| 23 | Fisher Decorator: Refining Flow Policy via A Local Transport Map | Fisher Decorator: refines flow-based offline reinforcement learning policies via a local transport map. | reinforcement learning, offline RL, offline reinforcement learning | ✅ | |
| 24 | Efficient Federated RLHF via Zeroth-Order Policy Optimization | Proposes the Par-S²ZPO algorithm to address efficiency for resource-constrained agents in federated RLHF. | reinforcement learning, RLHF | | |
| 25 | When Can LLMs Learn to Reason with Weak Supervision? | Proposes a method for learning to reason under weak supervision to improve LLM performance. | reinforcement learning, large language model | | |
| 26 | Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data | Proposes the CUTS and Mixed-CUTS frameworks to address policy degradation caused by saturated reasoning data in reinforcement learning. | reinforcement learning | | |
| 27 | Neural Garbage Collection: Learning to Forget while Learning to Reason | Proposes Neural Garbage Collection (NGC), which uses end-to-end learning to let language models forget autonomously during reasoning, improving efficiency. | reinforcement learning, chain-of-thought | | |
| 28 | HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment | Proposes the HEAL framework, which enhances exploration in few-shot RLVR through hybrid-domain entropy dynamics alignment. | reinforcement learning, large language model | | |
| 29 | LEPO: Latent Reasoning Policy Optimization for Large Language Models | Proposes LEPO, which improves the reasoning ability of large language models by performing reinforcement learning in latent space. | reinforcement learning, large language model | | |
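| 30 | The Umwelt Representation Hypothesis: Rethinking Universality | Proposes the Umwelt Representation Hypothesis, which questions universal representations and emphasizes the influence of ecological constraints on representations. | world model, world models | | |
| 31 | Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer | Proposes Agentic Consensus, which improves the controllability and auditability of human-AI collaborative coding through a governable consensus layer. | world model, world models | | |
| 32 | Tool Learning Needs Nothing More Than a Free 8B Language Model | Proposes TRUSTEE, which trains tool-calling agents using an open-source 8B language model, with no additional data required. | reinforcement learning, curriculum learning | | |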
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
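| 33 | Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study | Improves VLA model learning through explicit physical feasibility constraints, addressing reliability problems in robot manipulation. | manipulation, imitation learning, vision-language-action | | |
| 34 | Bounded Ratio Reinforcement Learning | Proposes the Bounded Ratio Reinforcement Learning (BRRL) framework, bridging the gap between trust-region methods and PPO's heuristic clipped objective. | humanoid, humanoid locomotion, locomotion | | |
| 35 | Ranking Abuse via Strategic Pairwise Data Perturbations | Proposes the Adaptive Subset Selection Attack (ASSA) to study the vulnerability of MLE-based ranking systems under adversarial perturbations. | manipulation | | |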