cs.LG (2025-12-18)
📊 17 papers in total
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (8)
Pillar 9: Embodied Foundation Models (7)
Pillar 1: Robot Control (2)
🔬 Pillar 2: RL Algorithms & Architecture (8 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Non-Asymptotic Global Convergence of PPO-Clip | Presents a non-asymptotic global convergence analysis of the PPO-Clip algorithm | reinforcement learning, PPO, RLHF | | |
| 2 | Yahtzee: Reinforcement Learning Techniques for Stochastic Combinatorial Games | Proposes a policy-gradient-based RL approach for Yahtzee, a stochastic combinatorial game | reinforcement learning, PPO | | |
| 3 | GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning | Proposes GB-DQN, using gradient boosting to address model drift in non-stationary RL | reinforcement learning, deep reinforcement learning | | |
| 4 | Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs | Proposes Turn-PPO, improving multi-turn RL for agentic LLMs via turn-level advantage estimation | reinforcement learning, PPO | | |
| 5 | Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game | Proposes the Stackelberg Learning from Human Feedback (SLHF) framework for preference optimization | reinforcement learning, RLHF, large language model | | |
| 6 | Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward | Rethinks RLVR through clipping, entropy, and spurious rewards to improve LLM mathematical reasoning | reinforcement learning, large language model | | |
| 7 | Meta-RL Induces Exploration in Language Agents | LaMer: meta-RL to improve language agents' exploration in complex environments | reinforcement learning, large language model | | |
| 8 | On The Hidden Biases of Flow Matching Samplers | Reveals hidden biases in flow matching samplers, analyzing their non-optimal-transport behavior | flow matching | | |
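Several entries above (papers 1, 4, and 6) center on PPO's clipped surrogate objective. For context, here is a minimal NumPy sketch of that standard objective — not any listed paper's specific variant or analysis:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO.

    ratio:     importance ratio pi_new(a|s) / pi_old(a|s), per sample
    advantage: advantage estimate A(s, a), per sample
    eps:       clipping range (0.2 is the common default)
    """
    unclipped = ratio * advantage
    # clipping the ratio removes the incentive to move the policy
    # beyond the [1 - eps, 1 + eps] trust region
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # pessimistic (elementwise minimum) bound, averaged over samples
    return np.minimum(unclipped, clipped).mean()
```

In practice the negated objective is minimized by gradient descent; Turn-PPO-style methods change how `advantage` is estimated (per turn rather than per token), not the clipping itself.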
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making | Proposes an adaptive hierarchical RL-MPC approach to address sample inefficiency in complex planning problems | MPC, reinforcement learning | | |
| 17 | Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning | Proposes Posterior Behavioral Cloning (PostBC), pretraining BC policies for more effective RL finetuning | manipulation, reinforcement learning | | |
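Paper 17 builds on behavioral cloning (BC) as the pretraining stage before RL finetuning. As background, a minimal sketch of the vanilla BC loss for discrete actions — plain supervised cross-entropy on expert actions, not the paper's posterior variant:

```python
import numpy as np

def behavioral_cloning_loss(logits, expert_actions):
    """Vanilla BC loss: negative log-likelihood of the expert's actions
    under the policy's action distribution (discrete action case).

    logits:         array of shape (batch, n_actions), policy action logits
    expert_actions: array of shape (batch,), expert action indices
    """
    # numerically stable softmax over the action dimension
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # pick out the probability the policy assigns to each expert action
    chosen = probs[np.arange(len(expert_actions)), expert_actions]
    return -np.log(chosen).mean()
```

Minimizing this loss fits the policy to the demonstration data; the RL finetuning stage then improves the cloned policy with environment reward.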