cs.LG (2025-12-18)
📊 17 papers in total
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (8)
Pillar 9: Embodied Foundation Models (7)
Pillar 1: Robot Control (2)
🔬 Pillar 2: RL Algorithms & Architecture (8 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Non-Asymptotic Global Convergence of PPO-Clip | Presents a non-asymptotic global convergence analysis of the PPO-Clip algorithm | reinforcement learning, PPO, RLHF | | |
| 2 | Yahtzee: Reinforcement Learning Techniques for Stochastic Combinatorial Games | Proposes a policy-gradient-based RL approach for Yahtzee, a stochastic combinatorial game | reinforcement learning, PPO | | |
| 3 | GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning | Proposes GB-DQN, using gradient boosting to address model drift in non-stationary RL | reinforcement learning, deep reinforcement learning | | |
| 4 | Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs | Proposes Turn-PPO, improving multi-turn RL for agentic LLMs via turn-level advantage estimation | reinforcement learning, PPO | | |
| 5 | Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game | Proposes the Stackelberg Learning from Human Feedback (SLHF) framework for preference optimization | reinforcement learning, RLHF, large language model | | |
| 6 | Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward | Rethinks RLVR through clipping, entropy, and spurious rewards to improve LLM mathematical reasoning | reinforcement learning, large language model | | |
| 7 | Meta-RL Induces Exploration in Language Agents | LaMer: meta-RL to improve language agents' exploration in complex environments | reinforcement learning, large language model | | |
| 8 | On The Hidden Biases of Flow Matching Samplers | Reveals hidden biases in flow matching samplers, analyzing their non-optimal-transport behavior | flow matching | | |
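Several entries above (papers 1, 4, and 6) center on PPO's clipped surrogate objective. For context, here is a minimal NumPy sketch of that standard objective — not any listed paper's specific variant or analysis:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO.

    ratio:     importance ratio pi_new(a|s) / pi_old(a|s), per sample
    advantage: advantage estimate A(s, a), per sample
    eps:       clipping range (0.2 is the common default)
    """
    unclipped = ratio * advantage
    # clipping the ratio removes the incentive to move the policy
    # beyond the [1 - eps, 1 + eps] trust region
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # pessimistic (elementwise minimum) bound, averaged over samples
    return np.minimum(unclipped, clipped).mean()
```

In practice the negated objective is minimized by gradient descent; Turn-PPO-style methods change how `advantage` is estimated (per turn rather than per token), not the clipping itself.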
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making | Proposes an adaptive hierarchical RL-MPC approach to address sample inefficiency in complex planning problems | MPC, reinforcement learning | | |
| 17 | Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning | Proposes Posterior Behavioral Cloning (PostBC), pretraining BC policies for more effective RL finetuning | manipulation, reinforcement learning | | |
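Paper 17 builds on behavioral cloning (BC) as the pretraining stage before RL finetuning. As background, a minimal sketch of the vanilla BC loss for discrete actions — plain supervised cross-entropy on expert actions, not the paper's posterior variant:

```python
import numpy as np

def behavioral_cloning_loss(logits, expert_actions):
    """Vanilla BC loss: negative log-likelihood of the expert's actions
    under the policy's action distribution (discrete action case).

    logits:         array of shape (batch, n_actions), policy action logits
    expert_actions: array of shape (batch,), expert action indices
    """
    # numerically stable softmax over the action dimension
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # pick out the probability the policy assigns to each expert action
    chosen = probs[np.arange(len(expert_actions)), expert_actions]
    return -np.log(chosen).mean()
```

Minimizing this loss fits the policy to the demonstration data; the RL finetuning stage then improves the cloned policy with environment reward.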