cs.LG (2025-12-18)

📊 17 papers in total

🎯 Topic Navigation

Pillar 2: RL Algorithms and Architecture (8) · Pillar 9: Embodied Foundation Models (7) · Pillar 1: Robot Control (2)

🔬 Pillar 2: RL Algorithms and Architecture (8 papers)

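| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 1 | Non-Asymptotic Global Convergence of PPO-Clip | Presents a non-asymptotic global convergence analysis of the PPO-Clip algorithm | reinforcement learning, PPO, RLHF |
| 2 | Yahtzee: Reinforcement Learning Techniques for Stochastic Combinatorial Games | Proposes policy-gradient-based reinforcement learning methods for the stochastic combinatorial game Yahtzee | reinforcement learning, PPO |
| 3 | GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning | Proposes GB-DQN, which uses gradient boosting to address model drift in non-stationary reinforcement learning | reinforcement learning, deep reinforcement learning |
| 4 | Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs | Proposes Turn-PPO, which improves multi-turn reinforcement learning in agentic LLMs through turn-level advantage estimation | reinforcement learning, PPO |
| 5 | Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game | Proposes the Stackelberg Learning from Human Feedback (SLHF) framework for preference optimization | reinforcement learning, RLHF, large language model |
| 6 | Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward | Rethinks RLVR through clipping, entropy, and spurious rewards to improve LLM mathematical reasoning | reinforcement learning, large language model |
| 7 | Meta-RL Induces Exploration in Language Agents | LaMer: uses meta-reinforcement learning to improve language agents' exploration in complex environments | reinforcement learning, large language model |
| 8 | On The Hidden Biases of Flow Matching Samplers | Reveals implicit biases in flow matching samplers and analyzes their non-optimal-transport properties | flow matching |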

🔬 Pillar 9: Embodied Foundation Models (7 papers)

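| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 9 | Can Large Reasoning Models Improve Accuracy on Mathematical Tasks Using Flawed Thinking? | Trains on flawed reasoning to improve large models' fault tolerance and accuracy on mathematical tasks | large language model, chain-of-thought |
| 10 | Managing the Stochastic: Foundations of Learning in Neuro-Symbolic Systems for Software Engineering | Proposes a dual-state architecture to make code generation by neuro-symbolic systems more reliable in software engineering | large language model, instruction following |
| 11 | Dynamic Tool Dependency Retrieval for Efficient Function Calling | Proposes Dynamic Tool Dependency Retrieval (DTDR) to improve the efficiency and accuracy of function-calling agents | large language model |
| 12 | Impacts of Racial Bias in Historical Training Data for News AI | Reveals historical-data bias in news AI, using the New York Times corpus as a case study to analyze the impact of racial labels | large language model |
| 13 | DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI | DataFlow: an LLM-driven framework for unified data preparation and workflow automation | large language model |
| 14 | Muon is Provably Faster with Momentum Variance Reduction | Proposes a Muon optimizer with momentum variance reduction to improve deep learning training efficiency | large language model |
| 15 | A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection | Systematically studies how code obfuscation affects LLM-based vulnerability detection, revealing its effectiveness and limitations | large language model |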

🔬 Pillar 1: Robot Control (2 papers)

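| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 16 | Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making | Proposes an adaptive hierarchical RL-MPC method to address low sample efficiency in complex planning problems | MPC, reinforcement learning |
| 17 | Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning | Proposes Posterior Behavioral Cloning (PostBC) to make pretrained policies more effective for reinforcement learning fine-tuning | manipulation, reinforcement learning |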
