cs.LG(2025-12-18)
📊 19 papers in total
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (11)
Pillar 2: RL Algorithms & Architecture (6)
Pillar 1: Robot Control (2)
🔬 Pillar 9: Embodied Foundation Models (11 papers)
🔬 Pillar 2: RL Algorithms & Architecture (6 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Non-Asymptotic Global Convergence of PPO-Clip | Presents a non-asymptotic global convergence analysis of the PPO-Clip algorithm | reinforcement learning, PPO, RLHF | ||
| 13 | Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game | Proposes the Stackelberg Learning from Human Feedback (SLHF) framework for preference optimization | reinforcement learning, RLHF, large language model | ||
| 14 | Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward | Rethinks RLVR through clipping, entropy, and spurious rewards to improve LLM reasoning | reinforcement learning, large language model | ||
| 15 | Meta-RL Induces Exploration in Language Agents | LaMer: uses meta-reinforcement learning to improve the exploration ability of language agents in complex environments | reinforcement learning, large language model | ||
| 16 | On The Hidden Biases of Flow Matching Samplers | Reveals hidden biases in Flow Matching samplers and analyzes their energy suboptimality | flow matching | ||
| 17 | NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning | Proposes the NDRL method to address the complexity of cotton irrigation and nitrogen application | reinforcement learning | ||
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | AIMM: An AI-Driven Multimodal Framework for Detecting Social-Media-Influenced Stock Market Manipulation | AIMM: an AI-driven multimodal framework for detecting social-media-influenced stock market manipulation | manipulation, multimodal | ||
| 19 | Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning | Proposes Posterior Behavioral Cloning (PostBC) to make pretrained policies more effective for RL finetuning | manipulation, reinforcement learning | ||