cs.LG(2025-09-26)
📊 11 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (6)
Pillar 1: Robot Control (2 🔗1)
Pillar 9: Embodied Foundation Models (2 🔗1)
Pillar 4: Generative Motion (1)
🔬 Pillar 2: RL Algorithms & Architecture (6 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
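| 1 | Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces | Proposes a reinforcement learning method built on discrete diffusion policies to handle combinatorial action spaces. | reinforcement learning, diffusion policy | | |
| 2 | Adaptive Margin RLHF via Preference over Preferences | Proposes DPO-PoP, which uses preference-over-preference information to adapt the margin, improving RLHF generalization and alignment. | reinforcement learning, RLHF, DPO | | |
| 3 | Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning | SPEAR: an agentic reinforcement learning method that combines self-imitation learning with progressive exploration. | reinforcement learning, imitation learning, reward shaping | | |
| 4 | Adaptive Dual-Mode Distillation with Incentive Schemes for Scalable, Heterogeneous Federated Learning on Non-IID Data | Proposes adaptive dual-mode distillation with incentive schemes to make heterogeneous federated learning on non-IID data scalable. | distillation | | |
| 5 | RLP: Reinforcement as a Pretraining Objective | Proposes RLP, which uses reinforcement learning as a pretraining objective to improve model reasoning. | reinforcement learning, chain-of-thought | | |
| 6 | EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning | Proposes EPO to address the cascading exploration-exploitation failure of LLM agents trained with multi-turn, sparse-reward reinforcement learning. | reinforcement learning | | |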
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
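| 7 | ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation | Proposes ReLAM, which learns reward functions for visual robotic manipulation through an anticipation model. | manipulation, reinforcement learning, reward design | | |
| 8 | A Framework for Scalable Heterogeneous Multi-Agent Adversarial Reinforcement Learning in IsaacLab | Extends the IsaacLab framework to support scalable training for heterogeneous multi-agent adversarial reinforcement learning. | manipulation, reinforcement learning | ✅ | |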
🔬 Pillar 9: Embodied Foundation Models (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
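| 9 | OptiMind: Teaching LLMs to Think Like Optimization Experts | OptiMind teaches LLMs to think like optimization experts, improving the accuracy of mixed-integer linear programming (MILP) modeling. | large language model | | |
| 10 | SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights | SINQ quantizes LLM weights to low precision using Sinkhorn normalization, with no calibration data required. | large language model | ✅ | |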
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
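| 11 | Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery | Proposes SPS-GAN for multi-system trajectory generation and symmetry discovery. It needs no prior knowledge and performs on par with single-system supervised models. | physically plausible | | |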