cs.LG (2025-09-26)

📊 11 papers total | 🔗 2 with code

🎯 Navigation by interest area

Pillar 2: RL Algorithms and Architecture (6) · Pillar 1: Robot Control (2 🔗1) · Pillar 9: Embodied Foundation Models (2 🔗1) · Pillar 4: Generative Motion (1)

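🔬 Pillar 2: RL Algorithms and Architecture (6 papers)

#	Title	Key point	Tags	🔗	⭐
1	Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces	Proposes a reinforcement learning method based on discrete diffusion policies to handle combinatorial action spaces.	reinforcement learning, diffusion policy
2	Adaptive Margin RLHF via Preference over Preferences	Proposes DPO-PoP, which uses preference-over-preference information to adaptively adjust the margin, improving RLHF generalization and alignment.	reinforcement learning, RLHF, DPO
3	Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning	SPEAR: an agentic reinforcement learning method based on self-imitation learning and progressive exploration.	reinforcement learning, imitation learning, reward shaping
4	Adaptive Dual-Mode Distillation with Incentive Schemes for Scalable, Heterogeneous Federated Learning on Non-IID Data	Proposes adaptive dual-mode distillation with incentive schemes to address the scalability of heterogeneous federated learning on non-IID data.	distillation
5	RLP: Reinforcement as a Pretraining Objective	Proposes RLP, a method that uses reinforcement learning as a pretraining objective to improve model reasoning.	reinforcement learning, chain-of-thought
6	EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning	Proposes the EPO algorithm to address the exploration-exploitation cascade failure of LLM agents in multi-turn, sparse-reward reinforcement learning.	reinforcement learning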

🔬 Pillar 1: Robot Control (2 papers)

#	Title	Key point	Tags	🔗	⭐
7	ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation	Proposes ReLAM, which learns reward functions for visual robotic manipulation through an anticipation model.	manipulation, reinforcement learning, reward design
8	A Framework for Scalable Heterogeneous Multi-Agent Adversarial Reinforcement Learning in IsaacLab	Extends the IsaacLab framework to enable scalable training for heterogeneous multi-agent adversarial reinforcement learning.	manipulation, reinforcement learning	✅

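🔬 Pillar 9: Embodied Foundation Models (2 papers)

#	Title	Key point	Tags	🔗	⭐
9	OptiMind: Teaching LLMs to Think Like Optimization Experts	OptiMind: teaches LLMs to think like optimization experts, improving the accuracy of mixed-integer linear programming formulation.	large language model
10	SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights	SINQ: quantizes low-precision LLM weights with Sinkhorn normalization, with no calibration required.	large language model	✅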

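🔬 Pillar 4: Generative Motion (1 paper)

#	Title	Key point	Tags	🔗	⭐
11	Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery	Proposes SPS-GAN for multi-system trajectory generation and symmetry discovery; it requires no prior knowledge and performs comparably to single-system supervised models.	physically plausible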
