cs.LG (2026-05-04)

📊 22 papers total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (13 🔗1) | Pillar 9: Embodied Foundation Models (5 🔗1) | Pillar 7: Motion Retargeting (2) | Pillar 1: Robot Control (1) | Pillar 8: Physics-based Animation (1)

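🔬 Pillar 2: RL & Architecture (13 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Efficient Preference Poisoning Attack on Offline RLHF	Proposes an efficient preference-poisoning attack targeting the DPO algorithm in offline RLHF.	reinforcement learning offline RL offline reinforcement learning
2	Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning	Proposes dynamic-size reasoning blocks to overcome the limitations of fixed-block generation.	reinforcement learning large language model	✅
3	Gradient-Gated DPO: Stabilizing Preference Optimization in Language Models	Proposes Gradient-Gated DPO to stabilize preference optimization in language models and mitigate probability collapse.	reinforcement learning DPO direct preference optimization
4	Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability	Proposes a recurrent deep reinforcement learning method for chemotherapy control that improves treatment outcomes under partial observability.	reinforcement learning deep reinforcement learning TD3
5	Combining Trained Models in Reinforcement Learning	Systematically surveys methods for reusing pretrained models in deep reinforcement learning, analyzing their effectiveness and limitations.	reinforcement learning deep reinforcement learning DRL
6	Federated Reinforcement Learning for Efficient Mobile Crowdsensing under Incomplete Information	Proposes the FDRL-PPO algorithm for efficient task participation in mobile crowdsensing under incomplete information.	reinforcement learning deep reinforcement learning PPO
7	Closed-Loop CO2 Storage Control With History-Based Reinforcement Learning and Latent Model-Based Adaptation	Proposes a closed-loop control method for geological CO2 storage that combines history-based reinforcement learning with latent model-based adaptation.	reinforcement learning latent dynamics teacher-student
8	A Meta Reinforcement Learning Approach to Goals-Based Wealth Management	Proposes a meta reinforcement learning approach to wealth management that quickly solves personalized portfolio optimization problems.	reinforcement learning foundation model
9	Statistical Consistency and Generalization of Contrastive Representation Learning	Proposes a unified statistical learning theory addressing the statistical consistency of contrastive representation learning.	representation learning foundation model
10	Evaluating Tabular Representation Learning for Network Intrusion Detection	Evaluates tabular representation learning for network intrusion detection.	representation learning
11	Middle-mile logistics through the lens of goal-conditioned reinforcement learning	Proposes a goal-conditioned reinforcement learning approach to middle-mile logistics optimization.	reinforcement learning
12	Binary Rewards and Reinforcement Learning: Fundamental Challenges	Proposes KL control to address diversity collapse under binary rewards.	reinforcement learning
13	A decoupled diffusion planner that adapts to changing cost limits by using cost-conditioned generation for safety and reward gradients for performance	Proposes SDGD, a decoupled diffusion planner that adapts to changing safety constraints and improves offline safe reinforcement learning performance.	reinforcement learning classifier-free guidance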

🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	Bolek: A Multimodal Language Model for Molecular Reasoning	Bolek: a multimodal language model for molecular reasoning that improves auditability.	multimodal chain-of-thought
15	Statistically-Lossless Quantization of Large Language Models	Proposes SLQ, a quantization scheme for large language models that is statistically lossless across tasks and distributions.	large language model	✅
16	LUMINA: A Grid Foundation Model for Benchmarking AC Optimal Power Flow Surrogate Learning	LUMINA-Bench: a grid foundation model benchmark for AC optimal power flow surrogate learning.	foundation model
17	Pretraining on Sleep Data Improves non-Sleep Biosignal Tasks	Pretraining on sleep data improves performance on non-sleep biosignal tasks.	foundation model multimodal
18	SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection	Proposes SpecKV to optimize speculative decoding in large language model inference.	large language model

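🔬 Pillar 7: Motion Retargeting (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
19	Visual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMs	Reveals the latent reasoning capability of visual latents in multimodal large language models and proposes a latent optimization method.	latent optimization multimodal chain-of-thought
20	HELIX: Hybrid Encoding with Learnable Identity and Cross-dimensional Synthesis for Time Series Imputation	HELIX: a hybrid-encoding time series imputation method combining learnable identity encoding and cross-dimensional synthesis.	spatial relationship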

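🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
21	MPCS: Neuroplastic Continual Learning via Multi-Component Plasticity and Topology-Aware EWC	Proposes MPCS, a neuroplastic continual learning system that accumulates knowledge continually via multi-component plasticity and topology-aware EWC.	MPC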

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
22	ZNO: Stable Rational Neural Operators in the Z-Domain for Discrete-Time Dynamic	Proposes the Z-domain neural operator (ZNO) for stable learning of discrete-time dynamical systems.	PULSE
