cs.LG (2026-05-04)

📊 22 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (13 🔗1) · Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 7: Motion Retargeting (2) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (13 papers)

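| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Efficient Preference Poisoning Attack on Offline RLHF | Proposes an efficient preference-poisoning attack targeting the DPO algorithm in offline RLHF | reinforcement learning, offline RL, offline reinforcement learning | |
| 2 | Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning | Proposes dynamic-size reasoning blocks to address the limitations of fixed-block generation | reinforcement learning, large language model | |
| 3 | Gradient-Gated DPO: Stabilizing Preference Optimization in Language Models | Proposes Gradient-Gated DPO, which stabilizes preference optimization in language models and mitigates probability collapse | reinforcement learning, DPO, direct preference optimization | |
| 4 | Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability | Proposes a recurrent deep RL method for chemotherapy control that improves treatment outcomes under partial observability | reinforcement learning, deep reinforcement learning, TD3 | |
| 5 | Combining Trained Models in Reinforcement Learning | Systematically surveys methods for reusing pretrained models in deep reinforcement learning, analyzing their effectiveness and limitations | reinforcement learning, deep reinforcement learning, DRL | |
| 6 | Federated Reinforcement Learning for Efficient Mobile Crowdsensing under Incomplete Information | Proposes the FDRL-PPO algorithm for efficient task participation in mobile crowdsensing under incomplete information | reinforcement learning, deep reinforcement learning, PPO | |
| 7 | Closed-Loop CO2 Storage Control With History-Based Reinforcement Learning and Latent Model-Based Adaptation | Proposes closed-loop control of geological CO2 storage that combines history-based RL with latent-model adaptation | reinforcement learning, latent dynamics, teacher-student | |
| 8 | A Meta Reinforcement Learning Approach to Goals-Based Wealth Management | Proposes a meta-RL approach to wealth management that quickly solves personalized portfolio optimization problems | reinforcement learning, foundation model | |
| 9 | Statistical Consistency and Generalization of Contrastive Representation Learning | Proposes a unified statistical learning theory addressing the statistical consistency of contrastive representation learning | representation learning, foundation model | |
| 10 | Evaluating Tabular Representation Learning for Network Intrusion Detection | Evaluates tabular representation learning applied to network intrusion detection | representation learning | |
| 11 | Middle-mile logistics through the lens of goal-conditioned reinforcement learning | Proposes a goal-conditioned RL method for optimizing middle-mile logistics | reinforcement learning | |
| 12 | Binary Rewards and Reinforcement Learning: Fundamental Challenges | Proposes KL control to address diversity collapse under binary rewards | reinforcement learning | |
| 13 | A decoupled diffusion planner that adapts to changing cost limits by using cost-conditioned generation for safety and reward gradients for performance | Proposes SDGD, a decoupled diffusion planner that adapts to changing safety constraints and improves offline safe RL performance | reinforcement learning, classifier-free guidance | |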

🔬 Pillar 9: Embodied Foundation Models (5 papers)

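| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Bolek: A Multimodal Language Model for Molecular Reasoning | Bolek: a multimodal language model for molecular reasoning that improves auditability | multimodal, chain-of-thought | |
| 15 | Statistically-Lossless Quantization of Large Language Models | Proposes SLQ, a quantization scheme for large language models that is statistically lossless across tasks and distributions | large language model | |
| 16 | LUMINA: A Grid Foundation Model for Benchmarking AC Optimal Power Flow Surrogate Learning | LUMINA-Bench: a grid foundation model benchmark for AC optimal power flow surrogate learning | foundation model | |
| 17 | Pretraining on Sleep Data Improves non-Sleep Biosignal Tasks | Uses pretraining on sleep data to improve performance on non-sleep biosignal tasks | foundation model, multimodal | |
| 18 | SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection | Proposes SpecKV to optimize speculative decoding in large language model inference | large language model | |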

🔬 Pillar 7: Motion Retargeting (2 papers)

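| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Visual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMs | Reveals the latent reasoning capability of visual latents in multimodal large language models and proposes a latent optimization method | latent optimization, multimodal, chain-of-thought | |
| 20 | HELIX: Hybrid Encoding with Learnable Identity and Cross-dimensional Synthesis for Time Series Imputation | HELIX: a hybrid-encoding time series imputation method that combines learnable identity encoding with cross-dimensional synthesis | spatial relationship | |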

🔬 Pillar 1: Robot Control (1 paper)

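| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 21 | MPCS: Neuroplastic Continual Learning via Multi-Component Plasticity and Topology-Aware EWC | Proposes MPCS, a neuroplastic continual learning system that accumulates knowledge continually through multi-component plasticity and topology-aware EWC | MPC | |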

🔬 Pillar 8: Physics-based Animation (1 paper)

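| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 22 | ZNO: Stable Rational Neural Operators in the Z-Domain for Discrete-Time Dynamic | Proposes the Z-domain neural operator (ZNO) for stable learning of discrete-time dynamical systems | PULSE | |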
