cs.LG (2026-05-11)
📊 44 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (23 🔗2)
Pillar 2: RL Algorithms and Architecture (20 🔗1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (23 papers)
🔬 Pillar 2: RL Algorithms and Architecture (20 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | Higher Resolution, Better Generalization: Unlocking Visual Scaling in Deep Reinforcement Learning | Proposes the Impoola architecture to address limited visual-resolution scaling in deep reinforcement learning | reinforcement learning, deep reinforcement learning, policy learning | ✅ | |
| 25 | Robust Probabilistic Shielding for Safe Offline Reinforcement Learning | Proposes a robust probabilistic shielding method for safe policy improvement in offline reinforcement learning | reinforcement learning, offline RL, offline reinforcement learning | | |
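| 26 | Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning | Proposes SLIM, a dynamic skill lifecycle management framework that optimizes how the skill set evolves in agentic reinforcement learning | reinforcement learning, policy learning, large language model | | |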
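| 27 | Balancing Efficiency and Fairness in Traffic Light Control through Deep Reinforcement Learning | Proposes a deep reinforcement learning method for traffic signal control that improves traffic efficiency while maintaining fairness between vehicles and pedestrians | reinforcement learning, deep reinforcement learning | | |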
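| 28 | Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories | Proposes Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive pretraining on electronic health record (EHR) patient trajectories | JEPA, representation learning | | |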
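| 29 | MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization | Proposes MASS-DPO, a Fisher-information-based active negative-sample selection strategy that improves the efficiency of multi-negative preference learning | DPO, direct preference optimization | | |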
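| 30 | PC3D: Zero-Shot Cooperation Across Variable Rosters via Personalized Context Distillation | Proposes the PC3D framework, which uses personalized context distillation to achieve zero-shot cooperation in multi-agent systems with varying team sizes | reinforcement learning, distillation | | |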
| 31 | Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis | Proposes an equivariant reinforcement learning framework for efficient synthesis of Clifford quantum circuits | reinforcement learning | | |
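| 32 | Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why | Proposes a training-free diagnostic framework that uses gradient-alignment analysis to reveal how on-policy distillation affects the training of reasoning models | distillation | | |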
| 33 | Policy Gradient Methods for Non-Markovian Reinforcement Learning | Proposes the agent-state Markov policy gradient (ASMPG) algorithm to address policy optimization in non-Markovian decision processes | reinforcement learning | | |
| 34 | Locking Pretrained Weights via Deep Low-Rank Residual Distillation | Proposes DLR-Lock, which locks pretrained model weights via deep low-rank residual distillation to defend against malicious fine-tuning | distillation | | |
| 35 | Scalable Mamba-Based Message-Passing Neural Decoder for Error-Correcting Codes | Proposes a Mamba-based message-passing neural decoder (MMPD) for efficient and scalable decoding of long error-correcting codes | Mamba | | |
| 36 | Step Rejection Fine-Tuning: A Practical Distillation Recipe | Proposes step rejection fine-tuning (SRFT), which uses fine-grained loss masking to improve LLM agent performance on complex tasks | distillation | | |
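| 37 | Controllability in preference-conditioned multi-objective reinforcement learning | Proposes controllability evaluation metrics to address the lack of behavioral sensitivity in preference-conditioned multi-objective reinforcement learning | reinforcement learning | | |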
| 38 | PhysEDA: Physics-Aware Learning Framework for Efficient EDA With Manhattan Distance Decay | Proposes the PhysEDA framework, which uses a Manhattan-distance decay prior for efficient modeling of EDA tasks | reinforcement learning, linear attention, reward shaping | | |
| 39 | Follow the Mean: Reference-Guided Flow Matching | Proposes Reference-Guided Flow Matching, a framework that makes generative models controllable without fine-tuning | flow matching | | |
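| 40 | When Does Non-Uniform Replay Matter in Reinforcement Learning? | Reveals the mechanism by which non-uniform experience replay takes effect and proposes a truncated geometric sampling strategy to improve the efficiency of offline reinforcement learning | reinforcement learning | | |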
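| 41 | Unsupervised Process Reward Models | Proposes unsupervised process reward models (uPRM), which evaluate reasoning steps through a probabilistic scoring mechanism without human annotation | reinforcement learning, large language model | | |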
| 42 | Generating Symmetric Materials using Latent Flow Matching | Proposes SymADiT, a symmetry-aware material generation model based on latent flow matching and Wyckoff position constraints | flow matching | | |
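| 43 | Adaptive Action Chunking via Multi-Chunk Q Value Estimation | Proposes the adaptive action chunking (ACH) algorithm, which dynamically adjusts action-sequence length through multi-chunk Q-value estimation | reinforcement learning, imitation learning | | |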
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
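| 44 | XQCfD: Accelerating Fast Actor-Critic Algorithms with Prior Data and Prior Policies | Proposes the XQCfD algorithm, which improves the sample efficiency of robot reinforcement learning through pretrained policies and an enhanced replay mechanism | manipulation, reinforcement learning | | |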