cs.LG (2026-01-26)

📊 30 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (15, 🔗 1) · Pillar 9: Embodied Foundation Models (13, 🔗 2) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL Algorithms & Architecture (15 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models | Proposes an on-policy self-distillation framework to improve the token efficiency of LLMs on mathematical reasoning tasks. | reinforcement learning, distillation, privileged information | |
| 2 | Learned harmonic mean estimation of the marginal likelihood for multimodal posteriors with flow matching | Proposes a flow-matching-based learned harmonic mean estimator, improving the accuracy of marginal likelihood estimation under multimodal posteriors. | flow matching, multimodal | |
| 3 | POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration | POPE: learns to solve hard reasoning problems via privileged on-policy exploration. | reinforcement learning, privileged information, large language model | |
| 4 | Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning | Proposes a natural policy gradient method based on a rank-1 approximation of the inverse Fisher information matrix, accelerating deep reinforcement learning. | reinforcement learning, deep reinforcement learning | |
| 5 | Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates | Proposes JitRL, just-in-time reinforcement learning for LLM agents without gradient updates, improving continual learning. | reinforcement learning, large language model | |
| 6 | TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment | Proposes TriPlay-RL, improving LLM safety alignment via tri-role self-play reinforcement learning. | reinforcement learning, large language model | |
| 7 | FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning | Proposes FP8-RL, accelerating LLM reinforcement learning with low-precision inference while preserving training stability. | reinforcement learning, large language model | |
| 8 | Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions | MoReBRAC: improves the robustness of offline reinforcement learning for robotics via vetted synthetic transitions. | reinforcement learning, offline reinforcement learning, world model | |
| 9 | Multi-Objective Reinforcement Learning for Efficient Tactical Decision Making for Trucks in Highway Traffic | Proposes a multi-objective reinforcement learning method for optimizing tactical driving policies for trucks in highway traffic. | reinforcement learning | |
| 10 | ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule | Proposes ART-RL, using reinforcement learning to optimize the timestep schedule of diffusion-model sampling, improving generation quality. | reinforcement learning | |
| 11 | CASSANDRA: Programmatic and Probabilistic Learning and Inference for Stochastic World Modeling | CASSANDRA: uses LLMs for programmatic and probabilistic learning to build stochastic world models. | world model | |
| 12 | Learning long term climate-resilient transport adaptation pathways under direct and indirect flood impacts using reinforcement learning | Proposes a reinforcement-learning method for long-term climate-resilient transport adaptation planning under direct and indirect flood impacts. | reinforcement learning | |
| 13 | K-Myriad: Jump-starting reinforcement learning with unsupervised parallel agents | K-Myriad: jump-starts reinforcement learning with unsupervised parallel agents, improving exploration efficiency. | reinforcement learning | |
| 14 | Enhance the Safety in Reinforcement Learning by ADRC Lagrangian Methods | Proposes ADRC Lagrangian methods to improve safety in reinforcement learning while reducing oscillation. | reinforcement learning | |
| 15 | Enhancing Control Policy Smoothness by Aligning Actions with Predictions from Preceding States | Proposes ASAP, improving the smoothness of RL control policies by aligning actions with predictions from preceding states. | reinforcement learning, deep reinforcement learning | |

🔬 Pillar 9: Embodied Foundation Models (13 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | Mechanistic Analysis of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning | Reveals the underlying mechanisms of catastrophic forgetting in LLMs during continual fine-tuning. | large language model | |
| 17 | HeterCSI: Channel-Adaptive Heterogeneous CSI Pretraining Framework for Generalized Wireless Foundation Models | HeterCSI: a channel-adaptive heterogeneous CSI pretraining framework for generalized wireless foundation models. | foundation model | |
| 18 | From LLMs to LRMs: Rethinking Pruning for Reasoning-Centric Models | Proposes more effective pruning strategies for reasoning-centric LLMs, significantly improving reasoning performance. | large language model, instruction following | |
| 19 | Closing the Modality Gap Aligns Group-Wise Semantics | Proposes a method that aligns group-wise semantics, closing the modality gap in multimodal learning and improving group-level tasks such as clustering. | multimodal | |
| 20 | PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation | PRECISE: reduces the bias of LLM evaluations via prediction-powered ranking estimation, improving the reliability of retrieval-system evaluation. | large language model | |
| 21 | Beyond Preferences: Learning Alignment Principles Grounded in Human Reasons and Values | Proposes the GCAI framework, generating better constitutions of AI alignment principles by incorporating users' reasons and values. | large language model | |
| 22 | HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs | HalluGuard: unifies the detection of data-driven and reasoning-driven hallucinations in LLMs via NTK geometry. | large language model | |
| 23 | From Fuzzy to Exact: The Halo Architecture for Infinite-Depth Reasoning via Rational Arithmetic | Proposes the Halo architecture to address numerical-precision issues in deep reasoning. | large language model | |
| 24 | FGGM: Fisher-Guided Gradient Masking for Continual Learning | Proposes FGGM, using Fisher information to guide gradient masking and mitigate catastrophic forgetting in continual learning for LLMs. | large language model | |
| 25 | Beyond Retention: Orchestrating Structural Safety and Plasticity in Continual Learning for LLMs | Proposes the OSW method, balancing structural safety and plasticity in continual learning for LLMs. | large language model | |
| 26 | AttenMIA: LLM Membership Inference Attack through Attention Signals | AttenMIA: a membership inference attack on LLMs using attention signals. | large language model | |
| 27 | LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts | LatentMoE: a mixture-of-experts design targeting optimal accuracy per FLOP and per parameter. | large language model | |
| 28 | DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal | Proposes the DRPG framework for automatically generating academic rebuttals, significantly improving rebuttal quality and surpassing the human average. | large language model | |

🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 29 | Trust, Don't Trust, or Flip: Robust Preference-Based Reinforcement Learning with Multi-Expert Feedback | TriTrust-PBRL: robust preference-based reinforcement learning against adversarial preference data via multi-expert feedback. | locomotion, manipulation, reinforcement learning | |

🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 30 | Data-Driven Qubit Characterization and Optimal Control using Deep Learning | Proposes a deep-learning-based method for qubit characterization and optimal control, improving quantum-gate fidelity. | PULSE | |
