cs.LG (2026-01-26)

📊 30 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (15 🔗1) · Pillar 9: Embodied Foundation Models (13 🔗2) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

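🔬 Pillar 2: RL Algorithms and Architecture (15 papers)

#	Title	Key Point	Tags	🔗	⭐
1	Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models	Proposes an on-policy self-distillation framework that improves the token efficiency of large language models on mathematical reasoning tasks.	reinforcement learning distillation privileged information
2	Learned harmonic mean estimation of the marginal likelihood for multimodal posteriors with flow matching	Proposes a flow-matching-based harmonic mean estimator that improves the accuracy of marginal likelihood computation for multimodal posteriors.	flow matching multimodal
3	POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration	POPE: learns to solve hard reasoning problems through privileged on-policy exploration.	reinforcement learning privileged information large language model
4	Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning	Proposes a natural policy gradient method based on a rank-1 approximation of the inverse Fisher information matrix to accelerate deep reinforcement learning.	reinforcement learning deep reinforcement learning
5	Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates	Proposes JitRL, which performs just-in-time reinforcement learning for LLM agents without gradient updates, improving continual learning.	reinforcement learning large language model	✅
6	TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment	Proposes TriPlay-RL, which improves LLM safety alignment through tri-role self-play reinforcement learning.	reinforcement learning large language model
7	FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning	Proposes FP8-RL, which accelerates LLM reinforcement learning through low-precision inference while keeping training stable.	reinforcement learning large language model
8	Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions	MoReBRAC: improves the robustness of offline reinforcement learning in robotics through trusted synthetic data.	reinforcement learning offline reinforcement learning world model
9	Multi-Objective Reinforcement Learning for Efficient Tactical Decision Making for Trucks in Highway Traffic	Proposes a multi-objective reinforcement learning method for optimizing truck driving policies on highways.	reinforcement learning
10	ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule	Proposes ART-RL, which uses reinforcement learning to optimize the timestep schedule for diffusion model sampling, improving generation quality.	reinforcement learning
11	CASSANDRA: Programmatic and Probabilistic Learning and Inference for Stochastic World Modeling	CASSANDRA: uses LLMs for programmatic and probabilistic learning to build stochastic world models.	world model
12	Learning long term climate-resilient transport adaptation pathways under direct and indirect flood impacts using reinforcement learning	Proposes a reinforcement-learning-based method for long-term climate-resilient transport planning that addresses direct and indirect flood impacts.	reinforcement learning
13	K-Myriad: Jump-starting reinforcement learning with unsupervised parallel agents	K-Myriad: jump-starts reinforcement learning with unsupervised parallel agents, improving exploration efficiency.	reinforcement learning
14	Enhance the Safety in Reinforcement Learning by ADRC Lagrangian Methods	Proposes ADRC-Lagrangian methods that improve reinforcement learning safety and reduce oscillations.	reinforcement learning
15	Enhancing Control Policy Smoothness by Aligning Actions with Predictions from Preceding States	Proposes ASAP, which improves the smoothness of reinforcement learning control policies by aligning actions with predictions from preceding states.	reinforcement learning deep reinforcement learning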

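🔬 Pillar 9: Embodied Foundation Models (13 papers)

#	Title	Key Point	Tags	🔗	⭐
16	Mechanistic Analysis of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning	Reveals the underlying mechanisms of catastrophic forgetting in large language models during continual fine-tuning.	large language model
17	HeterCSI: Channel-Adaptive Heterogeneous CSI Pretraining Framework for Generalized Wireless Foundation Models	HeterCSI: a channel-adaptive heterogeneous CSI pretraining framework for generalized wireless foundation models.	foundation model
18	From LLMs to LRMs: Rethinking Pruning for Reasoning-Centric Models	Proposes more effective pruning strategies for reasoning-augmented LLMs, significantly improving reasoning performance.	large language model instruction following	✅
19	Closing the Modality Gap Aligns Group-Wise Semantics	Proposes a method for aligning group-level semantics that narrows the modality gap in multimodal learning and improves performance on group-level tasks such as clustering.	multimodal
20	PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation	PRECISE: uses prediction-powered ranking estimation to reduce bias in LLM evaluations, making retrieval system evaluation more reliable.	large language model
21	Beyond Preferences: Learning Alignment Principles Grounded in Human Reasons and Values	Proposes the GCAI framework, which generates better constitutions of AI alignment principles by incorporating users' reasons and values.	large language model
22	HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs	HalluGuard: unifies detection of data-driven and reasoning-driven hallucinations in LLMs through NTK geometry.	large language model
23	From Fuzzy to Exact: The Halo Architecture for Infinite-Depth Reasoning via Rational Arithmetic	Proposes the Halo architecture to address numerical precision problems in deep reasoning.	large language model
24	FGGM: Fisher-Guided Gradient Masking for Continual Learning	Proposes FGGM, which uses Fisher information to guide gradient masking and mitigate catastrophic forgetting in continual learning for large language models.	large language model
25	Beyond Retention: Orchestrating Structural Safety and Plasticity in Continual Learning for LLMs	Proposes OSW, which addresses the trade-off between structural safety and plasticity in continual learning for LLMs.	large language model
26	AttenMIA: LLM Membership Inference Attack through Attention Signals	AttenMIA: a membership inference attack on large language models that exploits attention signals.	large language model
27	LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts	LatentMoE: a mixture-of-experts model targeting optimal accuracy per FLOP and per parameter.	large language model
28	DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal	Proposes the DRPG framework for automatically generating academic rebuttals, significantly improving rebuttal quality and exceeding the average human level.	large language model	✅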

🔬 Pillar 1: Robot Control (1 paper)

#	Title	Key Point	Tags	🔗	⭐
29	Trust, Don't Trust, or Flip: Robust Preference-Based Reinforcement Learning with Multi-Expert Feedback	TriTrust-PBRL: uses multi-expert feedback to make preference-based reinforcement learning robust to adversarial preference data.	locomotion manipulation reinforcement learning

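🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	Key Point	Tags	🔗	⭐
30	Data-Driven Qubit Characterization and Optimal Control using Deep Learning	Proposes a deep-learning-based method for qubit characterization and optimal control that improves quantum gate fidelity.	PULSE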
