| # | Title | Summary | Keywords |
|---|-------|---------|----------|
| 1 | Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies | Proposes the Reverse Flow Matching (RFM) framework, unifying online reinforcement learning training for diffusion and flow policies. | reinforcement learning, diffusion policy, flow matching |
| 2 | Model-Agnostic Solutions for Deep Reinforcement Learning in Non-Ergodic Contexts | Proposes time-dependent deep reinforcement learning methods to address policy suboptimality in non-ergodic environments. | reinforcement learning, deep reinforcement learning |
| 3 | Coverage Improvement and Fast Convergence of On-policy Preference Learning | Proposes a coverage-improvement principle that accelerates the convergence of on-policy preference learning for language model alignment. | preference learning, DPO, distillation |
| 4 | Structure Detection for Contextual Reinforcement Learning | Proposes the SD-MBTL framework, improving generalization in contextual reinforcement learning via online structure detection. | reinforcement learning, zero-shot transfer |
| 5 | ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning | ORBIT: an on-policy exploration-exploitation framework for controllable multi-budget reasoning. | reinforcement learning, distillation, chain-of-thought |
| 6 | Scalable Multiagent Reinforcement Learning with Collective Influence Estimation | Proposes a scalable multi-agent reinforcement learning framework based on a Collective Influence Estimation Network (CIEN). | reinforcement learning, SAC |
| 7 | Provably Safe Reinforcement Learning using Entropy Regularizer | Proposes an entropy-regularized safe reinforcement learning algorithm that improves safety and stability during learning. | reinforcement learning |
| 8 | Your Group-Relative Advantage Is Biased | Reveals bias in group-relative advantage estimation and proposes HA-DW to improve RLVR reasoning performance. | reinforcement learning, large language model |