cs.LG(2026-01-13)

📊 13 papers in total

🔬 Pillar 2: RL Algorithms and Architecture (8 papers)

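| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 1 | Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies | Proposes the Reverse Flow Matching (RFM) framework, which unifies online reinforcement learning training for diffusion and flow policies. | reinforcement learning, diffusion policy, flow matching |
| 2 | Model-Agnostic Solutions for Deep Reinforcement Learning in Non-Ergodic Contexts | Proposes a time-dependent deep reinforcement learning method to address suboptimal policies in non-ergodic environments. | reinforcement learning, deep reinforcement learning |
| 3 | Coverage Improvement and Fast Convergence of On-policy Preference Learning | Proposes a coverage improvement principle that accelerates the convergence of on-policy preference learning for language model alignment. | preference learning, DPO, distillation |
| 4 | Structure Detection for Contextual Reinforcement Learning | Proposes the SD-MBTL framework, which improves generalization in contextual reinforcement learning through online structure detection. | reinforcement learning, zero-shot transfer |
| 5 | ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning | ORBIT: an on-policy exploration-exploitation framework for controllable multi-budget reasoning. | reinforcement learning, distillation, chain-of-thought |
| 6 | Scalable Multiagent Reinforcement Learning with Collective Influence Estimation | Proposes a scalable multi-agent reinforcement learning framework based on a Collective Influence Estimation Network (CIEN). | reinforcement learning, SAC |
| 7 | Provably Safe Reinforcement Learning using Entropy Regularizer | Proposes an entropy-regularized safe reinforcement learning algorithm that improves safety and stability during learning. | reinforcement learning |
| 8 | Your Group-Relative Advantage Is Biased | Reveals the bias in group-relative advantage estimation and proposes HA-DW to improve RLVR reasoning performance. | reinforcement learning, large language model |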

🔬 Pillar 9: Embodied Foundation Models (5 papers)

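| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 9 | Deep Exploration of Epoch-wise Double Descent in Noisy Data: Signal Separation, Large Activation, and Benign Overfitting | Studies epoch-wise double descent on noisy data and reveals a link between generalization ability and internal activation patterns. | large language model |
| 10 | Asymptotic Universal Alignment: A New Alignment Framework via Test-Time Scaling | Proposes an asymptotic universal alignment framework based on test-time scaling to improve large language model alignment. | large language model |
| 11 | Reducing Compute Waste in LLMs through Kernel-Level DVFS | Proposes a kernel-level DVFS method for LLM energy savings that significantly reduces compute waste while preserving performance. | large language model |
| 12 | Sampling via Stochastic Interpolants by Langevin-based Velocity and Initialization Estimation in Flow ODEs | Proposes a Langevin-based stochastic-interpolant probability flow ODE method for efficient sampling from Boltzmann distributions. | multimodal |
| 13 | Demystifying the Slash Pattern in Attention: The Role of RoPE | Reveals the slash attention pattern in LLMs: the role and impact of RoPE. | large language model |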
