cs.LG (2026-05-08)

📊 55 papers in total | 🔗 5 with code

🎯 Interest area navigation

Pillar 2: RL & Architecture (33 🔗3) | Pillar 9: Embodied Foundation Models (12 🔗1) | Pillar 1: Robot Control (5 🔗1) | Pillar 8: Physics-based Animation (2) | Pillar 3: Perception & Semantics (1) | Pillar 6: Video Extraction (1) | Pillar 4: Generative Motion (1)

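🔬 Pillar 2: RL & Architecture (33 papers)

#	Title	One-sentence summary	Tags	🔗
1	HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents	Proposes HyperEyes, a dual-grained efficiency-aware reinforcement learning framework for parallel multimodal search agents	reinforcement learning distillation multimodal
2	ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression	Proposes the ExpThink framework, which uses experience-guided reinforcement learning for adaptive chain-of-thought compression	reinforcement learning reward shaping chain-of-thought
3	Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought	Characterizes the convergence mechanism and emergence of chain of thought (CoT) in in-context reinforcement learning (ICRL)	reinforcement learning chain-of-thought
4	Interpreting Reinforcement Learning Agents with Susceptibilities	Proposes a susceptibility-based interpretability framework for deep reinforcement learning that reveals how the model evolves in parameter space	reinforcement learning deep reinforcement learning RLHF
5	Prototype Guided Post-pretraining for Single-Cell Representation Learning	Proposes CellRefine, a post-pretraining framework that uses marker-gene priors to improve single-cell representation learning	representation learning large language model foundation model
6	Toward Privileged Foundation Models: LUPI for Accelerated and Improved Learning	Proposes the PIQL framework, which uses privileged information (PI) to accelerate training of tabular foundation models (TFMs) and improve generalization	privileged information foundation model
7	Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation	Proposes trajectory-shaped discrete flow matching (TS-DFM), which uses energy-navigated distillation for efficient text generation	flow matching distillation
8	Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning	Explains the ICRL mechanism of softmax Transformers: shows it is equivalent to weighted softmax temporal-difference learning	reinforcement learning linear attention
9	Neurosymbolic Imitation Learning with Human Guidance: A Privileged Information Approach	Proposes a neurosymbolic imitation learning framework based on privileged information to improve data efficiency and generalization in complex environments	imitation learning privileged information
10	KL for a KL: On-Policy Distillation with Control Variate Baseline	Proposes vOPD, which introduces a control variate baseline to address unstable gradient variance in on-policy distillation	distillation large language model
11	RelAgent: LLM Agents as Data Scientists for Relational Learning	Proposes the RelAgent framework, which uses large language models as autonomous data scientists for relational learning tasks	predictive model large language model foundation model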
12	SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion	SHRED: retain-set-free unlearning for large language models via self-distillation with logit demotion	distillation large language model
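13	TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models	Proposes the TRACE framework, which uses diffusion and flow matching models for conformal prediction based on transport alignment	flow matching multimodal
14	Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning	Proposes the Prune-OPD framework, which uses dynamic truncation and reward weighting to improve on-policy distillation for long-horizon reasoning tasks	teacher-student distillation
15	Structured Coupling for Flow Matching	Proposes Structured Coupling for Flow Matching (SCFM), which jointly learns structured latent variables and a continuous transport map to balance generation quality and representation interpretability	flow matching representation learning
16	Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States	Proposes the POISE framework, which estimates values from the policy model's internal states for efficient reinforcement learning of large language models	reinforcement learning PPO
17	Rubric-based On-policy Distillation	Proposes ROPD, a rubric-based on-policy distillation framework for efficient alignment of black-box models	teacher-student distillation	✅
18	Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control	Proposes the Star Elastic training framework, which produces nested submodels from a single post-training run and supports dynamic budget control at inference time	SSM distillation large language model
19	Adaptive Negative Reinforcement for LLM Reasoning: Dynamically Balancing Correction and Diversity in RLVR	Proposes the adaptive negative reinforcement (A-NSR) framework, which improves LLM reasoning through a dynamic penalty strategy	reinforcement learning PPO large language model
20	Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs	Proposes value-based reinforcement learning algorithms for exponential utility optimization	reinforcement learning
21	Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph	Proposes the GraphDPO algorithm, which models preferences as a graph to optimize language model alignment and address the limitations of pairwise preference learning	reinforcement learning DPO direct preference optimization
22	Debiased Counterfactual Generation via Flow Matching from Observations	Proposes a flow-matching-based framework for debiased counterfactual generation that uses the observational data distribution to improve the accuracy of counterfactual inference	flow matching
23	A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning	Proposes a refined generalization analysis framework for extreme multi-class supervised contrastive learning, yielding sample complexity bounds that are independent of the class distribution	representation learning
24	StreamPhy: Streaming Inference of High-Dimensional Physical Dynamics via State Space Models	Proposes the StreamPhy framework, which uses state space models for real-time streaming inference of high-dimensional physical field dynamics	state space model
25	Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models	Proposes the mutual reinforcement learning (MRL) framework, which enables experience sharing and collaborative training across heterogeneous large language models	reinforcement learning
26	Improved Model-based Reinforcement Learning with Smooth Kernels	Proposes a smooth-kernel-based online reinforcement learning method that improves regret bounds through a Bernstein-style exploration bonus	reinforcement learning
27	Coupling Models for One-Step Discrete Generation	Proposes Coupling Models for efficient one-step generation of discrete data	distillation large language model	✅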
28	Stabilized neural Hamilton-Jacobi-Bellman solvers: Error analysis and applications in model-based reinforcement learning	Proposes stabilized neural Hamilton-Jacobi-Bellman solvers for model-based reinforcement learning	reinforcement learning
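29	Where to Spend Rollouts: Hit-Utility Optimal Rollout Allocation for Group-Based RLVR	Proposes the HORA algorithm, which improves the reasoning efficiency of group-based RLVR through hit-utility optimal rollout allocation	reinforcement learning large language model
30	Almost Sure Convergence Rates of Stochastic Approximation and Reinforcement Learning via a Poisson-Moreau Drift	Uses a Poisson-Moreau drift to improve almost sure convergence rates for stochastic approximation and reinforcement learning	reinforcement learning
31	Rethinking Importance Sampling in LLM Policy Optimization: A Cumulative Token Perspective	Proposes Cumulative Token Policy Optimization (CTPO), which uses cumulative importance sampling ratios to address the bias-variance dilemma in LLM reinforcement learning	reinforcement learning PPO	✅
32	Theoretical Limits of Language Model Alignment	Presents theoretical limits for KL-regularized language model alignment, aimed at improving alignment	reinforcement learning PPO
33	Actor-Critic with Active Importance Sampling	Proposes the Actor-Critic with Active Importance Sampling (AISAC) algorithm, which optimizes the behavior policy to substantially reduce gradient estimation variance	reinforcement learning TD3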

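🔬 Pillar 9: Embodied Foundation Models (12 papers)

#	Title	One-sentence summary	Tags	🔗
34	Pre-trained Tabular Foundation Models as Versatile Summary Networks for Neural Posterior Estimation	Proposes the PFN-NPE framework, which uses pre-trained tabular foundation models as general-purpose summary networks for neural posterior estimation	foundation model
35	The Coupling Tax: How Shared Token Budgets Undermine Visible Chain-of-Thought Under Fixed Output Limits	Identifies the "coupling tax" in reasoning chains and proposes a budget-decoupling strategy to improve large-model reasoning performance	chain-of-thought
36	Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer	Proposes a post-training framework based on symbolic decomposition that enables alignment of health foundation model embedding spaces and cross-modal transfer	foundation model
37	How Big Should a Wireless Foundation Model Be?	Characterizes the size limits of wireless foundation models: physically constrained dimension scaling laws and a test-time training strategy	foundation model
38	Arrow: A Foundation Model for Causal Discovery	Proposes the Arrow foundation model, which achieves zero-shot causal discovery through a skeleton and topological-ordering decomposition	foundation model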
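39	Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback	Proposes the SPEAR algorithm, which uses advantage-weighted refinement for online self-play fine-tuning of large models in federated learning settings	foundation model	✅
40	Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders	Proposes the Tree SAE model, which introduces a reconstruction constraint to address spurious correlations when learning hierarchical feature structures in sparse autoencoders	large language model
41	Tracing Uncertainty in Language Model "Reasoning"	Proposes a method for evaluating language model reasoning based on uncertainty trajectory analysis, enabling early prediction of reasoning correctness	chain-of-thought
42	Transfer Learning Across Fast- and Full-Simulation Domains in High-Energy Physics	Proposes a transfer-learning framework for training across simulation domains that substantially improves the use of simulated data in high-energy physics tasks	foundation model
43	Instruction Tuning Changes How Upstream State Conditions Late Readout: A Cross-Patching Diagnostic	Proposes a first-divergence cross-patching diagnostic that reveals how instruction tuning reshapes the interaction between a model's upstream state and late readout	instruction following
44	The Convergence Gap: Instruction-Tuned Language Models Stabilize Later in the Forward Pass	Proposes the Convergence Gap diagnostic, which shows that instruction-tuned models reach prediction stability later, in deeper layers of the network	instruction following
45	The Position Curse: LLMs Struggle to Locate the Last Few Items in a List	Identifies the "position curse" in large models: proposes the PosBench dataset and improves list-indexing ability through fine-tuning	large language model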

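🔬 Pillar 1: Robot Control (5 papers)

#	Title	One-sentence summary	Tags	🔗
46	Graph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMs	Proposes the AugMP strategy, which uses graph representation learning to strengthen model manipulation attacks on federated fine-tuning of LLMs	manipulation representation learning geometric consistency
47	Predictive but Not Plannable: RC-aux for Latent World Models	Proposes a reachability-correction auxiliary objective (RC-aux) to address the mismatch between prediction and planning in latent world models	reachability-aware world model worldmodel	✅
48	Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow	Proposes Drifting Field Policy (DFP), a one-step generative policy model based on Wasserstein gradient flow	manipulation behavior cloning
49	Fortifying Time Series: DTW-Certified Robust Anomaly Detection	Proposes a DTW-certified robust time series anomaly detection method based on randomized smoothing	manipulation
50	Quotient Semivalues for False-Name-Resistant Data Attribution	Proposes the Quotient Semivalues mechanism to address false-identity manipulation in machine learning data attribution	manipulation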

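🔬 Pillar 8: Physics-based Animation (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
51	FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution	Proposes the FAME framework, which models the dynamic trajectories of scientific topics via continuous-time manifold evolution to forecast academic impact	spatiotemporal large language model
52	STEPS: A Temporal Smooth Error Propagation Solver on the Manifolds for Test-Time Adaptation in Time Series Forecasting	Proposes the STEPS framework, which uses a solver for a Dirichlet boundary value problem on temporal manifolds to achieve robust test-time adaptation in time series forecasting	spatiotemporal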

🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
53	PropSplat: Map-Free RF Field Reconstruction via 3D Gaussian Propagation Splatting	Proposes PropSplat, a map-free RF field reconstruction method based on 3D Gaussian propagation splatting	splatting NeRF

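🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
54	Adaptive Domain Decomposition Physics-Informed Neural Networks for Traffic State Estimation with Sparse Sensor Data	Proposes adaptive domain decomposition physics-informed neural networks (ADD-PINN) to address the smoothing problem in traffic state estimation with sparse sensors	sparse sensors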

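🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
55	Test-Time Compositional Generalization in Diffusion Models via Concept Discovery	Proposes a concept-discovery-based method for test-time compositional generalization, enabling zero-shot compositional generation in diffusion models without a predefined library	classifier-free guidance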
