cs.LG (2026-05-21)

📊 49 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (23 🔗3) · Pillar 9: Embodied Foundation Models (16 🔗3) · Pillar 1: Robot Control (4 🔗1) · Pillar 4: Generative Motion (3 🔗1) · Pillar 6: Video Extraction (1) · Pillar 8: Physics-based Animation (1) · Pillar 5: Interaction & Reaction (1)

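🔬 Pillar 2: RL & Architecture (23 papers)

#	Title	One-Sentence Summary	Tags	🔗
1	ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data	Proposes the ChronoMedicalWorld model to address patient trajectory prediction from longitudinal clinical data.	world model world models MAE
2	Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning	Proposes Target-Aligned Bellman Backup (TABB) to address data transfer in cross-domain offline reinforcement learning.	reinforcement learning policy learning offline RL
3	Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles	Maestro: an RL-driven hierarchical model-skill ensemble framework that improves multimodal task performance.	reinforcement learning large language model multimodal	✅
4	Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation	Analyzes the post-training methods SFT, RL, and on-policy distillation from a state-distribution perspective.	reinforcement learning distillation large language model
5	From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning	SCRL: reinforcement learning with a subproblem-based curriculum that improves LLM reasoning and addresses credit assignment.	reinforcement learning curriculum learning IMoS
6	Asymmetric Virtual Memory Paging for Hybrid Mamba-Transformer Inference	Proposes Asymmetric Virtual Memory Paging (AVMP) to optimize memory management for hybrid Mamba-Transformer inference.	Mamba SSM state space model
7	The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning	Proposes the matching principle, which regularizes the encoder with an estimated nuisance covariance to make representation learning robust.	DPO representation learning
8	From Snapshots to Trajectories: Learning Single-Cell Gene Expression Dynamics via Conditional Flow Matching	Proposes scFM, which learns single-cell gene expression dynamics via conditional flow matching to address missing time-series data.	flow matching latent dynamics
9	Chebyshev Policies and the Mountain Car Problem: Reinforcement Learning for Low-Dimensional Control Tasks	Proposes reinforcement learning policies based on Chebyshev polynomials that significantly improve performance on low-dimensional control tasks.	reinforcement learning PPO
10	Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning	Proposes Direction-Adaptive Self-Distillation (DASD) to improve LLM exploration and accuracy in mathematical reasoning.	distillation privileged information
11	Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration	Proposes a 3D exploration method based on episodic context and persistent worlds to address local looping in curiosity-driven exploration.	reinforcement learning predictive model 3D reconstruction	✅
12	MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data	MambaGaze: cognitive load assessment using bidirectional Mamba and explicit missing-data modeling.	Mamba
13	The Distillation Game: Adaptive Attacks & Efficient Defenses	Proposes an adversarial-game framework for distillation attacks and defenses, and designs an efficient defense method, PoE.	distillation	✅
14	Abstraction for Offline Goal-Conditioned Reinforcement Learning	Proposes a framework based on relativized options and hierarchical abstraction for offline goal-conditioned reinforcement learning.	reinforcement learning
15	Reinforcement learning for ion shuttling on trapped-ion quantum computers	Proposes an RL-based method for optimizing ion shuttling to improve the operating efficiency of trapped-ion quantum computers.	reinforcement learning
16	Don't Forget the Critic: Value-Based Data Rehearsal for Multi-Cyclic Continual Reinforcement Learning	Proposes Qreg+NWLU to address forgetting in multi-cyclic continual reinforcement learning.	reinforcement learning
17	Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs	Proposes RGoT, which uses reinforcement learning to adaptively generate graphs of thoughts for LLMs, improving complex problem solving.	reinforcement learning large language model
18	One-Way Policy Optimization for Self-Evolving LLMs	Proposes one-way policy optimization to address training instability in large language models.	reinforcement learning large language model
19	Toward Understanding Adversarial Distillation: Why Robust Teachers Fail	Reveals why robust teachers fail in adversarial distillation: inconsistency on the robustly unlearnable set.	distillation
20	PhylaFlow: Hybrid Flow Matching in Billera-Holmes-Vogtmann Tree Space for Phylogenetic Inference	PhylaFlow: phylogenetic inference using hybrid flow matching in BHV tree space.	flow matching
21	Hybrid Kolmogorov-Arnold Network and XGBoost Framework for Week-Ahead Price Forecasting in Australia's National Electricity Market	Proposes a hybrid KAN+XGBoost framework for week-ahead electricity price forecasting in Australia's National Electricity Market.	MAE penetration
22	Survive or Collapse: The Asymmetric Roles of Data Gating and Reward Grounding in Self-Play RL	Reveals the asymmetric roles of data gating and reward grounding in self-play RL, emphasizing that data gating is critical for stability.	reinforcement learning reward design
23	OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning	OPPO: a token-level credit assignment method for LLM reasoning based on Bayesian value recursion.	reinforcement learning distillation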

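🔬 Pillar 9: Embodied Foundation Models (16 papers)

#	Title	One-Sentence Summary	Tags	🔗
24	Understanding Multimodal Failure in Action-Chunking Behavioral Cloning	Studies multimodal failure in action-chunking behavioral cloning and reveals the limitations of different parameterizations.	multimodal
25	CogAdapt: Transferring Clinical ECG Foundation Models to Wearable Cognitive Load Assessment via Lead Adaptation	CogAdapt: transfers clinical ECG foundation models to wearable cognitive load assessment via lead adaptation.	foundation model
26	ChronoVAE-HOPE: Beyond Attention -- A Next-Generation VAE Foundation Model for Specialized Time Series Classification	ChronoVAE-HOPE: a next-generation VAE foundation model for time series classification that goes beyond attention.	foundation model
27	ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models	Proposes the ARC-STAR framework for auditable post-hoc correction of PDE foundation models, significantly improving prediction accuracy.	foundation model
28	Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference	Proposes an LLM-based framework for stellar spectral analysis that enables efficient and accurate inference of stellar parameters and abundances.	large language model
29	The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation	Proposes the Zero-CoT Probe, which exposes data contamination in LLMs by truncating CoT reasoning.	large language model chain-of-thought	✅
30	FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection	Proposes FAME, a failure-aware mixture-of-experts model for message-level log anomaly detection.	large language model
31	GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving	GraphFlow: graph-based workflow management that improves LLM-agent serving efficiency.	large language model
32	The Neural Compiler: Program-to-Network Translation for Hybrid Scientific Machine Learning	Proposes the Neural Compiler, which translates programs into differentiable modules for hybrid scientific machine learning.	large language model
33	AMUSE: Anytime Muon with Stable Gradient Evaluation	Proposes the AMUSE optimizer, which combines Muon acceleration with Schedule-Free stabilization to improve deep learning training efficiency.	large language model
34	Boundary-targeted Membership Inference Attacks on Safety Classifiers	Proposes boundary-targeted membership inference attacks that reveal the privacy risks of safety classifiers on sensitive data.	large language model
35	VeriScale: Adversarial Test-Suite Scaling for Verifiable Code Generation	VeriScale: improves benchmark quality for verifiable code generation through adversarial test-suite scaling.	large language model	✅
36	One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs	Proposes heavy-tail guided layerwise learning rates to improve LLM training efficiency and generalization.	large language model	✅
37	What are the Right Symmetries for Formal Theorem Proving?	Proposes a rewriting-category framework to improve LLM symmetry and robustness in formal theorem proving.	large language model
38	LABO: LLM-Accelerated Bayesian Optimization through Broad Exploration and Selective Experimentation	LABO: LLM-accelerated Bayesian optimization through broad exploration and selective experimentation.	large language model
39	Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)	Reveals hidden semantics and potential problems in LLMs via singular value decomposition of the lm_head weight matrix.	large language model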

🔬 Pillar 1: Robot Control (4 papers)

#	Title	One-Sentence Summary	Tags	🔗
40	Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics	Proposes a trajectory reachability metric (TRM) to correct planning bias in latent world models.	manipulation MPC world model
41	SCI-Defense: Defending Manipulation Attacks from Generative Engine Optimization	Proposes the SCI-Defense framework to defend LLM ranking systems against generative engine optimization attacks.	manipulation
42	MoSA: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap in Continuum Dynamics via Learning Residual Anisotropy	MoSA: mitigates the real-to-sim gap in continuum dynamics by learning residual anisotropy.	manipulation sim-to-real	✅
43	Do Not Trust The Auctioneer: Learning to Bid in Feedback-Manipulated Auctions	Proposes a new algorithm for bidding in feedback-manipulated auctions.	manipulation

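🔬 Pillar 4: Generative Motion (3 papers)

#	Title	One-Sentence Summary	Tags	🔗
44	Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation	Improves uniform diffusion models with a leave-one-out denoiser and an absorbing-state reformulation.	MDM	✅
45	A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models	A systematic tutorial on diffusion models, starting from differential equations.	classifier-free guidance
46	Generative Modeling by Value-Driven Transport	Proposes a generative model based on value-driven transport, addressing the path-simulation challenges of conventional generative models.	classifier-free guidance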

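🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
47	Physics-Informed Generative Solver: Bridging Data-Driven Priors and Conservation Laws for Stable Spatiotemporal Field Reconstruction	Proposes a physics-informed generative solver that combines data-driven priors with conservation laws for stable spatiotemporal field reconstruction.	sparse sensors spatiotemporal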

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
48	Departure from Regularity: Degree Heterogeneity and Eigengap as the Structural Drivers of ASE-LSE Latent Subspace Disagreement	Reveals how graph structural heterogeneity and the eigengap affect disagreement between spectral embeddings.	ASE

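🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
49	Decision-Aware Quadratic ReLU Replacement for HE-Friendly Inference	Proposes a decision-aware quadratic ReLU replacement to speed up homomorphic-encryption-friendly neural network inference.	OMOMO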
