cs.LG (2026-05-26)

📊 45 papers in total

🎯 Interest area navigation

Pillar 2: RL & Architecture (23) · Pillar 9: Embodied Foundation Models (16) · Pillar 1: Robot Control (3) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL & Architecture (23 papers)

#	Title	One-sentence summary	Tags
1	Auditing and Fixing Economic Validity in Tabular Foundation Models for Discrete Choice	Proposes a two-stage adapter that ensures the economic validity of tabular foundation models in discrete choice prediction.	distillation foundation model
2	When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control	The RLScale-Bench benchmark shows that calibrated rule-based controllers outperform mainstream deep RL algorithms in adaptive resource control.	reinforcement learning deep reinforcement learning DRL
3	Adaptive Reinforcement Learning for Robust Open Quantum System Control: A Multi-Task Framework with Temporal Optimization	Proposes a multi-task SAC reinforcement learning framework for robust open quantum system control with temporal optimization.	reinforcement learning SAC PULSE
4	Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning	GraphGPO: a graph-based credit assignment method that improves the efficiency of agentic reinforcement learning.	reinforcement learning large language model
5	Recursive Flow Matching	Proposes Recursive Flow Matching (RecFM) to accelerate high-accuracy modeling and prediction of spatiotemporal dynamical systems.	flow matching spatiotemporal
6	BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning	BASIS: batchwise advantage estimation via single-rollout information sharing, improving LLM reasoning.	reinforcement learning policy learning large language model
7	Causal Representation Learning for Generalisable Recommendation	Proposes a recommendation method based on causal representation learning that improves generalization under distribution shift.	predictive model representation learning
8	Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation	Proposes Teachability-Aware OPD, which improves on-policy distillation by selecting learnable token signals.	teacher-student distillation
9	WINDQuant: Weight-Informed Neural Decision-Making for Global Mixed-Precision LLM Quantization	WINDQuant: weight-informed neural decision-making for global mixed-precision LLM quantization.	reinforcement learning PPO large language model
10	Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders	SAERL: uses model internals from sparse autoencoders to guide LLM post-training data engineering.	reinforcement learning large language model
11	Learning Dynamic Graph Representations through Timespan View Contrasts	Proposes the CLDG and CLDG++ frameworks, which learn dynamic graph representations by contrasting timespan views, for node classification and anomaly detection.	contrastive learning TAMP
12	Less is More: Early Stopping Rollout for On-Policy Distillation	Proposes an early-stopping rollout distillation method to address teacher-model degradation in on-policy distillation.	distillation
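13	SQARL: A Size-Agnostic Reinforcement Learning approach for Circuit Allocation in Distributed Quantum Architectures	Proposes SQARL, a size-agnostic reinforcement learning method for circuit allocation in distributed quantum architectures.	reinforcement learning
14	SPHERE-JEPA: Spherical Prediction with Homogeneous Embeddings	SPHERE-JEPA: spherical prediction with homogeneous embeddings to improve self-supervised representation quality.	JEPA
15	Generalist Graph Anomaly Detection via Prototype-Based Distillation	Proposes ProMoS, an unsupervised generalist graph anomaly detection framework based on prototype distillation.	distillation
16	Towards Generalization-Oriented Models for Vehicle Routing Problems with Mixture-of-Experts	Proposes the R2E-IG model, which uses a mixture-of-experts network to improve generalization on vehicle routing problems under distribution shift.	reinforcement learning deep reinforcement learning DRL
17	Spend Your Rollouts Where It Counts: Rollout Allocation for Group-Based RL Post-Training	Proposes the Pilot-Commit framework, which accelerates group-based RL post-training through budget-aware rollout allocation.	reinforcement learning large language model
18	Geometry-Aware Contrastive Learning for Few-Shot Automatic Modulation Recognition	Proposes the DyCo-CL framework to address the shortcomings of SSL methods in few-shot automatic modulation recognition.	contrastive learning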
19	Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards	Proposes Focal Reward to address imbalanced reinforcement learning training for LLMs under rubric-based rewards.	reinforcement learning
20	PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design	PRISM: a position-encoded regressive inverse spectral model for multilayer thin-film design.	MAE spatial relationship
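21	Trust Region Q Adjoint Matching	Proposes Trust Region Q-Adjoint Matching for stable offline RL optimization of pretrained flow policies.	reinforcement learning offline RL
22	Ratio-Variance Regularized Policy Optimization	Proposes R²VPO, which achieves stable and efficient policy optimization by regularizing the variance of the policy ratio.	reinforcement learning PPO
23	Adversarial Training for Robust Coverage Network under Worst-case Facility Losses	Proposes a dual-agent deep reinforcement learning framework for the maximal covering location interdiction problem.	reinforcement learning deep reinforcement learning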

🔬 Pillar 9: Embodied Foundation Models (16 papers)

#	Title	One-sentence summary	Tags
24	Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling	Falcon-X: a time series foundation model for heterogeneous multivariate modeling.	foundation model
25	LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models	Proposes LUCoS, which uses unsupervised latent-space context selection to improve few-shot learning on tabular data.	foundation model
26	EEG-FM-Audit: A Systematic Evaluation and Analysis Pipeline for EEG Foundation Models	Proposes EEG-FM-Audit to address the lack of transparency in evaluating EEG foundation models.	foundation model
27	Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models	Studies the role of scale vectors in large language models and proposes optimization strategies that significantly improve model performance.	large language model
28	Particle-Lund Multimodality in Jet Taggers	Proposes PLuM to improve jet tagging performance.	multimodal
29	Few-shot Cross-country Generalization of Tabular Machine Learning and Foundation Models for Childhood Anemia Prediction under Distribution Shift	Proposes a TabPFN-based model to address data scarcity in childhood anemia prediction.	foundation model
30	Aperiodic and Low-Frequency Spectral Bias in Reconstruction based EEG Foundation Models	Shows that reconstruction-based EEG foundation models are biased toward aperiodic and low-frequency components.	foundation model
31	The Kalman Evolve: Closing the Gap in Kalman Filtering via Interpretable Algorithm Discovery	Kalman Evolve: closes the gap in Kalman filtering through interpretable algorithm discovery.	large language model
32	Convergence of Spectral Descent for Non-smooth Optimization	Proposes a convergence analysis framework for spectral descent in non-smooth optimization.	large language model
33	MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training	Proposes MONA, a Muon optimizer with Nesterov acceleration for scalable language model training.	large language model
34	Innovation: An Almost Characterization of Hallucination	Characterizes LLM hallucination through "innovation", revealing an inherent limitation of calibrated models.	large language model
35	More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations	Proposes token-adaptive Mixing of Activations (MoA) to increase the expressiveness of Transformer FFN layers.	large language model
36	SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?	SEC-bench Pro: evaluates language models on long-horizon software security tasks.	large language model
37	Open-Weight LLM Fine-Tuning Defenses are Susceptible to Simple Attacks	Proposes simple attacks based on ablation and prefilling against fine-tuning defenses for open-weight LLMs.	large language model
38	The Stability of Singular Distribution: A Spectral Perspective on the Two-Phase Dynamics of Language Model Pre-training	Offers a spectral perspective on the two-phase dynamics of LLM pre-training: the Stability of Singular Distribution (SoSD).	large language model
39	Extra-Merge: Tracing the Rank-1 Subspace of Model Merging in Language Model Pre-Training	Proposes Extra-Merge to improve model merging in language model pre-training.	large language model

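🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-sentence summary	Tags
40	Adversarial Dual On-Policy Distillation from Expressive Flow-based Teacher	Proposes FA-OPD, an adversarial dual on-policy distillation method that improves the robustness of imitation learning in embodied control.	locomotion manipulation flow matching
41	Probabilistic Recurrent Intention Switching Model	Proposes the Probabilistic Recurrent Intention Switching Model to handle goal switching in inverse reinforcement learning.	manipulation reinforcement learning inverse reinforcement learning
42	Pretrained Approximators for Low-Thrust Trajectory Cost and Reachability	Proposes pretrained approximators for fast evaluation of low-thrust trajectory fuel cost and reachability.	trajectory optimization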

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-sentence summary	Tags
43	PIDM-DP: Physics-Informed Diffusion with Dormand-Prince Integration for Chaotic System Identification and State Reconstruction across Multiple Dynamical Regimes	Proposes PIDM-DP for chaotic system identification and state reconstruction across multiple dynamical regimes.	physics-informed diffusion

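🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags
44	Learning Energy-Based Models from Stochastic Interpolants using Spatiotemporal Differences	Proposes the stNCE framework, which learns energy-based models from spatiotemporal differences to improve density estimation.	spatiotemporal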

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-sentence summary	Tags
45	Explainable Comparison of Feature-Based and Deep Learning Models for TROPOMI Methane Plume Screening	Compares feature-based and deep learning models for TROPOMI methane plume screening, with an explainability analysis.	spatial relationship
