cs.LG (2026-05-12)

📊 58 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (31 🔗5) | Pillar 9: Embodied Foundation Models (22 🔗5) | Pillar 1: Robot Control (3) | Pillar 7: Motion Retargeting (1) | Pillar 5: Interaction & Reaction (1)

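🔬 Pillar 2: RL & Architecture (31 papers)

#	Title	One-line summary	Tags	🔗
1	PriorZero: Bridging Language Priors and World Models for Decision Making	Proposes PriorZero to resolve the dynamics mismatch between LLMs and RL	reinforcement learning world model world models	✅
2	Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models	Proposes Block-R1 for multi-domain RL of diffusion LLMs, resolving conflicts between the block sizes preferred by different domains.	reinforcement learning large language model	✅
3	ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models	ORCE: an order-aware framework for calibrating the verbalized confidence of LLMs, improving reliability.	reinforcement learning large language model
4	Discrete Flow Matching for Offline-to-Online Reinforcement Learning	DRIFT: a discrete flow matching method for offline-to-online reinforcement learning	reinforcement learning flow matching
5	Intrinsic Vicarious Conditioning for Deep Reinforcement Learning	Proposes a deep RL method based on intrinsic vicarious conditioning, targeting single-lifetime and continual learning	reinforcement learning deep reinforcement learning
6	MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification	MaskTab: scalable masked tabular pretraining for industrial classification, combining scaling laws with knowledge distillation	distillation foundation model
7	On the Importance of Multistability for Horizon Generalization in Reinforcement Learning	Proposes a theoretical framework for temporal horizon generalization, showing that multistability matters for long-term memory in RL	reinforcement learning state space model
8	Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction	Addresses missing old logits in asynchronous agentic RL with semantically decoupled off-policy correction methods.	reinforcement learning PPO large language model	✅
9	GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation	Proposes GEAR, which uses self-distillation to reweight advantages at an adaptive granularity for LLM agents, improving long-horizon task performance.	reinforcement learning distillation
10	Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training	Proposes a sparse-to-dense reward principle that improves language-model post-training on verifiable math problems	distillation
11	Model-based Bootstrap of Controlled Markov Chains	Proposes a model-based bootstrap method for offline policy evaluation and optimization in controlled Markov chains.	reinforcement learning offline RL offline reinforcement learning
12	OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning	Proposes OGLS-SD, which performs on-policy self-distillation for LLM reasoning via outcome-guided logit steering.	distillation
13	Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning	Proposes an event-driven framework for promoting behavioral diversity in multi-agent RL	reinforcement learning
14	Transferable Delay-Aware Reinforcement Learning via Implicit Causal Graph Modeling	Proposes a transferable delay-aware RL method based on implicit causal graph modeling	reinforcement learning
15	Delay-Empowered Causal Hierarchical Reinforcement Learning	Proposes Delay-Empowered Causal Hierarchical RL (DECHRL) for decision making under uncertain time delays	reinforcement learning
16	Optimal Policy Learning under Budget and Coverage Constraints	Proposes an optimal policy learning method under budget and coverage constraints	policy learning
17	Multi-Task Representation Learning for Conservative Linear Bandits	Proposes the CMTRL framework for multi-task representation learning in conservative linear bandits	representation learning
18	Expected Batch Optimal Transport Plans and Consequences for Flow Matching	Introduces expected batch optimal transport plans and studies their consequences for flow matching	flow matching
19	Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning	Proposes an RAPC-based RL method for minimizing cost under probabilistic reach-avoid constraints in stochastic environments	reinforcement learning
20	Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization	Proposes DGAO, which mitigates the order sensitivity of LLMs through dual group advantage optimization.	reinforcement learning large language model	✅
21	Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning	Proposes ATD($λ$), an adaptive TD($λ$) algorithm that sidesteps the difficulty of computing policy distributions in MARL	reinforcement learning
22	Information theoretic underpinning of self-supervised learning by clustering	Studies the information-theoretic foundations of self-supervised learning by clustering	distillation foundation model
23	GRAFT: Graph-Tokenized LLMs for Tool Planning	GRAFT: graph-tokenized LLMs for tool planning, addressing the difficulty of modeling dependencies between tools	distillation large language model
24	Evolutionary Task Discovery: Advancing Reasoning Frontiers via Skill Composition and Complexity Scaling	EvoTD: improves LLM reasoning through skill composition and complexity scaling	reinforcement learning large language model	✅
25	From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation	Proposes CREDIT, which uses contrastive learning to make the credit signal in on-policy self-distillation input-specific.	distillation
26	Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information	Proposes AntiSD, which improves language-model math reasoning via anti-self-distillation.	distillation
27	Sharpen Your Flow: Sharpness-Aware Sampling for Flow Matching	Proposes SharpEuler, an adaptive sharpness-aware sampler for flow matching that improves generation quality.	flow matching
28	BSO: Safety Alignment Is Density Ratio Matching	Proposes BSO, which simplifies safety alignment by casting it as density ratio matching	reinforcement learning direct preference optimization
29	Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds	Studies autoregressive learning under the joint KL for long-sequence modeling, with sharp oracle bounds and lower bounds	policy learning imitation learning
30	Variance-aware Reward Modeling with Anchor Guidance	Proposes anchor-guided, variance-aware reward modeling to address the non-uniqueness of reward models under diverse human preferences.	PPO RLHF
31	OUI as a Structural Observable: Towards an Activation-Centric View of Neural Network Training	Proposes OUI as a structural observable of neural network training, revealing training dynamics from an activation-centric view	reinforcement learning PPO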

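🔬 Pillar 9: Embodied Foundation Models (22 papers)

#	Title	One-line summary	Tags	🔗
32	Instruction Lens Score: Your Instruction Contributes a Powerful Object Hallucination Detector for Multimodal Large Language Models	Proposes Instruction Lens Score for detecting object hallucinations in multimodal LLMs.	large language model multimodal	✅
33	U-STS-LLM A Unified Spatio-Temporal Steered Large Language Model for Traffic Prediction and Imputation	Proposes U-STS-LLM, a single model for both spatio-temporal traffic prediction and imputation.	large language model foundation model
34	Resilient Vision-Tabular Multimodal Learning under Modality Missingness	Proposes a robust multimodal Transformer framework that handles missing modalities in medical image and tabular data.	multimodal
35	Grid Games: The Power of Multiple Grids for Quantizing Large Language Models	Proposes a multi-grid quantization method that substantially improves the accuracy of microscaling 4-bit quantization for LLMs	large language model	✅
36	STAGE: Tackling Semantic Drift in Multimodal Federated Graph Learning	Proposes the STAGE framework to tackle semantic drift in multimodal federated graph learning.	multimodal
37	Beyond Parameter Aggregation: Semantic Consensus for Federated Fine-Tuning of LLMs	Proposes federated LLM fine-tuning based on semantic consensus, greatly reducing communication cost.	large language model foundation model
38	Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation	Pion: a spectrum-preserving optimizer based on orthogonal equivalence transformation, for training LLMs.	large language model
39	Learning, Fast and Slow: Towards LLMs That Adapt Continually	Proposes a fast-slow learning framework that improves LLM continual learning and reduces catastrophic forgetting.	large language model
40	Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs	Proposes multi-stream LLMs that remove language-model bottlenecks by processing thought, input, and output streams in parallel	chain-of-thought
41	Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling	Proposes a text-tabular modeling approach for predicting AI agent decisions from limited interaction.	foundation model
42	SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization	SOAR: scale optimization for NVFP4 quantization, enabling more accurate model reconstruction	large language model	✅
43	Investigating simple target-covariate relationships for Chronos-2 and TabPFN-TS	Evaluates how well the time-series foundation models Chronos-2 and TabPFN-TS capture simple target-covariate relationships	foundation model
44	A Unified Graph Language Model for Multi-Domain Multi-Task Graph Alignment Instruction Tuning	Proposes UniGraphLM for multi-domain, multi-task graph alignment instruction tuning, improving the generalization of graph language models.	large language model
45	Hölder Policy Optimisation	Proposes HölderPO to address the limited adaptability of GRPO's aggregation mechanism	large language model
46	Efficient and Adaptive Human Activity Recognition via LLM Backbones	Uses LLM backbones for efficient, adaptive human activity recognition	foundation model
47	Procedural-skill SFT across capacity tiers: A W-Shaped pre-SFT Trajectory and Regime-Asymmetric Mechanism on 0.8B-4B Qwen3.5 Models	Reports a W-shaped pre-SFT trajectory and a regime-asymmetric mechanism for procedural-skill SFT on 0.8B-4B Qwen3.5 models	chain-of-thought
48	More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing	Proposes StableEdit, which tackles catastrophic forgetting in sequential model editing via a reinforced stability loop.	large language model	✅
49	ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems	ROMER: expert replacement and router calibration for MoE LLMs on analog compute-in-memory systems	large language model
50	Compositional Neural Operators for Multi-Dimensional Fluid Dynamics	Proposes CompNO, which composes neural operators to model multi-dimensional fluid dynamics, improving generalization and interpretability.	foundation model
51	Slicing and Dicing: Configuring Optimal Mixtures of Experts	Systematically studies MoE architecture configuration, showing that the number and granularity of experts strongly affect performance	large language model
52	EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting	Proposes EpiCastBench to fill the gap in benchmarks for multivariate epidemic forecasting	foundation model	✅
53	Fast MoE Inference via Predictive Prefetching and Expert Replication	Proposes fast MoE inference via predictive prefetching and expert replication, improving GPU utilization	large language model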

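🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-line summary	Tags
54	Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies	Proposes a behavioral mode discovery framework for fine-tuning multimodal generative policies, improving performance on robot manipulation tasks.	manipulation reinforcement learning diffusion policy
55	Aligning Flow Map Policies with Optimal Q-Guidance	Proposes Flow Map policies that use Q-guidance to accelerate offline-to-online RL.	locomotion manipulation reinforcement learning
56	In-context learning to predict critical transitions in dynamical systems	Proposes the TipPFN framework for predicting critical transitions in dynamical systems	sim-to-real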

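🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
57	NOFE -- Neural Operator Function Embedding	Proposes Neural Operator Function Embedding (NOFE) for dimensionality reduction on continuous domains, better preserving local structure.	structure preservation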

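🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
58	Finite Sentence-Interface Control for Learning Bounded-Fan-Out Linear MCFGs under Fixed Monoid Typing	Proposes finite sentence-interface control for learning bounded-fan-out linear multiple context-free grammars (MCFGs)	OMOMO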