cs.LG(2026-05-12)

📊 58 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (31 🔗5) · Pillar 9: Embodied Foundation Models (22 🔗5) · Pillar 1: Robot Control (3) · Pillar 7: Motion Retargeting (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 2: RL Algorithms & Architecture (31 papers)

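# | Title | One-line summary | Tags | 🔗
1 | PriorZero: Bridging Language Priors and World Models for Decision Making | Proposes PriorZero to address the dynamics mismatch between LLMs and RL. | reinforcement learning, world model, world models
2 | Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models | Proposes Block-R1 to resolve block-size conflicts across domains in multi-domain reinforcement learning for diffusion large language models. | reinforcement learning, large language model
3 | ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models | ORCE: an order-aware framework for calibrating verbalized confidence in large language models, improving reliability. | reinforcement learning, large language model
4 | Discrete Flow Matching for Offline-to-Online Reinforcement Learning | DRIFT: a discrete flow matching method for offline-to-online reinforcement learning. | reinforcement learning, flow matching
5 | Intrinsic Vicarious Conditioning for Deep Reinforcement Learning | Proposes a deep reinforcement learning method based on intrinsic vicarious conditioning for single-lifetime and continual learning. | reinforcement learning, deep reinforcement learning
6 | MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification | MaskTab: scalable masked tabular pretraining for industrial classification, combining scaling laws with knowledge distillation. | distillation, foundation model
7 | On the Importance of Multistability for Horizon Generalization in Reinforcement Learning | Proposes a theoretical framework for temporal horizon generalization, showing the importance of multistability for long-term memory in reinforcement learning. | reinforcement learning, state space model
8 | Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction | Addresses missing old logits in asynchronous agentic RL with semantically decoupled repair methods. | reinforcement learning, PPO, large language model
9 | GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation | Proposes GEAR, which uses self-distillation for granularity-adaptive advantage reweighting in LLM agents, improving long-horizon task performance. | reinforcement learning, distillation
10 | Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training | Proposes a sparse-to-dense reward principle that improves language-model post-training on verifiable math problems. | distillation
11 | Model-based Bootstrap of Controlled Markov Chains | Proposes a model-based bootstrap for offline policy evaluation and optimization in controlled Markov chains. | reinforcement learning, offline RL, offline reinforcement learning
12 | OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning | Proposes OGLS-SD, on-policy self-distillation for LLM reasoning via outcome-guided logit steering. | distillation
13 | Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning | Proposes an event-driven framework for behavioral diversity in multi-agent reinforcement learning. | reinforcement learning
14 | Transferable Delay-Aware Reinforcement Learning via Implicit Causal Graph Modeling | Proposes a transferable delay-aware reinforcement learning method based on implicit causal graph modeling. | reinforcement learning
15 | Delay-Empowered Causal Hierarchical Reinforcement Learning | Proposes delay-empowered causal hierarchical reinforcement learning (DECHRL) for decision making under delay uncertainty. | reinforcement learning
16 | Optimal Policy Learning under Budget and Coverage Constraints | Proposes an optimal policy learning method under budget and coverage constraints. | policy learning
17 | Multi-Task Representation Learning for Conservative Linear Bandits | Proposes the CMTRL framework for multi-task representation learning in conservative linear bandits. | representation learning
18 | Expected Batch Optimal Transport Plans and Consequences for Flow Matching | Proposes expected batch optimal transport plans for flow matching. | flow matching
19 | Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning | Proposes an RAPC-based reinforcement learning method for cost optimization under probabilistic reach-avoid constraints in stochastic environments. | reinforcement learning
20 | Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization | Proposes DGAO, which mitigates order sensitivity in large language models through dual group advantage optimization. | reinforcement learning, large language model
21 | Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning | Proposes an adaptive TD($λ$) algorithm, ATD($λ$), to address the difficulty of computing the policy distribution in MARL. | reinforcement learning
22 | Information theoretic underpinning of self-supervised learning by clustering | Studies the information-theoretic foundations of self-supervised learning by clustering. | distillation, foundation model
23 | GRAFT: Graph-Tokenized LLMs for Tool Planning | GRAFT: graph-tokenized LLMs for tool planning, addressing the difficulty of modeling dependencies. | distillation, large language model
24 | Evolutionary Task Discovery: Advancing Reasoning Frontiers via Skill Composition and Complexity Scaling | EvoTD: improves the reasoning ability of large language models through skill composition and complexity scaling. | reinforcement learning, large language model
25 | From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation | Proposes CREDIT, which uses contrastive learning to improve input-specific credit in on-policy self-distillation. | distillation
26 | Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information | Proposes AntiSD, which uses anti-self-distillation to improve language models' mathematical reasoning. | distillation
27 | Sharpen Your Flow: Sharpness-Aware Sampling for Flow Matching | Proposes SharpEuler, an adaptive sampling method for flow matching that improves generation quality. | flow matching
28 | BSO: Safety Alignment Is Density Ratio Matching | Proposes BSO to simplify safety alignment. | reinforcement learning, direct preference optimization
29 | Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds | Proposes autoregressive learning in joint KL for long-sequence modeling. | policy learning, imitation learning
30 | Variance-aware Reward Modeling with Anchor Guidance | Proposes anchor-guided, variance-aware reward modeling to address reward-model non-uniqueness under diverse human preferences. | PPO, RLHF
31 | OUI as a Structural Observable: Towards an Activation-Centric View of Neural Network Training | Proposes OUI as a structural observable of neural network training, revealing training dynamics from an activation-centric view. | reinforcement learning, PPO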

🔬 Pillar 9: Embodied Foundation Models (22 papers)

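# | Title | One-line summary | Tags | 🔗
32 | Instruction Lens Score: Your Instruction Contributes a Powerful Object Hallucination Detector for Multimodal Large Language Models | Proposes Instruction Lens Score for object hallucination detection in multimodal large language models. | large language model, multimodal
33 | U-STS-LLM A Unified Spatio-Temporal Steered Large Language Model for Traffic Prediction and Imputation | Proposes U-STS-LLM for unified spatio-temporal traffic prediction and imputation. | large language model, foundation model
34 | Resilient Vision-Tabular Multimodal Learning under Modality Missingness | Proposes a robust multimodal Transformer framework for missing modalities in medical image and tabular data. | multimodal
35 | Grid Games: The Power of Multiple Grids for Quantizing Large Language Models | Proposes a multi-grid quantization method that significantly improves the accuracy of microscaling 4-bit quantization for large language models. | large language model
36 | STAGE: Tackling Semantic Drift in Multimodal Federated Graph Learning | Proposes the STAGE framework to address semantic drift in multimodal federated graph learning. | multimodal
37 | Beyond Parameter Aggregation: Semantic Consensus for Federated Fine-Tuning of LLMs | Proposes a semantic-consensus method for federated LLM fine-tuning that substantially reduces communication cost. | large language model, foundation model
38 | Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation | Pion: a spectrum-preserving optimizer based on orthogonal equivalence transformation, for large language model training. | large language model
39 | Learning, Fast and Slow: Towards LLMs That Adapt Continually | Proposes a fast-slow learning framework that improves continual learning in LLMs and reduces catastrophic forgetting. | large language model
40 | Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs | Proposes multi-stream LLMs that remove language-model bottlenecks by processing streams of thoughts, inputs, and outputs in parallel. | chain-of-thought
41 | Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling | Proposes a text-tabular modeling method for predicting AI agent decisions from limited interaction. | foundation model
42 | SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization | SOAR: scale optimization for NVFP4 quantization, enabling more accurate model reconstruction. | large language model
43 | Investigating simple target-covariate relationships for Chronos-2 and TabPFN-TS | Evaluates how well time-series foundation models capture target-covariate relationships. | foundation model
44 | A Unified Graph Language Model for Multi-Domain Multi-Task Graph Alignment Instruction Tuning | Proposes UniGraphLM for multi-domain, multi-task graph alignment instruction tuning, improving the generalization of graph language models. | large language model
45 | Hölder Policy Optimisation | Proposes HölderPO to address the limited adaptability of GRPO's aggregation mechanism. | large language model
46 | Efficient and Adaptive Human Activity Recognition via LLM Backbones | Uses LLM backbones for efficient and adaptive human activity recognition. | foundation model
47 | Procedural-skill SFT across capacity tiers: A W-Shaped pre-SFT Trajectory and Regime-Asymmetric Mechanism on 0.8B-4B Qwen3.5 Models | Presents a W-shaped pre-SFT trajectory for procedural-skill SFT on Qwen3.5 models. | chain-of-thought
48 | More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing | Proposes StableEdit, which reinforces a stabilizing loop to address catastrophic forgetting in sequential model editing. | large language model
49 | ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems | ROMER: expert replacement and router calibration for MoE LLMs on analog compute-in-memory systems. | large language model
50 | Compositional Neural Operators for Multi-Dimensional Fluid Dynamics | Proposes CompNO, which composes neural operators to solve multi-dimensional fluid dynamics problems, improving generalization and interpretability. | foundation model
51 | Slicing and Dicing: Configuring Optimal Mixtures of Experts | A systematic study of MoE architecture configuration, showing the key effect of expert count and granularity on performance. | large language model
52 | EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting | Proposes EpiCastBench to address the lack of benchmarks for multivariate epidemic forecasting. | foundation model
53 | Fast MoE Inference via Predictive Prefetching and Expert Replication | Proposes a fast MoE inference method based on predictive prefetching and expert replication, improving GPU utilization. | large language model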

🔬 Pillar 1: Robot Control (3 papers)

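# | Title | One-line summary | Tags | 🔗
54 | Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies | Proposes a behavioral mode discovery framework for fine-tuning multimodal generative policies, improving performance on robot manipulation tasks. | manipulation, reinforcement learning, diffusion policy
55 | Aligning Flow Map Policies with Optimal Q-Guidance | Proposes flow map policies that use Q-guidance to accelerate offline-to-online reinforcement learning. | locomotion, manipulation, reinforcement learning
56 | In-context learning to predict critical transitions in dynamical systems | Proposes the TipPFN framework for predicting critical transitions in dynamical systems. | sim-to-real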

🔬 Pillar 7: Motion Retargeting (1 paper)

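# | Title | One-line summary | Tags | 🔗
57 | NOFE -- Neural Operator Function Embedding | Proposes Neural Operator Function Embedding (NOFE) for dimensionality reduction on continuous domains, improving local structure preservation. | structure preservation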

🔬 Pillar 5: Interaction & Reaction (1 paper)

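# | Title | One-line summary | Tags | 🔗
58 | Finite Sentence-Interface Control for Learning Bounded-Fan-Out Linear MCFGs under Fixed Monoid Typing | Proposes finite sentence-interface control for learning bounded-fan-out linear multiple context-free grammars. | OMOMO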
