cs.LG (2026-05-08)

📊 55 papers total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (33 🔗3) · Pillar 9: Embodied Foundation Models (12 🔗1) · Pillar 1: Robot Control (5 🔗1) · Pillar 8: Physics-based Animation (2) · Pillar 3: Perception & Semantics (1) · Pillar 6: Video Extraction (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL & Architecture (33 papers)

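# | Title | One-line summary | Tags | 🔗
1 | HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents | Proposes HyperEyes, a dual-grained efficiency-aware reinforcement learning framework for parallel multimodal search agents | reinforcement learning, distillation, multimodal
2 | ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression | Proposes the ExpThink framework, which uses experience-guided reinforcement learning for adaptive chain-of-thought compression | reinforcement learning, reward shaping, chain-of-thought
3 | Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought | Reveals the convergence mechanism and emergence principles of chain of thought (CoT) in in-context reinforcement learning (ICRL) | reinforcement learning, chain-of-thought
4 | Interpreting Reinforcement Learning Agents with Susceptibilities | Proposes a susceptibility-based interpretability framework for deep reinforcement learning that reveals how the model's parameter space evolves | reinforcement learning, deep reinforcement learning, RLHF
5 | Prototype Guided Post-pretraining for Single-Cell Representation Learning | Proposes CellRefine, a post-pretraining framework that uses marker-gene priors to improve single-cell representation learning | representation learning, large language model, foundation model
6 | Toward Privileged Foundation Models: LUPI for Accelerated and Improved Learning | Proposes the PIQL framework, which uses privileged information (PI) to accelerate the training of tabular foundation models (TFMs) and improve their generalization | privileged information, foundation model
7 | Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation | Proposes trajectory-shaped discrete flow matching (TS-DFM), which achieves efficient text generation through energy-navigated distillation | flow matching, distillation
8 | Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning | Reveals the ICRL mechanism of softmax Transformers and proves that it is equivalent to weighted softmax temporal-difference learning | reinforcement learning, linear attention
9 | Neurosymbolic Imitation Learning with Human Guidance: A Privileged Information Approach | Proposes a neurosymbolic imitation learning framework based on privileged information to improve data efficiency and generalization in complex environments | imitation learning, privileged information
10 | KL for a KL: On-Policy Distillation with Control Variate Baseline | Proposes vOPD, which introduces a control variate baseline to address unstable gradient variance in on-policy distillation | distillation, large language model
11 | RelAgent: LLM Agents as Data Scientists for Relational Learning | Proposes the RelAgent framework, which uses large language models as autonomous data scientists to solve relational learning tasks | predictive model, large language model, foundation model
12 | SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion | SHRED: retain-set-free knowledge unlearning for large language models via self-distillation with logit demotion | distillation, large language model
13 | TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models | Proposes the TRACE framework, which uses diffusion and flow matching models for conformal prediction based on transport alignment | flow matching, multimodal
14 | Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning | Proposes the Prune-OPD framework, which improves on-policy distillation for long-horizon reasoning tasks through dynamic truncation and reward weighting | teacher-student, distillation
15 | Structured Coupling for Flow Matching | Proposes structured coupling flow matching (SCFM), which jointly learns structured latent variables and a continuous transport map to balance generation quality and representation interpretability | flow matching, representation learning
16 | Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States | Proposes the POISE framework, which estimates values from the policy model's internal states for efficient reinforcement learning of large language models | reinforcement learning, PPO
17 | Rubric-based On-policy Distillation | Proposes ROPD, a rubric-based on-policy distillation framework for efficient alignment of black-box models | teacher-student, distillation
18 | Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control | Proposes the Star Elastic training framework, which produces nested submodels in a single post-training run and supports dynamic budget control at inference time | SSM, distillation, large language model
19 | Adaptive Negative Reinforcement for LLM Reasoning: Dynamically Balancing Correction and Diversity in RLVR | Proposes the adaptive negative reinforcement (A-NSR) framework, which improves LLM reasoning through a dynamic penalty strategy | reinforcement learning, PPO, large language model
20 | Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs | Proposes value-based reinforcement learning algorithms for exponential utility optimization | reinforcement learning
21 | Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph | Proposes the GraphDPO algorithm, which models preference graphs to optimize language model alignment and address the limitations of pairwise preference learning | reinforcement learning, DPO, direct preference optimization
22 | Debiased Counterfactual Generation via Flow Matching from Observations | Proposes a debiased counterfactual generation framework based on flow matching that uses the observed data distribution to improve the accuracy of counterfactual inference | flow matching
23 | A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning | Proposes a refined generalization analysis framework for extreme multi-class supervised contrastive learning, yielding sample complexity bounds that are independent of the class distribution | representation learning
24 | StreamPhy: Streaming Inference of High-Dimensional Physical Dynamics via State Space Models | Proposes the StreamPhy framework, which uses state space models for real-time streaming inference of high-dimensional physical field dynamics | state space model
25 | Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models | Proposes the mutual reinforcement learning (MRL) framework, which enables experience sharing and collaborative training among heterogeneous large language models | reinforcement learning
26 | Improved Model-based Reinforcement Learning with Smooth Kernels | Proposes an online reinforcement learning method based on smooth kernels that improves regret bounds through Bernstein-style exploration bonuses | reinforcement learning
27 | Coupling Models for One-Step Discrete Generation | Proposes Coupling Models for efficient one-step generation of discrete data | distillation, large language model
28 | Stabilized neural Hamilton-Jacobi-Bellman solvers: Error analysis and applications in model-based reinforcement learning | Proposes stabilized neural Hamilton-Jacobi-Bellman solvers for model-based reinforcement learning | reinforcement learning
29 | Where to Spend Rollouts: Hit-Utility Optimal Rollout Allocation for Group-Based RLVR | Proposes the HORA algorithm, which improves the efficiency of group-based RLVR through hit-utility optimal rollout allocation | reinforcement learning, large language model
30 | Almost Sure Convergence Rates of Stochastic Approximation and Reinforcement Learning via a Poisson-Moreau Drift | Improves almost sure convergence rates for stochastic approximation and reinforcement learning via a Poisson-Moreau drift | reinforcement learning
31 | Rethinking Importance Sampling in LLM Policy Optimization: A Cumulative Token Perspective | Proposes cumulative token policy optimization (CTPO), which uses cumulative importance sampling ratios to resolve the bias-variance dilemma in LLM reinforcement learning | reinforcement learning, PPO
32 | Theoretical Limits of Language Model Alignment | Presents theoretical limits for KL-regularized language model alignment, with the aim of improving alignment | reinforcement learning, PPO
33 | Actor-Critic with Active Importance Sampling | Proposes the active importance sampling actor-critic (AISAC) algorithm, which substantially reduces gradient estimation variance by optimizing the behavior policy | reinforcement learning, TD3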

🔬 Pillar 9: Embodied Foundation Models (12 papers)

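# | Title | One-line summary | Tags | 🔗
34 | Pre-trained Tabular Foundation Models as Versatile Summary Networks for Neural Posterior Estimation | Proposes the PFN-NPE framework, which uses pre-trained tabular foundation models as general-purpose summary networks for neural posterior estimation | foundation model
35 | The Coupling Tax: How Shared Token Budgets Undermine Visible Chain-of-Thought Under Fixed Output Limits | Reveals the "coupling tax" in chain-of-thought reasoning and proposes a budget-decoupling strategy to improve large-model reasoning performance | chain-of-thought
36 | Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer | Proposes a post-training framework based on symbolic decomposition that aligns the embedding spaces of health foundation models and enables cross-modal transfer | foundation model
37 | How Big Should a Wireless Foundation Model Be? | Reveals the scale limits of wireless foundation models: dimensional scaling laws grounded in physical constraints, plus a test-time training strategy | foundation model
38 | Arrow: A Foundation Model for Causal Discovery | Proposes the Arrow foundation model, which achieves zero-shot causal discovery through a decomposition into skeleton and topological ordering | foundation model
39 | Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback | Proposes the SPEAR algorithm, which uses advantage-weighted refinement for online self-play fine-tuning of large models in federated learning settings | foundation model
40 | Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders | Proposes the Tree SAE model, which introduces a reconstruction constraint to address spurious correlations when learning hierarchical feature structures in sparse autoencoders | large language model
41 | Tracing Uncertainty in Language Model "Reasoning" | Proposes a method for evaluating language model reasoning based on uncertainty trajectory analysis, enabling early prediction of reasoning correctness | chain-of-thought
42 | Transfer Learning Across Fast- and Full-Simulation Domains in High-Energy Physics | Proposes a transfer-learning-based framework for training across simulation domains that substantially improves the efficiency of simulated-data use in high-energy physics tasks | foundation model
43 | Instruction Tuning Changes How Upstream State Conditions Late Readout: A Cross-Patching Diagnostic | Proposes a first-divergence cross-patching diagnostic that reveals how instruction tuning reshapes the interaction between upstream state and late readout | instruction following
44 | The Convergence Gap: Instruction-Tuned Language Models Stabilize Later in the Forward Pass | Proposes the Convergence Gap diagnostic, which reveals that instruction-tuned models reach prediction stability at later (deeper) layers | instruction following
45 | The Position Curse: LLMs Struggle to Locate the Last Few Items in a List | Reveals the "position curse" in large models: proposes the PosBench dataset and improves sequence-indexing ability through fine-tuning | large language model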

🔬 Pillar 1: Robot Control (5 papers)

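# | Title | One-line summary | Tags | 🔗
46 | Graph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMs | Proposes the AugMP strategy, which uses graph representation learning to strengthen model manipulation attacks on federated fine-tuning of LLMs | manipulation, representation learning, geometric consistency
47 | Predictive but Not Plannable: RC-aux for Latent World Models | Proposes a reachability-correction auxiliary objective (RC-aux) to address the mismatch between prediction and planning in latent world models | reachability-aware, world model, worldmodel
48 | Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow | Proposes Drifting Field Policy (DFP), a one-step generative decision-making model based on Wasserstein gradient flow | manipulation, behavior cloning
49 | Fortifying Time Series: DTW-Certified Robust Anomaly Detection | Proposes a DTW-certified robust time series anomaly detection method based on randomized smoothing | manipulation
50 | Quotient Semivalues for False-Name-Resistant Data Attribution | Proposes the quotient semivalues mechanism to address false-name manipulation in machine learning data attribution | manipulation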

🔬 Pillar 8: Physics-based Animation (2 papers)

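# | Title | One-line summary | Tags | 🔗
51 | FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution | Proposes the FAME framework, which models the dynamic trajectories of scientific topics through continuous-time manifold evolution to forecast academic impact | spatiotemporal, large language model
52 | STEPS: A Temporal Smooth Error Propagation Solver on the Manifolds for Test-Time Adaptation in Time Series Forecasting | Proposes the STEPS framework, a solver for a Dirichlet boundary value problem on temporal manifolds that enables robust test-time adaptation for time series forecasting | spatiotemporal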

🔬 Pillar 3: Perception & Semantics (1 paper)

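# | Title | One-line summary | Tags | 🔗
53 | PropSplat: Map-Free RF Field Reconstruction via 3D Gaussian Propagation Splatting | Proposes PropSplat, a map-free RF field reconstruction method based on 3D Gaussian propagation splatting | splatting, NeRF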

🔬 Pillar 6: Video Extraction (1 paper)

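# | Title | One-line summary | Tags | 🔗
54 | Adaptive Domain Decomposition Physics-Informed Neural Networks for Traffic State Estimation with Sparse Sensor Data | Proposes adaptive domain decomposition physics-informed neural networks (ADD-PINN) to address the smoothing problem in traffic state estimation with sparse sensors | sparse sensors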

🔬 Pillar 4: Generative Motion (1 paper)

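# | Title | One-line summary | Tags | 🔗
55 | Test-Time Compositional Generalization in Diffusion Models via Concept Discovery | Proposes a test-time compositional generalization method based on concept discovery, enabling zero-shot compositional generation in diffusion models without a predefined library | classifier-free guidance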
