cs.LG (2026-05-21)

📊 49 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (23 🔗3) · Pillar 9: Embodied Foundation Models (16 🔗3) · Pillar 1: Robot Control (4 🔗1) · Pillar 4: Generative Motion (3 🔗1) · Pillar 6: Video Extraction and Matching (1) · Pillar 8: Physics-based Animation (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 2: RL Algorithms and Architecture (23 papers)

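# | Title | One-sentence summary | Tags
1 | ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data | Proposes the ChronoMedicalWorld model for predicting patient trajectories from longitudinal clinical data. | world model, world models, MAE
2 | Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning | Proposes Target-Aligned Bellman Backup (TABB) to address data transfer in cross-domain offline reinforcement learning. | reinforcement learning, policy learning, offline RL
3 | Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles | Maestro: an RL-driven hierarchical model-skill ensemble framework that improves performance on multimodal tasks. | reinforcement learning, large language model, multimodal
4 | Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation | Analyzes the post-training methods SFT, RL, and on-policy distillation from a state-distribution perspective. | reinforcement learning, distillation, large language model
5 | From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning | SCRL: reinforcement learning with a subproblem curriculum that improves LLM reasoning and addresses credit assignment. | reinforcement learning, curriculum learning, IMoS
6 | Asymmetric Virtual Memory Paging for Hybrid Mamba-Transformer Inference | Proposes Asymmetric Virtual Memory Paging (AVMP) to optimize memory management for hybrid Mamba-Transformer inference. | Mamba, SSM, state space model
7 | The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning | Proposes the matching principle, which regularizes the encoder using an estimated perturbation covariance to make learned representations robust. | DPO, representation learning
8 | From Snapshots to Trajectories: Learning Single-Cell Gene Expression Dynamics via Conditional Flow Matching | Proposes scFM, which learns single-cell gene expression dynamics via conditional flow matching to address missing time-series data. | flow matching, latent dynamics
9 | Chebyshev Policies and the Mountain Car Problem: Reinforcement Learning for Low-Dimensional Control Tasks | Proposes RL policies based on Chebyshev polynomials, significantly improving performance on low-dimensional control tasks. | reinforcement learning, PPO
10 | Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning | Proposes Direction-Adaptive Self-Distillation (DASD) to improve LLM exploration and accuracy in mathematical reasoning. | distillation, privileged information
11 | Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration | Proposes a 3D exploration method based on episodic context and persistent worlds to address local looping in curiosity-driven exploration. | reinforcement learning, predictive model, 3D reconstruction
12 | MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data | MambaGaze: cognitive load assessment using bidirectional Mamba and explicit missing-data modeling. | Mamba
13 | The Distillation Game: Adaptive Attacks & Efficient Defenses | Proposes an adversarial-game framework for distillation attacks and defenses, and designs PoE, an efficient defense. | distillation
14 | Abstraction for Offline Goal-Conditioned Reinforcement Learning | Proposes a framework based on relativized options and hierarchical abstraction for offline goal-conditioned reinforcement learning. | reinforcement learning
15 | Reinforcement learning for ion shuttling on trapped-ion quantum computers | Proposes an RL-based method for optimizing ion shuttling to improve the operational efficiency of trapped-ion quantum computers. | reinforcement learning
16 | Don't Forget the Critic: Value-Based Data Rehearsal for Multi-Cyclic Continual Reinforcement Learning | Proposes Qreg+NWLU to address forgetting in multi-cyclic continual reinforcement learning. | reinforcement learning
17 | Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs | Proposes RGoT, which uses reinforcement learning to adaptively generate graphs of thoughts for LLMs, improving complex problem solving. | reinforcement learning, large language model
18 | One-Way Policy Optimization for Self-Evolving LLMs | Proposes one-way policy optimization to address training instability in large language models. | reinforcement learning, large language model
19 | Toward Understanding Adversarial Distillation: Why Robust Teachers Fail | Reveals why robust teachers fail in adversarial distillation: inconsistency on the robustly unlearnable set. | distillation
20 | PhylaFlow: Hybrid Flow Matching in Billera-Holmes-Vogtmann Tree Space for Phylogenetic Inference | PhylaFlow: phylogenetic inference using hybrid flow matching in BHV tree space. | flow matching
21 | Hybrid Kolmogorov-Arnold Network and XGBoost Framework for Week-Ahead Price Forecasting in Australia's National Electricity Market | Proposes a hybrid KAN+XGBoost framework for week-ahead electricity price forecasting in Australia's National Electricity Market. | MAE, penetration
22 | Survive or Collapse: The Asymmetric Roles of Data Gating and Reward Grounding in Self-Play RL | Reveals the asymmetric roles of data gating and the reward function in self-play RL, highlighting that data gating is critical for stability. | reinforcement learning, reward design
23 | OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning | OPPO: a token-level credit assignment method for LLM reasoning based on Bayesian value recursion. | reinforcement learning, distillation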

🔬 Pillar 9: Embodied Foundation Models (16 papers)

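# | Title | One-sentence summary | Tags
24 | Understanding Multimodal Failure in Action-Chunking Behavioral Cloning | Studies multimodal failure in action-chunking behavioral cloning and reveals the limitations of different parameterizations. | multimodal
25 | CogAdapt: Transferring Clinical ECG Foundation Models to Wearable Cognitive Load Assessment via Lead Adaptation | CogAdapt: transfers clinical ECG foundation models to wearable cognitive load assessment via lead adaptation. | foundation model
26 | ChronoVAE-HOPE: Beyond Attention -- A Next-Generation VAE Foundation Model for Specialized Time Series Classification | ChronoVAE-HOPE: a next-generation VAE foundation model for time series classification that goes beyond attention. | foundation model
27 | ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models | Proposes the ARC-STAR framework for auditable post-hoc correction of PDE foundation models, significantly improving prediction accuracy. | foundation model
28 | Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference | Proposes an LLM-based framework for stellar spectral analysis, enabling efficient and accurate inference of stellar parameters and abundances. | large language model
29 | The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation | Proposes the Zero-CoT Probe, which exposes data contamination in LLMs by truncating CoT reasoning. | large language model, chain-of-thought
30 | FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection | Proposes FAME, a failure-aware mixture-of-experts model for message-level log anomaly detection. | large language model
31 | GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving | GraphFlow: graph-based workflow management that improves LLM-agent serving efficiency. | large language model
32 | The Neural Compiler: Program-to-Network Translation for Hybrid Scientific Machine Learning | Proposes a neural compiler that translates programs into differentiable modules for hybrid scientific machine learning. | large language model
33 | AMUSE: Anytime Muon with Stable Gradient Evaluation | Proposes the AMUSE optimizer, which combines Muon acceleration with Schedule-Free stabilization to improve deep learning training efficiency. | large language model
34 | Boundary-targeted Membership Inference Attacks on Safety Classifiers | Proposes boundary-targeted membership inference attacks, revealing the privacy risks of safety classifiers on sensitive data. | large language model
35 | VeriScale: Adversarial Test-Suite Scaling for Verifiable Code Generation | VeriScale: improves benchmark quality for verifiable code generation through adversarial test-suite scaling. | large language model
36 | One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs | Proposes heavy-tail-guided layerwise learning rate adjustment to improve LLM training efficiency and generalization. | large language model
37 | What are the Right Symmetries for Formal Theorem Proving? | Proposes a rewriting-category framework to improve the symmetry and robustness of LLMs in formal theorem proving. | large language model
38 | LABO: LLM-Accelerated Bayesian Optimization through Broad Exploration and Selective Experimentation | LABO: LLM-accelerated Bayesian optimization with broad exploration and selective experimentation. | large language model
39 | Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have) | Reveals hidden semantics and potential problems in LLMs via singular value decomposition of the lm_head weight matrix. | large language model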

🔬 Pillar 1: Robot Control (4 papers)

# | Title | One-sentence summary | Tags
40 | Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics | Proposes a trajectory reachability metric (TRM) to correct planning bias in latent world models. | manipulation, MPC, world model
41 | SCI-Defense: Defending Manipulation Attacks from Generative Engine Optimization | Proposes the SCI-Defense framework to defend against generative engine optimization attacks in LLM ranking systems. | manipulation
42 | MoSA: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap in Continuum Dynamics via Learning Residual Anisotropy | MoSA: mitigates the real-to-sim gap in continuum dynamics by learning residual anisotropy. | manipulation, sim-to-real
43 | Do Not Trust The Auctioneer: Learning to Bid in Feedback-Manipulated Auctions | Proposes a new algorithm for bidding in feedback-manipulated auctions. | manipulation

🔬 Pillar 4: Generative Motion (3 papers)

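# | Title | One-sentence summary | Tags
44 | Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation | Improves uniform diffusion models with a leave-one-out denoiser and an absorbing-state reformulation. | MDM
45 | A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models | A systematic tutorial on diffusion models, starting from differential equations. | classifier-free guidance
46 | Generative Modeling by Value-Driven Transport | Proposes a generative model based on value-driven transport, addressing the path-simulation challenges of conventional generative models. | classifier-free guidance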

🔬 Pillar 6: Video Extraction and Matching (1 paper)

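# | Title | One-sentence summary | Tags
47 | Physics-Informed Generative Solver: Bridging Data-Driven Priors and Conservation Laws for Stable Spatiotemporal Field Reconstruction | Proposes a physics-informed generative solver that combines data-driven priors with conservation laws for stable spatiotemporal field reconstruction. | sparse sensors, spatiotemporal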

🔬 Pillar 8: Physics-based Animation (1 paper)

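# | Title | One-sentence summary | Tags
48 | Departure from Regularity: Degree Heterogeneity and Eigengap as the Structural Drivers of ASE-LSE Latent Subspace Disagreement | Reveals how graph structural heterogeneity and the eigengap drive differences between spectral embeddings. | ASE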

🔬 Pillar 5: Interaction & Reaction (1 paper)

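# | Title | One-sentence summary | Tags
49 | Decision-Aware Quadratic ReLU Replacement for HE-Friendly Inference | Proposes a decision-aware quadratic ReLU replacement to accelerate HE-friendly neural network inference. | OMOMO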
