cs.LG (2026-05-13)

📊 48 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (26 🔗2) · Pillar 9: Embodied Foundation Models (16 🔗1) · Pillar 5: Interaction & Reaction (2) · Pillar 4: Generative Motion (2) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1)

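🔬 Pillar 2: RL & Architecture (26 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning | Proposes JEDI, a joint-embedding diffusion world model for online model-based reinforcement learning. | reinforcement learning, world model, world models | |
| 2 | Learning POMDP World Models from Observations with Language-Model Priors | Pinductor: efficiently learns POMDP world models using language-model priors. | world model, world models, generalist agent | |
| 3 | Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning | Proposes the TCE framework, which bridges cross-domain gaps in offline reinforcement learning through target-aligned generation. | reinforcement learning, offline RL, offline reinforcement learning | |
| 4 | Trajectory-Level Data Augmentation for Offline Reinforcement Learning | Proposes a trajectory-level data augmentation method that improves offline RL performance on active localization problems. | reinforcement learning, offline reinforcement learning | |
| 5 | Dynamical Predictive Modelling of Cardiovascular Disease Progression Post-Myocardial Infarction via ECG-Trained Artificial Intelligence Model | Proposes an ECG-trained AI model for dynamic prediction of cardiovascular disease after myocardial infarction. | predictive model, contrastive learning, foundation model | |
| 6 | Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization | Proposes RDPO, which decorrelates rewards to optimize multi-objective, mixed-reward RL, improving instruction following and writing quality. | reinforcement learning, instruction following | |
| 7 | MARLIN: Multi-Agent Game-Theoretic Reinforcement Learning for Sustainable LLM Inference in Cloud Datacenters | Proposes MARLIN, which uses multi-agent game-theoretic RL to optimize the energy consumption and latency of LLM inference in cloud datacenters. | reinforcement learning, large language model | |
| 8 | Teacher-Guided Policy Optimization for LLM Distillation | Proposes the TGPO algorithm, which uses teacher-guided policy optimization to address the negative-feedback problem in LLM distillation. | reinforcement learning, imitation learning, distillation | |
| 9 | Coreset-Induced Conditional Velocity Flow Matching | Proposes coreset-induced conditional velocity flow matching (CCVFM) to improve generative model performance. | flow matching, multimodal | |
| 10 | Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation | Proposes Contrastive Proximal Policy Optimisation (CPPO), enabling on-policy self-supervised RL without a reward function. | reinforcement learning, PPO | |
| 11 | ERPPO: Entropy Regularization-based Proximal Policy Optimization | Proposes ERPPO, an entropy-regularization-based PPO algorithm, to address MAPPO policy optimization in multi-dimensional environments. | reinforcement learning, PPO, spatiotemporal | |
| 12 | CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem | Proposes CO-MAP to solve the qubit allocation problem. | reinforcement learning | |
| 13 | HLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learning | HLS-Seek: QoR-aware code generation for high-level synthesis via proxy comparative reward reinforcement learning. | reinforcement learning | |
| 14 | Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation | Proposes a reward-weighted on-policy distillation method that improves property equivalence in NL-to-SVA generation. | distillation | |
| 15 | Path-independent Flow Matching for Multi-parameter Generative Dynamics | Proposes path-independent flow matching (PiFM) for learning path-independent transformations in multi-parameter generative dynamics. | flow matching | |
| 16 | OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention | OSDN: improves the delta rule in linear attention through provable online preconditioning. | linear attention | |
| 17 | Twincher: Bijective Representation Learning for Robust Inversion of Continuous Systems | Proposes Twincher for robust inversion of continuous systems. | representation learning | |
| 18 | Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy | Proposes Q-Flow, which uses flow models for stable and expressive RL policy optimization. | reinforcement learning | |
| 19 | Support-Conditioned Flow Matching Is Kernel Smoothing | Shows that conditioned flow matching is kernel smoothing, and implements efficient conditional generation with Gaussian-kernel attention. | flow matching | |
| 20 | Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning | Proposes a hierarchical zero-shot RL method based on switching successor measures, requiring no additional supervision. | reinforcement learning | |
| 21 | Stable Attention Response for Reliable Precipitation Nowcasting | HARECast: makes precipitation nowcasting more reliable through stable attention response. | representation learning, multimodal | |
| 22 | On the Generalization of Knowledge Distillation: An Information-Theoretic View | Analyzes the generalization of knowledge distillation from an information-theoretic perspective and derives corresponding generalization bounds. | distillation | |
| 23 | Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy | Shows that sycophancy in multi-agent systems is not caused by RLHF alone, and proposes an activation-space intervention to mitigate it. | RLHF | |
| 24 | Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective | Proposes the ConSPO framework, which uses contrastive learning to improve LLM reasoning in RLVR, significantly improving mathematical reasoning performance. | reinforcement learning | |
| 25 | Achieving $ε^{-2}$ Sample Complexity for Single-Loop Actor-Critic under Minimal Assumptions | A single-loop actor-critic algorithm achieves ε⁻² sample complexity under minimal assumptions. | reinforcement learning, policy learning | |
| 26 | SpikeProphecy: A Large-Scale Benchmark for Autoregressive Neural Population Forecasting | SpikeProphecy: a large-scale benchmark for autoregressive neural population forecasting. | SSM, distillation | |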

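🔬 Pillar 9: Embodied Foundation Models (16 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 27 | MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling | MILM: uses LLMs and informative sampling to handle multimodal irregular time series, improving EHR classification performance. | large language model, multimodal | |
| 28 | Multimodal Graph-based Classification of Esophageal Motility Disorders | Proposes a multimodal graph neural network-based method for classifying esophageal motility disorders, improving diagnostic accuracy. | large language model, multimodal | |
| 29 | Decoupled and Divergence-Conditioned Prompt for Multi-domain Dynamic Graph Foundation Models | Proposes DyGFM, a multi-domain dynamic graph foundation model based on decoupled and divergence-conditioned prompts. | foundation model | |
| 30 | Supervised Deep Multimodal Matrix Factorization for Interpretable Brain Network Analysis | Proposes SD3MF for interpretable brain network analysis, enabling supervised prediction on multimodal graphs. | multimodal | |
| 31 | Machine Learning-Driven Multimodal Spectroscopic Liquid Biopsy for Early Multicancer Detection | Proposes a machine learning-based multimodal spectroscopic liquid biopsy method for early multicancer detection. | multimodal | |
| 32 | Continual Fine-Tuning of Large Language Models via Program Memory | Proposes the ProCL framework, which uses program memory for efficient fine-tuning of large language models in continual learning. | large language model | |
| 33 | Large Language Models Lack Temporal Awareness of Medical Knowledge | TempoMed-Bench shows that large language models lack temporal awareness of medical knowledge. | large language model | |
| 34 | The Expressivity Boundary of Probabilistic Circuits: A Comparison with Large Language Models | Compares probabilistic circuits with large language models, revealing the expressivity bottleneck of probabilistic circuits in language modeling. | large language model | |
| 35 | GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction | GHGbench: a unified multi-entity, multi-task benchmark for carbon emission prediction. | foundation model, multimodal | |
| 36 | Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling | Proposes a multi-level annotator-modeling method that improves the reproducibility of generative AI model evaluation. | large language model | |
| 37 | Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training | Uses geometric and spectral analysis to reveal how low-rank pre-trained language models differ from full-rank ones. | large language model | |
| 38 | LIFT: Last-Mile Fine-Tuning for Table Explicitation | Proposes LIFT, a last-mile fine-tuning method for table explicitation that improves the performance of small models. | large language model | |
| 39 | Teaching and Learning under Deductive Errors | A teaching and learning framework for deductive errors, improving the few-shot learning performance of learners such as LLMs. | large language model | |
| 40 | Learning Perturbations to Extrapolate Your LLM | Proposes an LLM extrapolation framework based on learnable perturbations, improving out-of-domain generalization. | large language model | |
| 41 | Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2 | Proposes Algebraic Ontology Projection (AOP) to control logical collapse in LLMs. | large language model | |
| 42 | Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning | Studies data difficulty and the generalization-extrapolation tradeoff to guide data selection for LLM fine-tuning. | large language model | |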

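🔬 Pillar 5: Interaction & Reaction (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 43 | DisAgg: Distributed Aggregators for Efficient Secure Aggregation in Federated Learning | DisAgg: distributed aggregators for efficient secure aggregation in federated learning. | OMOMO | |
| 44 | The WidthWall: A Strict Expressivity Hierarchy for Hypergraph Neural Networks | Proposes the WidthWall theory for hypergraph neural networks, revealing the limits of model expressivity. | OMOMO | |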

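🔬 Pillar 4: Generative Motion (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 45 | McCast: Memory-Guided Latent Drift Correction for Long-Horizon Precipitation Nowcasting | McCast: uses memory-guided latent drift correction for long-horizon precipitation nowcasting. | physically plausible | |
| 46 | Understanding and Accelerating the Training of Masked Diffusion Language Models | Proposes a bell-shaped time sampling strategy that accelerates the training of masked diffusion language models. | MDM | |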

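🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 47 | Spatiotemporal downscaling and nowcasting of urban land surface temperatures with deep neural networks | Proposes a deep neural network-based method for spatiotemporal downscaling and nowcasting of urban land surface temperatures. | spatiotemporal | |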

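🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 48 | Strategic PAC Learnability via Geometric Definability | Addresses strategic PAC learnability via geometric definability. | manipulation | |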
