cs.LG (2026-05-13)

📊 48 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (26 🔗2) · Pillar 9: Embodied Foundation Models (16 🔗1) · Pillar 5: Interaction & Reaction (2) · Pillar 4: Generative Motion (2) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1)

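🔬 Pillar 2: RL & Architecture (26 papers)

#	Title	Key point	Tags	🔗
1	JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning	Proposes JEDI, a joint-embedding diffusion world model for online model-based reinforcement learning	reinforcement learning world model world models
2	Learning POMDP World Models from Observations with Language-Model Priors	Pinductor: efficiently learns POMDP world models using language-model priors	world model world models generalist agent	✅
3	Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning	Proposes the TCE framework, which bridges cross-domain gaps in offline reinforcement learning through target-aligned generation	reinforcement learning offline RL offline reinforcement learning
4	Trajectory-Level Data Augmentation for Offline Reinforcement Learning	Proposes a trajectory-level data augmentation method that improves offline reinforcement learning performance on active localization problems	reinforcement learning offline reinforcement learning
5	Dynamical Predictive Modelling of Cardiovascular Disease Progression Post-Myocardial Infarction via ECG-Trained Artificial Intelligence Model	Proposes an ECG-trained AI model for dynamic prediction of cardiovascular disease after myocardial infarction	predictive model contrastive learning foundation model
6	Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization	Proposes RDPO, which decorrelates rewards to optimize multi-objective, mixed-reward reinforcement learning, improving instruction following and writing quality	reinforcement learning instruction following
7	MARLIN: Multi-Agent Game-Theoretic Reinforcement Learning for Sustainable LLM Inference in Cloud Datacenters	Proposes MARLIN, which uses multi-agent game-theoretic reinforcement learning to optimize the energy consumption and latency of LLM inference in cloud datacenters	reinforcement learning large language model
8	Teacher-Guided Policy Optimization for LLM Distillation	Proposes the TGPO algorithm, which addresses the negative-feedback problem in LLM distillation through teacher-guided policy optimization	reinforcement learning imitation learning distillation
9	Coreset-Induced Conditional Velocity Flow Matching	Proposes Coreset-Induced Conditional Velocity Flow Matching (CCVFM) to improve generative model performance	flow matching multimodal
10	Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation	Proposes Contrastive Proximal Policy Optimisation (CPPO), enabling self-supervised on-policy reinforcement learning without a reward function	reinforcement learning PPO
11	ERPPO: Entropy Regularization-based Proximal Policy Optimization	Proposes ERPPO, an entropy-regularization-based proximal policy optimization algorithm, to address MAPPO policy optimization in multi-dimensional environments	reinforcement learning PPO spatiotemporal
12	CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem	Proposes CO-MAP to solve the qubit allocation problem	reinforcement learning
13	HLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learning	HLS-Seek: QoR-aware code generation for high-level synthesis based on proxy comparative reward reinforcement learning	reinforcement learning
14	Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation	Proposes a reward-weighted on-policy distillation method to improve property equivalence in NL-to-SVA generation	distillation
15	Path-independent Flow Matching for Multi-parameter Generative Dynamics	Proposes Path-independent Flow Matching (PiFM) for learning path-independent transformations in multi-parameter generative dynamics	flow matching
16	OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention	OSDN: improves the delta rule in linear attention through provable online preconditioning	linear attention
17	Twincher: Bijective Representation Learning for Robust Inversion of Continuous Systems	Proposes Twincher for robust inversion of continuous systems	representation learning
18	Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy	Proposes Q-Flow, which uses a flow-based policy for stable and expressive reinforcement learning policy optimization	reinforcement learning
19	Support-Conditioned Flow Matching Is Kernel Smoothing	Shows that support-conditioned flow matching is kernel smoothing, and uses Gaussian-kernel attention for efficient conditional generation	flow matching
20	Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning	Proposes a hierarchical zero-shot reinforcement learning method based on switching successor measures, requiring no additional supervision	reinforcement learning	✅
21	Stable Attention Response for Reliable Precipitation Nowcasting	HARECast: makes precipitation nowcasting more reliable through stable attention response	representation learning multimodal
22	On the Generalization of Knowledge Distillation: An Information-Theoretic View	Analyzes the generalization of knowledge distillation from an information-theoretic perspective and proposes corresponding generalization bounds	distillation
23	Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy	Shows that sycophancy in multi-agent systems is not caused by RLHF alone, and proposes activation-space interventions to mitigate it	RLHF
24	Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective	Proposes the ConSPO framework, which uses contrastive learning to optimize LLM reasoning in RLVR, significantly improving mathematical reasoning performance	reinforcement learning
25	Achieving $\epsilon^{-2}$ Sample Complexity for Single-Loop Actor-Critic under Minimal Assumptions	A single-loop actor-critic algorithm achieves $\epsilon^{-2}$ sample complexity under minimal assumptions	reinforcement learning policy learning
26	SpikeProphecy: A Large-Scale Benchmark for Autoregressive Neural Population Forecasting	SpikeProphecy: a large-scale benchmark for autoregressive neural population forecasting	SSM distillation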

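🔬 Pillar 9: Embodied Foundation Models (16 papers)

#	Title	Key point	Tags	🔗
27	MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling	MILM: uses LLMs and informative sampling to handle multimodal irregular time series, improving EHR classification performance	large language model multimodal
28	Multimodal Graph-based Classification of Esophageal Motility Disorders	Proposes a multimodal graph neural network-based method for classifying esophageal motility disorders, improving diagnostic accuracy	large language model multimodal
29	Decoupled and Divergence-Conditioned Prompt for Multi-domain Dynamic Graph Foundation Models	Proposes DyGFM, a multi-domain dynamic graph foundation model based on decoupled and divergence-conditioned prompts	foundation model
30	Supervised Deep Multimodal Matrix Factorization for Interpretable Brain Network Analysis	Proposes SD3MF for interpretable brain network analysis, enabling supervised prediction on multimodal graphs	multimodal	✅
31	Machine Learning-Driven Multimodal Spectroscopic Liquid Biopsy for Early Multicancer Detection	Proposes a machine learning-based multimodal spectroscopic liquid biopsy method for early multicancer detection	multimodal
32	Continual Fine-Tuning of Large Language Models via Program Memory	Proposes the ProCL framework, which uses program memory for efficient continual fine-tuning of large language models	large language model
33	Large Language Models Lack Temporal Awareness of Medical Knowledge	TempoMed-Bench reveals that large language models lack temporal awareness of medical knowledge	large language model
34	The Expressivity Boundary of Probabilistic Circuits: A Comparison with Large Language Models	Compares probabilistic circuits with large language models, revealing the expressivity bottleneck of probabilistic circuits in language modeling	large language model
35	GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction	GHGbench: a unified multi-entity, multi-task benchmark for carbon emission prediction	foundation model multimodal
36	Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling	Uses multi-level annotator modeling to improve the reproducibility of generative AI model evaluation	large language model
37	Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training	Uses geometric and spectral analysis to reveal how low-rank pre-trained language models differ from full-rank models	large language model
38	LIFT: Last-Mile Fine-Tuning for Table Explicitation	Proposes LIFT, a last-mile fine-tuning method for table explicitation that improves small-model performance	large language model
39	Teaching and Learning under Deductive Errors	A teaching and learning framework under deductive errors that improves the few-shot learning performance of learners such as LLMs	large language model
40	Learning Perturbations to Extrapolate Your LLM	Proposes an LLM extrapolation framework based on learnable perturbations, improving out-of-domain generalization	large language model
41	Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2	Proposes Algebraic Ontology Projection (AOP) to control logical collapse in LLMs	large language model
42	Data Difficulty and the Generalization-Extrapolation Tradeoff in LLM Fine-Tuning	Studies data difficulty and the generalization-extrapolation tradeoff to guide data selection for LLM fine-tuning	large language model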

🔬 Pillar 5: Interaction & Reaction (2 papers)

#	Title	Key point	Tags	🔗	⭐
43	DisAgg: Distributed Aggregators for Efficient Secure Aggregation in Federated Learning	DisAgg: distributed aggregators for efficient secure aggregation in federated learning	OMOMO
44	The WidthWall: A Strict Expressivity Hierarchy for Hypergraph Neural Networks	Proposes a width-wall theory for hypergraph neural networks, revealing the limits of model expressivity	OMOMO

🔬 Pillar 4: Generative Motion (2 papers)

#	Title	Key point	Tags	🔗	⭐
45	McCast: Memory-Guided Latent Drift Correction for Long-Horizon Precipitation Nowcasting	McCast: uses memory-guided latent drift correction for long-horizon precipitation nowcasting	physically plausible
46	Understanding and Accelerating the Training of Masked Diffusion Language Models	Proposes a bell-shaped time sampling strategy to accelerate the training of masked diffusion language models	MDM

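🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	Key point	Tags	🔗	⭐
47	Spatiotemporal downscaling and nowcasting of urban land surface temperatures with deep neural networks	Proposes a deep neural network-based method for spatiotemporal downscaling and nowcasting of urban land surface temperatures	spatiotemporal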

🔬 Pillar 1: Robot Control (1 paper)

#	Title	Key point	Tags	🔗	⭐
48	Strategic PAC Learnability via Geometric Definability	Proposes an approach to strategic PAC learnability via geometric definability	manipulation
