cs.LG (2026-05-15)

📊 33 papers total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (17 🔗2) Pillar 9: Embodied Foundation Models (12 🔗2) Pillar 1: Robot Control (2) Pillar 7: Motion Retargeting (1) Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (17 papers)

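#	Title	One-Sentence Summary	Tags	🔗	⭐
1	Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking	Proposes Probabilistic Chunk Masking (PCM) to accelerate VLA reinforcement learning and improve the efficiency of gradient computation.	reinforcement learning world model world models
2	DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation	Proposes DeltaPrompts, which actively mines high-discrepancy prompts to improve multimodal distillation.	teacher-student distillation multimodal
3	Mind Dreamer: Untethering Imagination via Active Latent Intervention on Latent Manifolds	Proposes Mind Dreamer to address the history-tethering problem in model-based reinforcement learning.	reinforcement learning world model world models
4	Offline Reinforcement Learning with Universal Horizon Models	Proposes Universal Horizon Models to address long-horizon prediction in offline reinforcement learning.	reinforcement learning offline RL offline reinforcement learning	✅
5	Constrained latent state modeling: A unifying perspective on representation learning under competing constraints	Proposes constrained latent state modeling, a unifying perspective on representation learning under competing constraints.	representation learning multimodal
6	AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs	AstraFlow: a dataflow-oriented reinforcement learning system for agentic LLMs.	reinforcement learning large language model
7	Multi-Fidelity Flow Matching: Cascaded Refinement of PDE Solutions	Proposes Multi-Fidelity Flow Matching (MFFM) for cascaded refinement of solutions to parametric partial differential equations.	flow matching spatiotemporal
8	parallelcbf: A composable safety-filter and auditability framework for tensor-parallel reinforcement learning	ParallelCBF: a composable safety-filter and auditability framework for tensor-parallel reinforcement learning.	reinforcement learning behavior cloning	✅
9	BAPR: Bayesian amnesic piecewise-robust reinforcement learning for non-stationary continuous control	Proposes BAPR, which combines Bayesian online change detection with robust ensemble reinforcement learning to handle non-stationary continuous control.	reinforcement learning SAC
10	Looped SSMs: Depth-Recurrence and Input Reshaping for Time Series Classification	Proposes looped state space models to improve time series classification performance.	SSM state space model
11	MIND: Decoupling Model-Induced Label Noise via Latent Manifold Disentanglement	MIND: removes model-induced label noise by disentangling latent manifolds.	distillation foundation model
12	Dynamics-Level Watermarking of Flow Matching Models with Random Codes	Proposes a dynamics-level watermarking method for flow matching models based on random codes, aimed at protecting the copyright of generative models.	flow matching
13	A Multi-Layer Cloud-IDS Pipeline with LLM and Adaptive Q-Learning Calibration	Proposes a multi-layer cloud IDS pipeline with LLM and adaptive Q-learning calibration to improve security in cloud environments.	reinforcement learning large language model
14	Tighter Regret Bounds for Contextual Action-Set Reinforcement Learning	Derives tighter regret bounds for contextual action-set reinforcement learning, improving algorithm performance.	reinforcement learning
15	Pessimistic Risk-Aware Policy Learning in Contextual Bandits	Proposes a unified distributional framework for optimizing risk-aware offline policy learning.	policy learning
16	Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making	Ada-Diffuser: a latent-aware adaptive diffusion model for decision-making that explicitly models latent dynamics.	policy learning latent dynamics
17	VSPO: Vector-Steered Policy Optimization for Behavioral Control	Proposes VSPO, which achieves behavioral control of language models through vector-steered policy optimization.	distillation reward shaping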

🔬 Pillar 9: Embodied Foundation Models (12 papers)

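#	Title	One-Sentence Summary	Tags	🔗	⭐
18	GOMA: Toward Structure-Driven Multimodal Alignment from a Graph Signal Smoothing Perspective	GOMA: a graph-optimized multimodal alignment framework that uses graph structure to improve the retrieval performance of frozen multimodal embeddings.	multimodal
19	Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models	Proposes Differentiable Mixture-of-Agents (DMoA), which incentivizes the emergence of swarm intelligence in large language models.	large language model
20	ITGPT: Generative Pretraining on Irregular Timeseries	ITGPT: a generative pretraining model for irregular time series that requires no resampling or imputation.	large language model multimodal
21	LoCO: Low-rank Compositional Rotation Fine-tuning	Proposes LoCO, which uses low-rank compositional orthogonal fine-tuning to better preserve geometric structure in parameter-efficient fine-tuning.	foundation model
22	Entropic Auto-Encoding via Implicit Free-Energy Minimization	Proposes Entropic Autoencoders, which mitigate posterior collapse in VAEs through implicit free-energy minimization.	multimodal
23	CHoE: Cross-Domain Heterogeneous Graph Prompt Learning via Structure-Conditioned Experts	Proposes CHoE, which addresses cross-domain heterogeneous graph prompt learning with structure-conditioned expert networks.	foundation model
24	AOT-POT: Adaptive Operator Transformation for Large-Scale PDE Pre-training	Proposes AOT-POT, which enables large-scale PDE pre-training through adaptive operator transformation and significantly improves model generalization.	foundation model
25	SEED: Targeted Data Selection by Weighted Independent Set	SEED: targeted data selection via weighted independent set, improving training efficiency and model performance.	multimodal
26	IO-SVD: Input-Output Whitened SVD for Adaptive-Rank LLM Compression	Proposes IO-SVD, which achieves adaptive-rank LLM compression through KL-aware two-sided whitened SVD.	large language model	✅
27	STS: Efficient Sparse Attention with Speculative Token Sparsity	Proposes STS, an efficient sparse attention mechanism based on speculative token sparsity that accelerates LLM inference.	large language model
28	Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs	Proposes Ghosted Layers to address activation alignment after layer pruning.	large language model
29	SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference	SurvivalPFN: generalizable survival prediction via in-context Bayesian inference.	foundation model	✅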

🔬 Pillar 1: Robot Control (2 papers)

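#	Title	One-Sentence Summary	Tags	🔗	⭐
30	A Unified Perturbation Framework for Analyzing Leaderboard Stability and Manipulation	Proposes a unified perturbation framework for analyzing and manipulating the stability and robustness of large language model leaderboards.	manipulation large language model
31	ADAPT: A Self-Calibrating Proactive Autoscaler for Container Orchestration	ADAPT: a self-calibrating proactive autoscaler for container orchestration.	MPC model predictive control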

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
32	On the Fragility of Data Attribution When Learning Is Distributed	Proposes a data attribution robustness method to counter attacks in distributed learning.	latent optimization

🔬 Pillar 8: Physics-based Animation (1 paper)

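#	Title	One-Sentence Summary	Tags	🔗	⭐
33	Perforated Neural Networks for Keyword Spotting	Proposes a keyword spotting method based on perforated neural networks, achieving high accuracy with small models on edge devices.	PULSE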
