cs.LG (2026-05-06)

📊 46 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (22 🔗1)
Pillar 9: Embodied Foundation Models (16 🔗1)
Pillar 1: Robot Control (6)
Pillar 7: Motion Retargeting (1)
Pillar 8: Physics-based Animation (1)

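🔬 Pillar 2: RL Algorithms & Architecture (22 papers)

#	Title	Key Point	Tags	🔗
1	Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models	Proposes UE-DPO, which uses uncertainty-guided exploration to improve the visual alignment of multimodal large language models.	DPO, direct preference optimization, large language model
2	Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis	Proposes a gated multimodal learning model for interpretable building energy performance prediction and retrofit scenario analysis.	MAE, multimodal
3	To Fuse or to Drop? Dual-Path Learning for Resolving Modality Conflicts in Multimodal Emotion Recognition	Proposes DCR, a dual-path conflict resolution framework for handling modality conflicts in multimodal emotion recognition.	distillation, multimodal
4	Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation	Proposes Power self-distillation, which bridges sampling, self-reward RL, and self-distillation to improve LLM reasoning.	reinforcement learning, distillation, large language model
5	Data-dependent Exploration for Online Reinforcement Learning from Human Feedback	Proposes a data-dependent exploration method to improve online reinforcement learning from human feedback.	reinforcement learning, RLHF, large language model
6	Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization	Proposes Preference-Based Self-Distillation (PBSD), which improves training stability and performance in mathematical reasoning and tool use.	reinforcement learning, preference learning, distillation
7	CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies	CRAFT: counterfactual-to-interactive reinforcement fine-tuning for autonomous driving policies.	imitation learning, distillation, vision-language-action	✅
8	Towards General Preference Alignment: Diffusion Models at Nash Equilibrium	Proposes Diffusion-NPO, which takes a game-theoretic view to improve the alignment of diffusion models with human preferences.	reinforcement learning, RLHF, DPO
9	Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning	Proposes an adaptive policy selection and fine-tuning method for offline-to-online reinforcement learning under limited interaction budgets.	reinforcement learning, offline RL
10	Provable imitation learning for control of instability in partially-observed Vlasov--Poisson equations	Proposes an imitation-learning-based control method for Vlasov-Poisson equations, targeting plasma instabilities in settings such as nuclear fusion.	imitation learning, behavior cloning
11	Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning	Proposes Graph-SND to address behavioral diversity in multi-agent reinforcement learning.	reinforcement learning, PPO
12	Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning	Proposes a unified framework for studying distributional regret in multi-armed bandits and reinforcement learning.	reinforcement learning
13	The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence	Reveals a predictive-causal gap in predictive learning, supported by a theoretical impossibility result and large-scale neural evidence.	world model, world models, representation learning
14	Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization	Proposes a reinforcement learning method based on outcome-level optimization to improve compositional generalization.	reinforcement learning
15	A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs	Proposes an average-reward reinforcement learning algorithm based on a harmonic mean formulation to address non-stationarity in SMDPs.	reinforcement learning
16	Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs	Proposes a polarity-aware representation learning framework over clause-literal hypergraphs to improve unsat core prediction.	representation learning
17	Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models	Proposes Counter-Dyna to improve data efficiency in HVAC control.	reinforcement learning, PPO, predictive model
18	Beyond Rigid Geometries: The Spline-Pullback Metric for Universal Diffeomorphic SPD Representation Learning	Proposes the Spline-Pullback Metric (SPM) for universal diffeomorphic SPD representation learning, moving beyond rigid geometries.	representation learning
19	Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior	Proposes manifold steering to reveal the shared geometry of neural network representations and behavior.	world model, world models
20	A geometric relation of the error introduced by sampling a language model's output distribution to its internal state	Derives a geometric relation between the error introduced by sampling a language model's output distribution and the model's internal state.	world model, world models
21	Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models	Proposes non-monotone triangular structural causal models, enabling precise and stable counterfactual inference in embodied interaction.	world model, world models
22	Extending Differential Temporal Difference Methods for Episodic Problems	Extends differential temporal-difference methods to episodic problems, improving sample efficiency.	reinforcement learning, deep reinforcement learning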

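🔬 Pillar 9: Embodied Foundation Models (16 papers)

#	Title	Key Point	Tags	🔗
23	Physiologically Grounded Driver Behavior Classification: SHAP-Driven Elite Feature Selection and Hybrid Gradient Boosting for Multimodal Physiological Signals	Proposes a driver behavior classification framework that combines SHAP-based feature selection with hybrid gradient boosting, achieving high performance and interpretability from multimodal physiological signals.	multimodal
24	Bridging Input Feature Spaces Towards Graph Foundation Models	Proposes ALL-IN to address inconsistent input feature spaces in graph learning.	foundation model
25	Benchmarking LLMs on the Massive Sound Embedding Benchmark (MSEB)	Evaluates LLMs on the Massive Sound Embedding Benchmark (MSEB) and explores modeling paradigms for audio understanding.	large language model, multimodal
26	Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs	Proposes a neural architecture search method based on Delta-Code generation that fine-tunes LLMs on code diffs, improving efficiency and simplifying the generated code.	large language model
27	Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction	Proposes a low-cost black-box method for detecting LLM hallucinations based on dynamical system prediction.	large language model
28	Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation	Proposes CausalFlow-T and an LLM-driven evolutionary imputer for joint treatment effect estimation from incomplete healthcare data.	large language model
29	On the Hardness of Junking LLMs	Studies how hard it is to find "junk sequences" that trigger harmful LLM outputs, finding this harder than standard jailbreak attacks.	large language model
30	CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels	CuBridge: an LLM-based framework for understanding and reconstructing high-performance attention kernels.	large language model
31	Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics	Proposes attention-based diagnostics of transport capacity and directionality for assessing hallucinations in large language models.	large language model
32	Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop	Proposes a personalized thinking model (PTM) that combines human-in-the-loop collaboration to improve AI-based educational support.	large language model
33	OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization	Proposes OSAQ, which suppresses outliers through weight self-absorption to improve the accuracy of low-bit LLM quantization.	large language model
34	A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints	Proposes a queueing-theoretic framework for analyzing the stability of LLM inference, addressing GPU resource allocation under KV cache memory constraints.	large language model
35	Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control	Proposes Anchored Learning, which stabilizes LLM supervised fine-tuning through explicit distributional control.	large language model
36	Demystifying Manifold Constraints in LLM Pre-training	Proposes the MACRO optimizer, clarifying the role of manifold constraints in LLM pre-training and improving stability and performance.	large language model
37	Contextual Memory-Enhanced Source Coding for Low-SNR Communications	Proposes MASC, a memory-augmented source coding scheme that makes text transmission more robust in low-SNR communications.	large language model
38	Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment	Proposes DistPFN, which mitigates label shift in tabular in-context learning via test-time posterior adjustment.	foundation model	✅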

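🔬 Pillar 1: Robot Control (6 papers)

#	Title	Key Point	Tags
39	Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination	Dream-MPC: gradient-based model predictive control with latent imagination, improving performance on continuous control tasks.	MPC, model predictive control, reinforcement learning
40	ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC	ELVIS: ensemble-calibrated latent imagination for long-horizon visual MPC.	MPC, model predictive control, reinforcement learning
41	SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning	Proposes SPHERE to mitigate the loss of spectral plasticity that continual learning causes in Mixture-of-Experts (MoE) models for deep reinforcement learning.	humanoid, reinforcement learning, deep reinforcement learning
42	Bilinear Mamba-Koopman Neural MPC for Varying Dynamics	Proposes Bilinear Mamba-Koopman Neural MPC, which uses control-dependent latent dynamics to improve MPC performance in time-varying environments.	MPC, latent dynamics, Mamba
43	One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving	Proposes HELM, which adaptively partitions HBM between two caches to address memory allocation in generative recommender serving.	recovery control, PPO
44	Gray-Box Poisoning of Continuous Malware Ingestion Pipelines	Studies gray-box poisoning attacks on, and defenses for, continuous malware ingestion pipelines.	manipulation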

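🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	Key Point	Tags	🔗	⭐
45	Geometry-Aware Neural Optimizer for Shape Optimization and Inversion	Proposes GANO, a geometry-aware neural optimizer for shape optimization and inversion that enables controllable geometric updates.	latent optimization

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	Key Point	Tags	🔗	⭐
46	Scalable inference of spatial regions and temporal signatures from time series	Proposes a regionalization method for spatial time series based on the minimum description length principle, enabling scalable region partitioning and temporal signature extraction.	spatiotemporal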
