cs.LG (2026-05-06)

📊 46 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (22 🔗1) · Pillar 9: Embodied Foundation Models (16 🔗1) · Pillar 1: Robot Control (6) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL Algorithms & Architecture (22 papers)

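| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models | Proposes UE-DPO, which uses uncertainty-guided exploration to improve the visual alignment of multimodal large language models. | DPO, direct preference optimization, large language model | |
| 2 | Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis | Proposes a gated multimodal learning model for interpretable building energy-performance prediction and retrofit scenario analysis. | MAE, multimodal | |
| 3 | To Fuse or to Drop? Dual-Path Learning for Resolving Modality Conflicts in Multimodal Emotion Recognition | Proposes DCR, a dual-path conflict-resolution framework for handling modality conflicts in multimodal emotion recognition. | distillation, multimodal | |
| 4 | Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation | Proposes a power self-distillation method that bridges sampling, self-reward reinforcement learning, and self-distillation to improve LLM reasoning. | reinforcement learning, distillation, large language model | |
| 5 | Data-dependent Exploration for Online Reinforcement Learning from Human Feedback | Proposes a data-dependent exploration method for online reinforcement learning from human feedback. | reinforcement learning, RLHF, large language model | |
| 6 | Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization | Proposes preference-based self-distillation (PBSD), improving training stability and performance on mathematical reasoning and tool use. | reinforcement learning, preference learning, distillation | |
| 7 | CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies | CRAFT: counterfactual-to-interactive reinforcement fine-tuning for autonomous-driving policies. | imitation learning, distillation, vision-language-action | |
| 8 | Towards General Preference Alignment: Diffusion Models at Nash Equilibrium | Proposes Diffusion-NPO, which takes a game-theoretic view to improve the alignment of diffusion models with human preferences. | reinforcement learning, RLHF, DPO | |
| 9 | Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning | Proposes an adaptive policy selection and fine-tuning method that addresses limited interaction budgets in offline-to-online reinforcement learning. | reinforcement learning, offline RL | |
| 10 | Provable imitation learning for control of instability in partially-observed Vlasov--Poisson equations | Proposes an imitation-learning-based control method for the Vlasov-Poisson equations, targeting plasma instabilities such as those arising in nuclear fusion. | imitation learning, behavior cloning | |
| 11 | Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning | Proposes Graph-SND to address behavioral diversity in multi-agent reinforcement learning. | reinforcement learning, PPO | |
| 12 | Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning | Proposes a unified framework for studying distributional regret in multi-armed bandits and reinforcement learning. | reinforcement learning | |
| 13 | The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence | Identifies a predictive-causal gap in predictive learning, supported by a theoretical proof and large-scale neural evidence. | world model, world models, representation learning | |
| 14 | Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization | Proposes a reinforcement learning method based on outcome-level optimization to improve compositional generalization. | reinforcement learning | |
| 15 | A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs | Proposes a harmonic-mean-based average-reward reinforcement learning algorithm that addresses non-stationarity in SMDPs. | reinforcement learning | |
| 16 | Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs | Proposes a polarity-aware representation learning framework over clause-literal hypergraphs to improve unsatisfiable-core prediction. | representation learning | |
| 17 | Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models | Proposes Counter-Dyna to address data efficiency in HVAC control. | reinforcement learning, PPO, predictive model | |
| 18 | Beyond Rigid Geometries: The Spline-Pullback Metric for Universal Diffeomorphic SPD Representation Learning | Proposes the Spline-Pullback Metric (SPM) for universal diffeomorphic SPD representation learning, moving beyond rigid geometries. | representation learning | |
| 19 | Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior | Proposes manifold steering to reveal the shared geometry of neural-network representations and behavior. | world model, world models | |
| 20 | A geometric relation of the error introduced by sampling a language model's output distribution to its internal state | Derives a geometric relation between the error introduced by sampling a language model's output distribution and the model's internal state. | world model, world models | |
| 21 | Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models | Proposes non-monotone triangular structural causal models, enabling precise and stable counterfactual inference in embodied interaction. | world model, world models | |
| 22 | Extending Differential Temporal Difference Methods for Episodic Problems | Extends differential temporal-difference methods to episodic problems, improving sample efficiency. | reinforcement learning, deep reinforcement learning | |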

🔬 Pillar 9: Embodied Foundation Models (16 papers)

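| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | Physiologically Grounded Driver Behavior Classification: SHAP-Driven Elite Feature Selection and Hybrid Gradient Boosting for Multimodal Physiological Signals | Proposes a driver-behavior classification framework based on SHAP feature selection and hybrid gradient boosting that uses multimodal physiological signals to achieve both high performance and interpretability. | multimodal | |
| 24 | Bridging Input Feature Spaces Towards Graph Foundation Models | Proposes ALL-IN to address inconsistent input feature spaces in graph learning. | foundation model | |
| 25 | Benchmarking LLMs on the Massive Sound Embedding Benchmark (MSEB) | Evaluates LLMs on the Massive Sound Embedding Benchmark (MSEB) and explores modeling paradigms for audio understanding. | large language model, multimodal | |
| 26 | Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs | Proposes a neural architecture search method based on delta-code generation that fine-tunes LLMs via code diffs, improving efficiency and simplifying the generated code. | large language model | |
| 27 | Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction | Proposes a low-cost black-box method for detecting LLM hallucinations based on dynamical-system prediction. | large language model | |
| 28 | Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation | Proposes CausalFlow-T and an LLM-driven evolutionary imputer to address joint treatment-effect estimation from incomplete healthcare data. | large language model | |
| 29 | On the Hardness of Junking LLMs | Studies how hard it is to find "junk sequences" that trigger harmful LLM outputs, finding it harder than standard jailbreak attacks. | large language model | |
| 30 | CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels | CuBridge: an LLM-based framework for understanding and reconstructing high-performance attention kernels. | large language model | |
| 31 | Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics | Proposes attention-based diagnostics of transport capacity and directionality for assessing hallucination in large language models. | large language model | |
| 32 | Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop | Proposes a personalized thinking model (PTM) that uses human-AI collaboration to improve AI-based educational support. | large language model | |
| 33 | OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization | Proposes OSAQ, which suppresses outliers via weight self-absorption to improve the accuracy of low-bit LLM quantization. | large language model | |
| 34 | A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints | Proposes a queueing-theoretic framework for analyzing LLM inference stability, addressing GPU resource allocation under KV-cache memory constraints. | large language model | |
| 35 | Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control | Proposes Anchored Learning, which stabilizes LLM supervised fine-tuning through explicit distributional control. | large language model | |
| 36 | Demystifying Manifold Constraints in LLM Pre-training | Proposes the MACRO optimizer, clarifying the role of manifold constraints in LLM pre-training and improving stability and performance. | large language model | |
| 37 | Contextual Memory-Enhanced Source Coding for Low-SNR Communications | Proposes MASC, a memory-augmented source coding scheme that improves the robustness of text transmission at low SNR. | large language model | |
| 38 | Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment | Proposes DistPFN, which mitigates label shift in tabular in-context learning through test-time posterior adjustment. | foundation model | |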

🔬 Pillar 1: Robot Control (6 papers)

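| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 39 | Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination | Dream-MPC: gradient-based model predictive control with latent imagination, improving performance on continuous-control tasks. | MPC, model predictive control, reinforcement learning | |
| 40 | ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC | ELVIS: ensemble-calibrated latent imagination for long-horizon visual MPC. | MPC, model predictive control, reinforcement learning | |
| 41 | SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning | Proposes SPHERE to mitigate the loss of spectral plasticity that mixture-of-experts (MoE) models suffer under continual learning in deep reinforcement learning. | humanoid, reinforcement learning, deep reinforcement learning | |
| 42 | Bilinear Mamba-Koopman Neural MPC for Varying Dynamics | Proposes Bilinear Mamba-Koopman Neural MPC, which improves MPC performance in time-varying environments through control-dependent latent dynamics. | MPC, latent dynamics, Mamba | |
| 43 | One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving | Proposes HELM, which adaptively partitions HBM to address memory allocation when serving generative recommender systems. | recovery control, PPO | |
| 44 | Gray-Box Poisoning of Continuous Malware Ingestion Pipelines | Studies gray-box poisoning attacks against continuous malware-detection pipelines and defenses against them. | manipulation | |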

🔬 Pillar 7: Motion Retargeting (1 paper)

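| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 45 | Geometry-Aware Neural Optimizer for Shape Optimization and Inversion | Proposes GANO, a geometry-aware neural optimizer for shape optimization and inversion that enables controllable geometry updates. | latent optimization | |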

🔬 Pillar 8: Physics-based Animation (1 paper)

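| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 46 | Scalable inference of spatial regions and temporal signatures from time series | Proposes a regionalization method for spatial time series based on the minimum description length principle, enabling scalable region partitioning and temporal-signature extraction. | spatiotemporal | |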
