cs.LG (2026-05-18)

📊 46 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (31 🔗2) | Pillar 9: Embodied Foundation Models (11 🔗2) | Pillar 8: Physics-based Animation (2) | Pillar 7: Motion Retargeting (1 🔗1) | Pillar 1: Robot Control (1)

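🔬 Pillar 2: RL Algorithms & Architecture (31 papers)

# | Title | One-line summary | Tags | 🔗
1 | ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization | ISEP: implicit support expansion for offline reinforcement learning based on stochastic policy optimization. | reinforcement learning, offline reinforcement learning, flow matching
2 | AURORA: Contextual Orthogonalization for Geometric Representation Learning in Healthcare Foundation Models | AURORA: geometric representation learning via contextual orthogonalization for the healthcare domain. | representation learning, distillation, foundation model
3 | PH-Dreamer: A Physics-Driven World Model via Port-Hamiltonian Generative Dynamics | Proposes PH-Dreamer, a physics-driven world model based on Port-Hamiltonian generative dynamics, improving control-task performance. | world model, world models, dreamer
4 | Foundation Models for Credit Risk Prediction: A Game Changer? | Uses pretrained tabular foundation models to improve credit risk prediction, especially in small-sample settings. | predictive model, large language model, foundation model
5 | Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees | Proposes a knowledge distillation method that compresses pretrained tabular models into CPU-ready gradient-boosted trees, speeding up inference. | teacher-student, distillation, foundation model
6 | UTOPYA: A Multimodal Deep Learning Framework for Physics-Informed Anomaly Detection and Time-Series Prediction | UTOPYA: a multimodal deep learning framework for physics-informed anomaly detection and time-series prediction. | curriculum learning, distillation, multimodal
7 | Distilling Tabular Foundation Models for Structured Health Data | Proposes a tabular foundation model distillation method for structured health data, enabling lightweight deployment. | distillation, foundation model
8 | KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture | KairosHope: a next-generation time-series foundation model based on a dual-memory architecture, for specialized classification. | contrastive learning, foundation model
9 | TabH2O: A Unified Foundation Model for Tabular Prediction | TabH2O: a unified foundation model for tabular prediction that performs classification and regression in a single forward pass. | curriculum learning, foundation model
10 | Heterogeneous Tasks Offloading in Vehicular Edge Computing: A Federated Meta Deep Reinforcement Learning Approach | Proposes the FedMAGS framework to address privacy preservation and fast adaptation in heterogeneous task offloading for vehicular edge computing. | reinforcement learning, deep reinforcement learning
11 | General Preference Reinforcement Learning | Proposes General Preference Reinforcement Learning (GPRL) to address the difficulty of reward function design in open-ended LLM tasks. | reinforcement learning, large language model
12 | Post-Trained MoE Can Skip Half Experts via Self-Distillation | ZEDA: uses self-distillation to let post-trained MoE models skip half of their experts, improving inference efficiency. | distillation, instruction following
13 | Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework | Proposes ProRL, an interpretable programmatic reinforcement learning framework for the job-shop scheduling problem. | reinforcement learning, deep reinforcement learning, DRL
14 | Beyond Inference-Time Search: Reinforcement Learning Synthesizes Reusable Solvers | Uses reinforcement learning to synthesize reusable solvers, improving LLM efficiency on combinatorial optimization problems. | reinforcement learning, large language model
15 | FedSDR: Federated Self-Distillation with Rectification | FedSDR: federated self-distillation with rectification, addressing heterogeneity in federated fine-tuning of large language models. | distillation, large language model
16 | Enhancing the Code Reasoning Capabilities of LLMs via Consistency-based Reinforcement Learning | Proposes CodeThinker, which improves LLM code reasoning through consistency-based reinforcement learning. | reinforcement learning, large language model
17 | $\boldsymbol{f}$-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control | Proposes the f-OPD framework, which stabilizes long-horizon on-policy distillation training through freshness-aware control. | distillation, large language model
18 | Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights | Proposes a reinforcement-learning-based method for modeling customer trajectories to optimize retail layout. | reinforcement learning, PULSE
19 | AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning | AMARIS: a memory-augmented rubric improvement system for rubric-based reinforcement learning. | reinforcement learning, reward shaping
20 | HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents | HINT-SD: a targeted hindsight self-distillation method for long-horizon agents. | reinforcement learning, distillation
21 | Federated Martingale Posterior Sampling | Proposes Federated Martingale Posterior sampling (FMP) to address the difficulty of specifying priors in federated Bayesian neural networks. | predictive model, large language model
22 | Graph Hierarchical Recurrence for Long-Range Generalization | Proposes the Graph Hierarchical Recurrence (GHR) framework to address long-range generalization in graph neural networks. | representation learning, foundation model
23 | Alignment Dynamics in LLM Fine-Tuning | Proposes an alignment dynamics framework that explains and predicts alignment fragility and recovery in LLM fine-tuning. | reinforcement learning, large language model
24 | Privacy Preserving Reinforcement Learning with One-Sided Feedback | Proposes the POOL algorithm for privacy-preserving reinforcement learning with one-sided feedback in multi-dimensional continuous state-action spaces. | reinforcement learning
25 | Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning | Proposes an interaction-breaking adversarial learning framework to improve the robustness of multi-agent reinforcement learning. | reinforcement learning
26 | Multi-site PPG: An In-the-Wild Physiological Dataset from Emerging Multi-site Wearables | Presents Multi-site PPG, a multi-site physiological dataset for evaluating the heart-rate monitoring performance of emerging wearables in real-world settings. | MAE, TAMP
27 | Balancing Knowledge Distillation for Imbalance Learning with Bilevel Optimization | Proposes BiKD, which uses bilevel optimization to balance sample-level loss weights in knowledge distillation, addressing imbalanced learning. | distillation
28 | Agentic Cost-Aware Query Planning with Knowledge Distillation for Big Data Analytics | Proposes a knowledge-distillation-based agentic cost-aware query planner that optimizes resource-constrained queries in big data analytics. | distillation
29 | Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models | Shows that self-distillation is optimal among spectral shrinkage estimators in spiked covariance models. | distillation
30 | COOPO: Cyclic Offline-Online Policy Optimization Algorithm | Proposes the COOPO algorithm, which improves the sample efficiency and performance of reinforcement learning through cyclic offline-online policy optimization. | reinforcement learning, offline reinforcement learning
31 | DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization | DiPRL: learns discrete programmatic policies via architecture entropy regularization, improving performance on reinforcement learning tasks. | reinforcement learning, deep reinforcement learning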

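🔬 Pillar 9: Embodied Foundation Models (11 papers)

# | Title | One-line summary | Tags | 🔗
32 | Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap | Ensembling tabular foundation models faces a diversity ceiling and a calibration trap; a greedy selection strategy is recommended. | foundation model
33 | Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models | Proposes resampling strategies for tabular foundation models to optimize context construction in credit risk prediction. | foundation model
34 | Prune, Update and Trim: Robust Structured Pruning for Large Language Models | Proposes Putri, a robust structured pruning method for large language models that improves performance at extreme sparsity. | large language model
35 | The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought | Studies the expressive power of low-precision softmax Transformers, achieving Turing machine simulation when combined with CoT. | chain-of-thought
36 | S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs | S2Aligner: an efficient, transferable pre-training model for sparse text-attributed graphs. | large language model, foundation model
37 | MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization | Proposes Module-Adaptive Residual Reconstruction (MARR), improving low-bit post-training quantization performance. | large language model
38 | A Readiness-Driven Runtime for Pipeline-Parallel Training under Runtime Variability | Proposes the RRFP runtime to address task alignment problems caused by runtime variability in pipeline-parallel training. | multimodal
39 | GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets | GAMMA: a global bit allocation framework for mixed-precision models under arbitrary budgets. | large language model
40 | Are Sparse Autoencoder Benchmarks Reliable? | Audits sparse autoencoder benchmarks to improve evaluation reliability. | large language model
41 | A Unified Framework for Data-Free One-Step Sampling via Wasserstein Gradient Flows | Proposes a unified framework for data-free one-step sampling via Wasserstein gradient flows. | multimodal
42 | Revisiting the Adam-SGD Gap in LLM Pre-Training: The Role of Large Effective Learning Rates | Substantially narrows the performance gap between Adam and SGD in LLM pre-training by stabilizing SGD at large learning rates. | large language model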

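🔬 Pillar 8: Physics-based Animation (2 papers)

# | Title | One-line summary | Tags | 🔗
43 | SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models | Proposes a sensitivity-aware, fidelity-enforcing compression method to improve physics foundation model performance. | spatiotemporal, foundation model
44 | Uncertainty Reliability Under Domain Shift: An Investigation for Data-Driven Blood Pressure Estimation in Photoplethysmography | Studies the reliability of uncertainty quantification under domain shift for PPG-based blood pressure estimation, and proposes a DE+GNLL+CP/TS scheme. | PULSE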

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags | 🔗
45 | FLAG: Foundation model representation with Latent diffusion Alignment via Graph for spatial gene expression prediction | FLAG: aligns gene foundation models using graph structure and latent diffusion to predict spatial gene expression. | spatial relationship, foundation model

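🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
46 | Learning to Look Benign: Targeted Evasion of Malware Detectors via API Import Injection | Proposes a targeted malware evasion method based on API import injection that fools machine-learning detectors. | manipulation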
