cs.LG (2026-05-20)

📊 38 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (26 🔗2) · Pillar 9: Embodied Foundation Models (7) · Pillar 1: Robot Control (3 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (26 papers)

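| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach | CoMET: a modular multimodal classification method that requires no fine-tuning, built by composing pretrained models. | representation learning, foundation model, multimodal | |
| 2 | CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation | CAdam: context-adaptive moment estimation for fast 3D Gaussian optimization in generative distillation. | distillation, 3D gaussian splatting, 3DGS | |
| 3 | PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment | PREFINE: preference-based implicit reward and cost fine-tuning for safety alignment. | reinforcement learning, offline RL, imitation learning | |
| 4 | Behavior-Consistent Deep Reinforcement Learning | Proposes the QED algorithm, which improves the reliability of reinforcement learning by controlling the consistency of policy distributions. | reinforcement learning, deep reinforcement learning | |
| 5 | Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards | Proposes the NFPO algorithm, which improves language-model reasoning in RLVR through multi-step likelihood-ratio correction. | reinforcement learning, PPO, large language model | |
| 6 | Distributed Direct Preference Optimization | Proposes a distributed DPO algorithm for policy alignment under heterogeneous user preference data. | reinforcement learning, offline RL, DPO | |
| 7 | DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards | Proposes DelTA to address the unclear relationship between response-level rewards and token-level probability changes. | reinforcement learning, large language model | |
| 8 | Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards | Proposes a domain-adaptable reinforcement learning framework that uses dense rewards to improve code generation quality, particularly in robotics. | reinforcement learning, large language model | |
| 9 | Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning | FISolver: discovers first integrals of dynamical systems using backward-generated data and guided reinforcement learning. | reinforcement learning, large language model | |
| 10 | Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression | Proposes Distribution-Aware Reward to improve the quality of predictive distributions in LLM regression tasks. | reinforcement learning, large language model | |
| 11 | TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health | TimeSRL: generalizable time-series behavioral modeling via LLMs fine-tuned with semantic reinforcement learning, applied to mental health. | reinforcement learning, MAE, large language model | |
| 12 | Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent | Proposes Stochastic MeanFlow Policies (SMFP), which address multimodal action distributions in reinforcement learning through one-step generative control. | reinforcement learning, SAC, multimodal | |
| 13 | PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG | PACD-Net: a pseudo-augmented contrastive distillation method for estimating glycemic control metrics. | contrastive learning, distillation | |
| 14 | AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals | AVSD: adaptive-view self-distillation that balances consensus and teacher-specific privileged signals. | distillation, privileged information | |
| 15 | You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories | Proposes RELEX to efficiently extrapolate RLVR training results. | reinforcement learning, large language model | |
| 16 | Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search | FedKDNAS: an optimized federated learning framework that combines distributed NAS with knowledge distillation. | distillation | |
| 17 | How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR | G2D: improves offline preference optimization with a moderate online RL warm-up, reducing compute cost. | reinforcement learning, DPO, direct preference optimization | |
| 18 | Efficient Learning of Deep State Space Models via Importance Smoothing | Proposes Parallel Variational Monte Carlo (PVMC) for efficient training of deep state space models (DSSMs). | state space model | |
| 19 | A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation | Unifies causal and traditional representation learning in one framework, combining their complementary strengths and improving performance. | representation learning | |
| 20 | PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR | PlexRL: cluster-level orchestration of LLM services for RLVR, improving resource utilization. | reinforcement learning, large language model | |
| 21 | REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak | REFLECTOR: defends large language models against indirect jailbreak attacks by internalizing a step-wise reflection mechanism. | reinforcement learning, large language model | |
| 22 | Design for Manufacturing: A Manufacturability Knowledge-Integrated Reinforcement Learning Framework for Free-Form Pipe Routing in Aeroengines | Proposes the FPRO framework to address manufacturability in aeroengine pipe routing. | reinforcement learning | |
| 23 | Time-Dependent PDE-Constrained Optimization via Weak-Form Latent Dynamics | Proposes a PDE-constrained optimization method based on weak-form latent dynamics that accelerates optimization of high-dimensional time-dependent PDEs. | latent dynamics | |
| 24 | ReversedQ: Opportunities for Faster Q-Learning in Episodic Online Reinforcement Learning | ReversedQ: accelerates online reinforcement learning by optimizing the Q-learning update strategy. | reinforcement learning | |
| 25 | Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting | Proposes PG-DPO, which uses Pontryagin's maximum principle to solve reinforcement learning problems with non-exponential discounting. | reinforcement learning, DPO | |
| 26 | AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback | Proposes AGPO to address training instability in PPO/GRPO. | reinforcement learning, PPO | |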

🔬 Pillar 9: Embodied Foundation Models (7 papers)

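| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 27 | A Mechanistic Study of Tabular Foundation Models | Studies pretrained models for tabular data, revealing their internal mechanisms and robustness. | foundation model | |
| 28 | On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective | Analyzes the cost and benefit of chain of thought (CoT) from a learning-theoretic perspective. | chain-of-thought | |
| 29 | Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate | Proposes a framework for quantifying hyperparameter transfer to optimize large-scale language model training. | large language model | |
| 30 | Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment | Proposes the CLAIR framework for LoRA fine-tuning in federated learning. | large language model | |
| 31 | SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning | Proposes SMoA, a spectrum modulation adapter that improves performance in parameter-efficient fine-tuning. | large language model | |
| 32 | Conditioning Gaussian Processes on Almost Anything | Proposes a general Gaussian process inference scheme for complex conditioning problems. | large language model | |
| 33 | The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure? | Analyzes the advantage of GLU structures: a neural tangent kernel perspective reveals their better condition numbers and faster training. | large language model | |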

🔬 Pillar 1: Robot Control (3 papers)

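| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 34 | DeCoR: Design and Control Co-Optimization for Urban Streets Using Reinforcement Learning | DeCoR: reinforcement-learning-based co-optimization of urban street design and control. | shared control, reinforcement learning | |
| 35 | Compositional Transduction with Latent Analogies for Offline Goal-Conditioned Reinforcement Learning | Proposes a compositional generalization method based on latent analogy transduction for offline goal-conditioned reinforcement learning. | manipulation, reinforcement learning | |
| 36 | Point Cloud Sequence Encoding for Material-conditioned Graph Network Simulators | Proposes the PEACH framework, which uses point cloud sequence encoding to build material-conditioned graph network simulators and improve adaptation to real-world scenes. | sim-to-real | |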

🔬 Pillar 4: Generative Motion (1 paper)

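| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 37 | Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment | Proposes the REPA-P framework, which breaks shortcut learning in scientific diffusion models through representation alignment. | physics-informed diffusion | |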

🔬 Pillar 8: Physics-based Animation (1 paper)

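| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 38 | Classification of Single and Mixed Partial Discharges under Switching Voltage Using an AWA-CNN Framework | Proposes the AWA-CNN framework for classifying single and mixed partial discharge sources under switching voltage. | PULSE | |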
