cs.LG(2026-05-26)

📊 45 papers in total

🎯 Interest area navigation

Pillar 2: RL & Architecture (23) · Pillar 9: Embodied Foundation Models (16) · Pillar 1: Robot Control (3) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL & Architecture (23 papers)

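# | Title | One-line summary | Tags
1 | Auditing and Fixing Economic Validity in Tabular Foundation Models for Discrete Choice | Proposes a two-stage adapter that ensures the economic validity of tabular foundation models in discrete choice prediction | distillation foundation model
2 | When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control | The RLScale-Bench benchmark shows that calibrated rule-based controllers outperform mainstream deep RL algorithms in adaptive resource control | reinforcement learning deep reinforcement learning DRL
3 | Adaptive Reinforcement Learning for Robust Open Quantum System Control: A Multi-Task Framework with Temporal Optimization | Proposes a multi-task SAC reinforcement learning framework for robust open quantum system control with temporal optimization | reinforcement learning SAC PULSE
4 | Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning | GraphGPO: a graph-based credit assignment method that improves the efficiency of agentic reinforcement learning | reinforcement learning large language model
5 | Recursive Flow Matching | Proposes Recursive Flow Matching (RecFM) to accelerate high-accuracy modeling and prediction of spatiotemporal dynamical systems | flow matching spatiotemporal
6 | BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning | BASIS: batchwise advantage estimation via single-rollout information sharing, improving LLM reasoning | reinforcement learning policy learning large language model
7 | Causal Representation Learning for Generalisable Recommendation | Proposes a recommendation method based on causal representation learning that improves the generalization of recommender systems under distribution shift | predictive model representation learning
8 | Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation | Proposes Teachability-Aware OPD, which improves on-policy distillation by selecting learnable token signals | teacher-student distillation
9 | WINDQuant: Weight-Informed Neural Decision-Making for Global Mixed-Precision LLM Quantization | WINDQuant: weight-informed neural decision-making for global mixed-precision LLM quantization | reinforcement learning PPO large language model
10 | Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders | SAERL: uses model internals from sparse autoencoders to guide LLM post-training data engineering | reinforcement learning large language model
11 | Learning Dynamic Graph Representations through Timespan View Contrasts | Proposes the CLDG and CLDG++ frameworks, which learn dynamic graph representations by contrasting timespan views, for node classification and anomaly detection | contrastive learning TAMP
12 | Less is More: Early Stopping Rollout for On-Policy Distillation | Proposes an early-stopping rollout distillation method to address teacher model degradation in on-policy distillation | distillation
13 | SQARL: A Size-Agnostic Reinforcement Learning approach for Circuit Allocation in Distributed Quantum Architectures | Proposes SQARL, a size-agnostic reinforcement learning approach for circuit allocation in distributed quantum architectures | reinforcement learning
14 | SPHERE-JEPA: Spherical Prediction with Homogeneous Embeddings | SPHERE-JEPA: spherical prediction with homogeneous embeddings, improving self-supervised representation quality | JEPA
15 | Generalist Graph Anomaly Detection via Prototype-Based Distillation | Proposes ProMoS, an unsupervised framework for generalist graph anomaly detection based on prototype distillation | distillation
16 | Towards Generalization-Oriented Models for Vehicle Routing Problems with Mixture-of-Experts | Proposes the R2E-IG model, which uses a mixture-of-experts network to improve generalization on vehicle routing problems under distribution shift | reinforcement learning deep reinforcement learning DRL
17 | Spend Your Rollouts Where It Counts: Rollout Allocation for Group-Based RL Post-Training | Proposes the Pilot-Commit framework, which accelerates group-based RL post-training through budget-aware rollout allocation | reinforcement learning large language model
18 | Geometry-Aware Contrastive Learning for Few-Shot Automatic Modulation Recognition | Proposes the DyCo-CL framework to address the shortcomings of SSL methods in few-shot automatic modulation recognition | contrastive learning
19 | Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards | Proposes Focal Reward to address imbalanced reinforcement learning training under rubric-based rewards for LLMs | reinforcement learning
20 | PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design | PRISM: a position-encoded regressive inverse spectral model for multilayer thin-film design | MAE spatial relationship
21 | Trust Region Q Adjoint Matching | Proposes Trust Region Q-Adjoint Matching for stable optimization of pretrained flow policies in offline reinforcement learning | reinforcement learning offline RL
22 | Ratio-Variance Regularized Policy Optimization | Proposes R²VPO, which achieves stable and efficient policy optimization by regularizing the variance of the policy ratio | reinforcement learning PPO
23 | Adversarial Training for Robust Coverage Network under Worst-case Facility Losses | Proposes a dual-agent deep reinforcement learning framework for the maximal covering location interdiction problem | reinforcement learning deep reinforcement learning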

🔬 Pillar 9: Embodied Foundation Models (16 papers)

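# | Title | One-line summary | Tags
24 | Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling | Falcon-X: a time series foundation model for heterogeneous multivariate modeling | foundation model
25 | LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models | Proposes LUCoS, which uses unsupervised latent-space context selection to improve few-shot learning on tabular data | foundation model
26 | EEG-FM-Audit: A Systematic Evaluation and Analysis Pipeline for EEG Foundation Models | Proposes EEG-FM-Audit to address the lack of transparency in evaluating EEG foundation models | foundation model
27 | Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models | Studies the role of scale vectors in large language models and proposes optimization strategies that significantly improve model performance | large language model
28 | Particle-Lund Multimodality in Jet Taggers | Proposes PLuM to improve jet tagging performance | multimodal
29 | Few-shot Cross-country Generalization of Tabular Machine Learning and Foundation Models for Childhood Anemia Prediction under Distribution Shift | Proposes a TabPFN-based model to address data scarcity in childhood anemia prediction | foundation model
30 | Aperiodic and Low-Frequency Spectral Bias in Reconstruction based EEG Foundation Models | Reveals that reconstruction-based EEG foundation models are biased toward aperiodic and low-frequency components | foundation model
31 | The Kalman Evolve: Closing the Gap in Kalman Filtering via Interpretable Algorithm Discovery | Kalman Evolve: closes the gap in Kalman filtering through interpretable algorithm discovery | large language model
32 | Convergence of Spectral Descent for Non-smooth Optimization | Proposes a convergence analysis framework for spectral descent in non-smooth optimization | large language model
33 | MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training | Proposes MONA, a Muon optimizer with Nesterov acceleration for scalable language model training | large language model
34 | Innovation: An Almost Characterization of Hallucination | Characterizes LLM hallucination through "innovation", revealing an inherent limitation of calibrated models | large language model
35 | More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations | Proposes token-adaptive mixing of activations (MoA) to improve the expressiveness of Transformer FFN layers | large language model
36 | SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks? | SEC-bench Pro: evaluates language models on long-horizon software security tasks | large language model
37 | Open-Weight LLM Fine-Tuning Defenses are Susceptible to Simple Attacks | Proposes simple attacks based on ablation and prefilling against defenses for open-weight LLMs | large language model
38 | The Stability of Singular Distribution: A Spectral Perspective on the Two-Phase Dynamics of Language Model Pre-training | Offers a spectral perspective on the two-phase dynamics of LLM pre-training: the Stability of Singular Distribution (SoSD) | large language model
39 | Extra-Merge: Tracing the Rank-1 Subspace of Model Merging in Language Model Pre-Training | Proposes Extra-Merge to improve model merging for language models | large language model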

🔬 Pillar 1: Robot Control (3 papers)

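# | Title | One-line summary | Tags
40 | Adversarial Dual On-Policy Distillation from Expressive Flow-based Teacher | Proposes FA-OPD, an adversarial dual on-policy distillation method that improves the robustness of imitation learning in embodied control | locomotion manipulation flow matching
41 | Probabilistic Recurrent Intention Switching Model | Proposes a probabilistic recurrent intention switching model to handle goal switching in inverse reinforcement learning | manipulation reinforcement learning inverse reinforcement learning
42 | Pretrained Approximators for Low-Thrust Trajectory Cost and Reachability | Proposes a fast method, based on pretrained approximators, for evaluating the fuel cost and reachability of low-thrust trajectories | trajectory optimization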

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags
43 | PIDM-DP: Physics-Informed Diffusion with Dormand-Prince Integration for Chaotic System Identification and State Reconstruction across Multiple Dynamical Regimes | Proposes PIDM-DP for chaotic system identification and state reconstruction across multiple dynamical regimes | physics-informed diffusion

🔬 Pillar 8: Physics-based Animation (1 paper)

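# | Title | One-line summary | Tags
44 | Learning Energy-Based Models from Stochastic Interpolants using Spatiotemporal Differences | Proposes the stNCE framework, which learns energy-based models from spatiotemporal differences to improve density estimation | spatiotemporal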

🔬 Pillar 7: Motion Retargeting (1 paper)

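# | Title | One-line summary | Tags
45 | Explainable Comparison of Feature-Based and Deep Learning Models for TROPOMI Methane Plume Screening | Compares feature-based and deep learning models for TROPOMI methane plume screening, with an explainability analysis | spatial relationship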
