cs.LG(2026-06-01)

📊 33 papers | 🔗 7 with code

🎯 Areas of Interest

Pillar 2: RL Algorithms & Architecture (15 🔗3) · Pillar 9: Embodied Foundation Models (15 🔗2) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 2: RL Algorithms & Architecture (15 papers)

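| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression | Proposes HMPO, which compresses chain-of-thought (CoT) reasoning via hybrid median-length policy optimization to reduce inference cost. | reinforcement learning, large language model, instruction following | |
| 2 | Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design | Reviews recent advances in generative models, multimodal learning, and closed-loop workflows for the inverse design of crystalline materials. | reinforcement learning, latent optimization, multimodal | |
| 3 | Policy and World Modeling Co-Training for Language Agents | Proposes PaW, a framework that co-trains the policy and a world model to improve language-agent performance. | reinforcement learning, world model, world models | |
| 4 | IMWM: Intuition Models Complement World Models for Latent Planning | Combines intuition models with world models for latent-space planning, improving performance on pixel-based control tasks. | world model, world models | |
| 5 | OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents | Investigates online multi-turn reinforcement learning for visual web agents and sets a new open-source state of the art. | reinforcement learning, multimodal | |
| 6 | Uncertainty-Aware Graph Neural Reconstruction of Urban Temperature Fields from Sparse Sensors under Deployment Constraints | Proposes an uncertainty-aware graph neural network that reconstructs urban temperature fields from sparse sensors while accounting for deployment constraints. | MAE, sparse sensors | |
| 7 | TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks | Closes the feature engineering gap in tabular benchmarks to improve model performance. | world model, world models, foundation model | |
| 8 | Task-Induced Representational Invariances Depend on Learning Objective in Deep RL | Shows that task-induced representational invariances in deep reinforcement learning depend on the learning objective. | reinforcement learning, PPO, OMOMO | |
| 9 | Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation | Reveals the "copying" behavior of DMD student models and attributes it to restricted geometric degrees of freedom in high-dimensional distillation. | distillation | |
| 10 | On the Generalization in Topology Optimization via Sensitivity-Conditioned Bernoulli Flow Matching | Proposes sensitivity-conditioned Bernoulli flow matching to improve model generalization in topology optimization. | flow matching | |
| 11 | A Theoretical Framework for Self-Play Theorem Proving Algorithms | Proposes a theoretical framework for self-play theorem proving algorithms that addresses the generation of complex theorems. | contrastive learning, large language model | |
| 12 | Quantifying the Energy Floor: Direct Measurement and Replay Buffer Bias in SAC-Based HVAC Control on sbsim | Directly measures the energy floor of SAC-based HVAC control on sbsim and analyzes replay buffer bias. | SAC | |
| 13 | FedMTFI: Feature Importance Based Optimized Multi Teacher Knowledge Distillation in Heterogeneous Federated Learning Environment | Optimizes multi-teacher knowledge distillation using feature importance in heterogeneous federated learning environments. | distillation | |
| 14 | Flexible Online Representation Learning Based on Similarity Matching | Proposes flexible online representation learning algorithms based on similarity matching, applicable to clustering, manifold tiling, and sparse coding. | representation learning | |
| 15 | VLBM: Variational Latent Basis Modeling for OOD Robust Multivariate Time Series Forecasting | Proposes VLBM to address out-of-distribution (OOD) robustness in multivariate time series forecasting. | MAE, PULSE | |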

🔬 Pillar 9: Embodied Foundation Models (15 papers)

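| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation | Audits asset-specific preferences in financial large language models, drawing on evidence from Bitcoin representations and portfolio allocation. | large language model | |
| 17 | When Tabular Foundation Models Transfer Across Modalities: A Systematic Evaluation Across 95 Datasets, 7 Modalities, and Two Regimes | Systematically evaluates how tabular foundation models transfer across modalities on a range of signal classification tasks, spanning 95 datasets, 7 modalities, and two regimes. | foundation model | |
| 18 | The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing | Reveals correlated name priors in LLM-generated fictional people and traces their influence on the web and academic publishing. | large language model, TAMP | |
| 19 | Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging | Proposes MERIT, a decentralized instruction tuning method that improves model performance through conflict-aware data splitting and weight merging. | large language model, multimodal | |
| 20 | ATLAS: Agentic Test-time Learning-to-Allocate Scaling | Proposes ATLAS, an agentic framework that learns how to allocate test-time scaling to optimize large language model inference. | large language model, multimodal | |
| 21 | On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters | Explores how PEFT scales, toward a million personalized models at trillion-parameter scale. | foundation model | |
| 22 | Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment | Proposes AdvCL, which repurposes adversarial perturbations for continual learning, shifting their role from defense to active alignment. | large language model | |
| 23 | Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization | Identifies massive activation spikes in LLMs as bias vectors and proposes INSERTQUANT, which achieves spike-free quantization through a vector-template restoration mechanism. | large language model | |
| 24 | Flow-Transformed Implicit Processes for Function-Space Variational Inference | Proposes Flow-Transformed Implicit Processes for function-space variational inference, increasing the expressiveness of the posterior distribution. | multimodal | |
| 25 | FLARE: Diffusion for Hybrid Language Model | Proposes FLARE, a diffusion framework for hybrid language models that accelerates parallel decoding while maintaining performance. | large language model | |
| 26 | Shortcut to Nowhere: Demystifying Deep Spurious Regression | Proposes a calibration method that exploits attribute similarity to counter deep spurious regression and improve generalization. | large language model | |
| 27 | DOT-MoE: Differentiable Optimal Transport for MoEfication | Proposes DOT-MoE, which uses differentiable optimal transport for efficient MoEfication and improves large-model inference efficiency. | large language model | |
| 28 | Estimating Mutual Information between Time Series and Temporal Event Sequences Across Diverse Analysis Tasks | Proposes a nonparametric mutual information estimator that quantifies the dependence between time series and event sequences. | multimodal | |
| 29 | CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search | Performs post-training pruning using convolution-aware relative importance and an efficient search. | large language model | |
| 30 | Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete | Proves that sliding-window Transformers without positional encoding remain Turing complete for long-context reasoning. | chain-of-thought | |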

🔬 Pillar 1: Robot Control (2 papers)

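| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 31 | RDA: Reward Design Agent for Reinforcement Learning | Proposes RDA, a framework that uses vision-language models to design reward functions for reinforcement learning automatically. | humanoid, manipulation, whole-body manipulation | |
| 32 | Coherent Off-Policy Improvement of Large Behavior Models with Learned Rewards | Uses learned rewards for coherent off-policy improvement of large behavior models, improving robotic manipulation performance. | manipulation, dexterous manipulation, reinforcement learning | |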

🔬 Pillar 4: Generative Motion (1 paper)

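| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 33 | BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers | Introduces BlockGen, a flexible blockwise sequence modeling approach that uses hybrid samplers. | MDM | |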
