cs.LG (2026-06-01)

📊 33 papers in total | 🔗 7 with code

🎯 Navigation by interest area

Pillar 2: RL Algorithms & Architecture (15 🔗3) · Pillar 9: Embodied Foundation Models (15 🔗2) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (1 🔗1)

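🔬 Pillar 2: RL Algorithms & Architecture (15 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression	Proposes HMPO, a hybrid median-length policy optimization method that compresses chain-of-thought and lowers inference cost.	reinforcement learning large language model instruction following
2	Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design	Reviews recent progress in generative models, multimodal learning, and closed-loop workflows for the inverse design of crystalline materials.	reinforcement learning latent optimization multimodal
3	Policy and World Modeling Co-Training for Language Agents	Proposes the PaW framework, which co-trains policy and world modeling to improve language agent performance.	reinforcement learning world model world models
4	IMWM: Intuition Models Complement World Models for Latent Planning	IMWM: combines intuition models with world models for latent-space planning, improving performance on pixel-based control tasks.	world model world models
5	OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents	OpenWebRL: studies online multi-turn reinforcement learning for visual web agents and sets a new open-source SOTA.	reinforcement learning multimodal
6	Uncertainty-Aware Graph Neural Reconstruction of Urban Temperature Fields from Sparse Sensors under Deployment Constraints	Proposes an uncertainty-aware graph neural network that reconstructs urban temperature fields from sparse sensors while accounting for deployment constraints.	MAE sparse sensors
7	TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks	TabPrep: closes the feature engineering gap in tabular benchmarks, improving model performance.	world model world models foundation model	✅
8	Task-Induced Representational Invariances Depend on Learning Objective in Deep RL	Shows that task-induced representational invariances in deep RL depend on the learning objective.	reinforcement learning PPO OMOMO
9	Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation	Explains the "copying" behavior of DMD student models: it stems from limited geometric degrees of freedom in high-dimensional distillation.	distillation
10	On the Generalization in Topology Optimization via Sensitivity-Conditioned Bernoulli Flow Matching	Proposes sensitivity-conditioned Bernoulli flow matching to improve model generalization in topology optimization.	flow matching	✅
11	A Theoretical Framework for Self-Play Theorem Proving Algorithms	Proposes a theoretical framework for self-play theorem proving algorithms, addressing the generation of complex theorems.	contrastive learning large language model
12	Quantifying the Energy Floor: Direct Measurement and Replay Buffer Bias in SAC-Based HVAC Control on sbsim	Quantifies the energy floor: direct measurement and replay buffer bias analysis for SAC-based HVAC control on sbsim.	SAC
13	FedMTFI: Feature Importance Based Optimized Multi Teacher Knowledge Distillation in Heterogeneous Federated Learning Environment	FedMTFI: feature-importance-based optimized multi-teacher knowledge distillation for heterogeneous federated learning.	distillation
14	Flexible Online Representation Learning Based on Similarity Matching	Proposes a flexible online representation learning algorithm based on similarity matching, applicable to clustering, manifold tiling, and sparse coding.	representation learning
15	VLBM: Variational Latent Basis Modeling for OOD Robust Multivariate Time Series Forecasting	Proposes VLBM to address OOD robustness in multivariate time series forecasting.	MAE PULSE	✅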

🔬 Pillar 9: Embodied Foundation Models (15 papers)

#	Title	One-line summary	Tags	🔗	⭐
16	Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation	Audits asset-specific preferences in financial large language models, with evidence from Bitcoin representations and portfolio allocation.	large language model
17	When Tabular Foundation Models Transfer Across Modalities: A Systematic Evaluation Across 95 Datasets, 7 Modalities, and Two Regimes	Systematically evaluates cross-modality transfer of tabular foundation models across a range of signal classification tasks.	foundation model
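18	The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing	Reveals correlated priors over fictional person names generated by LLMs and their impact on the web and academic publishing.	large language model TAMP
19	Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging	Proposes MERIT, a decentralized instruction tuning method that improves model performance through conflict-aware data splitting and weight merging.	large language model multimodal	✅
20	ATLAS: Agentic Test-time Learning-to-Allocate Scaling	Proposes ATLAS, an agentic framework that learns to allocate test-time scaling for large language model inference.	large language model multimodal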
21	On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters	Explores the scalability of PEFT: towards a million personalized versions of trillion-parameter models.	foundation model
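22	Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment	AdvCL: repurposes adversarial perturbations for continual learning, moving from defense to active alignment.	large language model
23	Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization	Proposes INSERTQUANT, which achieves spike-free LLM quantization through a vector template recovery mechanism.	large language model
24	Flow-Transformed Implicit Processes for Function-Space Variational Inference	Proposes Flow-Transformed Implicit Processes for function-space variational inference, making the posterior distribution more expressive.	multimodal
25	FLARE: Diffusion for Hybrid Language Model	FLARE: a diffusion framework for hybrid language models that speeds up parallel decoding while preserving performance.	large language model
26	Shortcut to Nowhere: Demystifying Deep Spurious Regression	Targets deep spurious regression with a calibration method that exploits attribute similarity to improve generalization.	large language model
27	DOT-MoE: Differentiable Optimal Transport for MoEfication	Proposes DOT-MoE, which uses differentiable optimal transport for efficient MoEfication, improving large-model inference efficiency.	large language model
28	Estimating Mutual Information between Time Series and Temporal Event Sequences Across Diverse Analysis Tasks	Proposes a nonparametric mutual information estimator that quantifies the dependence between time series and event sequences.	multimodal	✅
29	CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search	CRePE: post-training pruning using convolution-aware relative importance and efficient search.	large language model
30	Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete	Proves that sliding-window Transformers without positional encoding remain Turing complete in long-text reasoning.	chain-of-thought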

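🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
31	RDA: Reward Design Agent for Reinforcement Learning	Proposes RDA, a framework based on vision-language models that automatically designs reward functions for reinforcement learning.	humanoid manipulation whole-body manipulation	✅
32	Coherent Off-Policy Improvement of Large Behavior Models with Learned Rewards	Uses learned rewards for coherent off-policy improvement of large behavior models, improving robot manipulation performance.	manipulation dexterous manipulation reinforcement learning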

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
33	BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers	BlockGen: a flexible blockwise sequence modeling method that uses hybrid samplers.	MDM	✅
