| # | Title | Summary | Keywords |
|---|-------|---------|----------|
| 1 | Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning | Proposes Optimistic World Models (OWMs), which achieve efficient exploration via reward-biased maximum likelihood estimation. | reinforcement learning, deep reinforcement learning, world model |
| 2 | Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization | Proposes the FGO algorithm to address long chain-of-thought compression. | reinforcement learning, large language model, chain-of-thought |
| 3 | Towards Uniformity and Alignment for Multimodal Representation Learning | Proposes a multimodal representation learning method that decouples alignment from uniformity, mitigating distribution gaps between modalities. | representation learning, multimodal |
| 4 | Diffusion-Guided Pretraining for Brain Graph Foundation Models | Proposes a diffusion-guided pretraining framework for brain graphs, improving the robustness of brain-connectome representation learning. | masked autoencoder, foundation model |
| 5 | Answer First, Reason Later: Aligning Search Relevance via Mode-Balanced Reinforcement Learning | Proposes the AFRL paradigm with Mode-Balanced RL to balance low latency against high performance in search ranking. | reinforcement learning, distillation, large language model |
| 6 | ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm | ExO-PPO: an extended off-policy proximal policy optimization algorithm that improves sample efficiency and stability. | reinforcement learning, deep reinforcement learning, PPO |
| 7 | ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning | ADORA: trains reasoning models with dynamic advantage estimation in reinforcement learning, improving performance on geometry and math tasks. | reinforcement learning |
| 8 | Flexible Entropy Control in RLVR with Gradient-Preserving Perspective | Proposes a flexible entropy control method from a gradient-preserving perspective to address policy entropy collapse in RLVR. | reinforcement learning, large language model |
| 9 | Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning | FlexMARL: an efficient rollout-training co-design framework for large-scale LLM-based multi-agent reinforcement learning. | reinforcement learning |
| 10 | Beyond Student: An Asymmetric Network for Neural Network Inheritance | Proposes InherNet, which inherits a neural network's structure and knowledge via asymmetric low-rank decomposition, surpassing knowledge distillation. | distillation, multimodal |
| 11 | Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning | Proposes an online self-predictive representation learning method for streaming reinforcement learning, improving sample efficiency. | reinforcement learning |
| 12 | Latent Poincaré Shaping for Agentic Reinforcement Learning | LaPha: trains AlphaZero-style LLM agents in a Poincaré latent space, improving mathematical problem solving. | reinforcement learning |
| 13 | Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability | Proposes the RLFR framework, which uses interpretability features as rewards to improve language-model truthfulness on open-ended tasks. | reinforcement learning, affordance |
| 14 | A Controlled Study of Double DQN and Dueling DQN Under Cross-Environment Transfer | A controlled comparison of how Double DQN and Dueling DQN differ under cross-environment transfer. | reinforcement learning, deep reinforcement learning |