cs.LG (2026-02-10)

📊 27 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (14) · Pillar 9: Embodied Foundation Models (10 🔗1) · Pillar 4: Generative Motion (2 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (14 papers)

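| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning | Proposes Optimistic World Models (OWMs), which achieve efficient exploration via reward-biased maximum likelihood estimation. | reinforcement learning, deep reinforcement learning, world model | |
| 2 | Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization | Proposes the FGO algorithm to compress long chain-of-thought reasoning. | reinforcement learning, large language model, chain-of-thought | |
| 3 | Towards Uniformity and Alignment for Multimodal Representation Learning | Proposes a multimodal representation learning method that decouples alignment from uniformity, mitigating distribution gaps between modalities. | representation learning, multimodal | |
| 4 | Diffusion-Guided Pretraining for Brain Graph Foundation Models | Proposes a diffusion-guided pretraining framework for brain graphs that improves the robustness of brain connectome representation learning. | masked autoencoder, foundation model | |
| 5 | Answer First, Reason Later: Aligning Search Relevance via Mode-Balanced Reinforcement Learning | Proposes the AFRL paradigm and Mode-Balanced RL to balance low latency against high performance in search ranking. | reinforcement learning, distillation, large language model | |
| 6 | ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm | ExO-PPO: an extended off-policy proximal policy optimization algorithm that improves sample efficiency and stability. | reinforcement learning, deep reinforcement learning, PPO | |
| 7 | ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning | ADORA: trains reinforcement learning reasoning models with dynamic advantage estimation, improving performance on geometry and math tasks. | reinforcement learning | |
| 8 | Flexible Entropy Control in RLVR with Gradient-Preserving Perspective | Proposes a flexible entropy control method from a gradient-preserving perspective to address policy entropy collapse in RLVR. | reinforcement learning, large language model | |
| 9 | Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning | FlexMARL: an efficient rollout-training co-design framework for large-scale LLM multi-agent reinforcement learning. | reinforcement learning | |
| 10 | Beyond Student: An Asymmetric Network for Neural Network Inheritance | Proposes InherNet, which inherits neural network structure and knowledge through asymmetric low-rank decomposition and outperforms knowledge distillation. | distillation, multimodal | |
| 11 | Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning | Proposes an online self-predictive representation learning method for streaming reinforcement learning that improves sample efficiency. | reinforcement learning | |
| 12 | Latent Poincaré Shaping for Agentic Reinforcement Learning | LaPha: trains AlphaZero-like LLM agents in a Poincaré latent space, improving mathematical problem solving. | reinforcement learning | |
| 13 | Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability | Proposes the RLFR framework, which uses features as rewards to improve the factuality of language models on open-ended tasks. | reinforcement learning, affordance | |
| 14 | A Controlled Study of Double DQN and Dueling DQN Under Cross-Environment Transfer | Compares the performance of DDQN and Dueling DQN under cross-environment transfer. | reinforcement learning, deep reinforcement learning | |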

🔬 Pillar 9: Embodied Foundation Models (10 papers)

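| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Large Language Models for Designing Participatory Budgeting Rules | Proposes the LLMRule framework, which uses large language models to design better participatory budgeting rules. | large language model | |
| 16 | Biases in the Blind Spot: Detecting What LLMs Fail to Mention | Proposes a fully automated black-box method for detecting unverbalized task-specific biases in large language models. | large language model, chain-of-thought | |
| 17 | A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula | Proposes a task-centric theory of iterative self-improvement that uses easy-to-hard curricula to improve LLM performance. | large language model | |
| 18 | CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization | CoFEH: LLM-driven feature engineering based on collaborative Bayesian hyperparameter optimization. | large language model | |
| 19 | Towards Poisoning Robustness Certification for Natural Language Generation | Proposes the TPA algorithm, which provides verifiable poisoning robustness guarantees for natural language generation. | foundation model | |
| 20 | LLM-FS: Zero-Shot Feature Selection for Effective and Interpretable Malware Detection | Proposes LLM-FS, a zero-shot feature selection method for effective and interpretable malware detection. | large language model | |
| 21 | Beware of the Batch Size: Hyperparameter Bias in Evaluating LoRA | Shows the importance of batch size in LoRA fine-tuning and proposes an efficient batch size tuning strategy. | large language model | |
| 22 | Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning | Proposes SL-SAM, a sparse-layer sharpness-aware minimization method for efficient fine-tuning. | large language model | |
| 23 | MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection | Proposes MacrOData, a large-scale outlier detection benchmark suite containing thousands of tabular datasets. | foundation model | |
| 24 | Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density | Proposes RFID-MoE, which compresses MoE LLMs efficiently by exploiting heterogeneous expert routing frequency and information density. | large language model | |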

🔬 Pillar 4: Generative Motion (2 papers)

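| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 25 | Physics-informed diffusion models in spectral space | Proposes physics-informed diffusion models in spectral space for solving parameterized partial differential equations. | physics-informed diffusion | |
| 26 | Scalable and Reliable State-Aware Inference of High-Impact N-k Contingencies | Proposes a scalable state-aware inference framework for efficiently evaluating high-impact N-k contingencies. | penetration | |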

🔬 Pillar 1: Robot Control (1 paper)

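| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 27 | Mitigating the Likelihood Paradox in Flow-based OOD Detection via Entropy Manipulation | Proposes an entropy-manipulation-based OOD detection method for flow models that mitigates the likelihood paradox. | manipulation | |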
