cs.LG(2026-02-22)
📊 13 papers in total
🎯 Interest area navigation
Pillar 2: RL Algorithms and Architecture (RL & Architecture) (7)
Pillar 9: Embodied Foundation Models (5)
Pillar 1: Robot Control (1)
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (7 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
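| 1 | Stable Deep Reinforcement Learning via Isotropic Gaussian Representations | Proposes a stable deep reinforcement learning method based on isotropic Gaussian representations, improving performance in non-stationary environments. | reinforcement learning, deep reinforcement learning | ||
| 2 | AdsorbFlow: energy-conditioned flow matching enables fast and realistic adsorbate placement | AdsorbFlow: energy-conditioned flow matching enables fast and realistic adsorbate placement. | flow matching, classifier-free guidance | ||
| 3 | LLMs Can Learn to Reason Via Off-Policy RL | Proposes the OAPL algorithm to address the mismatch between training and inference policies in off-policy reinforcement learning for LLMs. | reinforcement learning, PPO, large language model | ||
| 4 | Soft Sequence Policy Optimization: Bridging GMPO and SAPO | Proposes Soft Sequence Policy Optimization to address instability in policy training. | reinforcement learning, PPO, large language model | ||
| 5 | How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization | DynaMO: a policy optimization framework for LLM reasoning that optimizes rollout allocation and advantage modulation. | reinforcement learning, large language model | ||
| 6 | Pushing the Limits of Inverse Lithography with Generative Reinforcement Learning | Proposes an inverse lithography method based on generative reinforcement learning that overcomes the local-optimum limitation of conventional ILT. | reinforcement learning | ||
| 7 | Learning to Detect Language Model Training Data via Active Reconstruction | Proposes an active data reconstruction attack to address the problem of detecting LLM training data. | reinforcement learning, distillation | ||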
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | TimeRadar: A Domain-Rotatable Foundation Model for Time Series Anomaly Detection | TimeRadar: a domain-rotatable foundation model for time series anomaly detection. | foundation model | ||
| 9 | Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning | Proposes the PROSPER algorithm to address the lack of transitivity in multi-objective preference fine-tuning. | large language model, instruction following | ||
| 10 | Smooth Gate Functions for Soft Advantage Policy Optimization | Proposes smooth gate functions to improve Soft Advantage Policy Optimization, boosting the mathematical reasoning ability of LLMs. | large language model | ||
| 11 | Attention Deficits in Language Models: Causal Explanations for Procedural Hallucinations | Reveals procedural hallucinations in language models: attention deficits cause results to be forgotten after reasoning. | large language model | ||
| 12 | Understanding Empirical Unlearning with Combinatorial Interpretability | Uses combinatorial interpretability to understand residual knowledge in empirical model unlearning. | foundation model | ||
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | An Interpretable Data-Driven Model of the Flight Dynamics of Hawks | Proposes an interpretable data-driven model of hawk flight dynamics based on dynamic mode decomposition. | locomotion | ||