cs.LG (2025-05-26)
📊 23 papers in total | 🔗 3 with code
🎯 Navigation by Interest Area
Pillar 2: RL & Architecture (14 🔗1)
Pillar 9: Embodied Foundation Models (8 🔗2)
Pillar 3: Perception & Semantics (1)
🔬 Pillar 2: RL & Architecture (14 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO | Theoretical analysis of the performance gap between RLHF and DPO in preference learning, revealing the impact of model bias | reinforcement learning, preference learning, RLHF | | |
| 2 | Alignment of large language models with constrained learning | Proposes a Lagrangian-duality-based LLM alignment method for maximizing reward under constraints | RLHF, large language model | | |
| 3 | Learning a Pessimistic Reward Model in RLHF | Proposes PET, a pessimistic reward-model fine-tuning method that improves reward-model robustness in RLHF and resists reward hacking | reinforcement learning, offline reinforcement learning, RLHF | | |
| 4 | Rotary Masked Autoencoders are Versatile Learners | Proposes the Rotary Masked Autoencoder (RoMAE) for multimodal data with continuous positional information | representation learning, masked autoencoder, MAE | | |
| 5 | JEDI: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning | Proposes JEDI: an end-to-end latent-space diffusion model that mitigates the agent-human performance asymmetry in model-based reinforcement learning | reinforcement learning, world model | | |
| 6 | The Limits of Preference Data for Post-Training | Reveals the limitations of preference data for optimizing complex tasks in post-training | reinforcement learning, RLHF, large language model | | |
| 7 | An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning | Proposes an explainable diagnostic framework for neurodegenerative dementias based on reinforcement-optimized LLM reasoning | reinforcement learning, distillation, large language model | | |
| 8 | DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning | DISCOVER: an automated curriculum-learning method for sparse-reward reinforcement learning | reinforcement learning | | |
| 9 | Equivariant Representation Learning for Symmetry-Aware Inference with Guarantees | Proposes an equivariant representation-learning framework for symmetry-aware inference with guarantees | representation learning | | |
| 10 | Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning | Proposes a reinforcement-learning framework for refining few-step text-to-multiview diffusion models, improving image quality and cross-view consistency | reinforcement learning | | |
| 11 | The challenge of hidden gifts in multi-agent reinforcement learning | Targets the "hidden gifts" problem in multi-agent reinforcement learning with a learning-aware corrected policy-gradient method | reinforcement learning | | |
| 12 | Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning | Proposes BARL, a Bayes-adaptive reinforcement-learning algorithm that improves reflective exploration in LLM reasoning | reinforcement learning, large language model | ✅ | |
| 13 | Characterizing Pattern Matching and Its Limits on Compositional Task Structures | Proposes a formal framework for pattern matching to address generalization on compositional task structures | Mamba, chain-of-thought | | |
| 14 | ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining | Proposes ESLM: a risk-averse selective language-modeling pretraining method that improves efficiency and robustness | distillation, large language model | | |
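Several entries above (rows 1, 3, and 6) concern RLHF/DPO preference learning. As generic background only — this is not code from any listed paper — here is a minimal sketch of the standard DPO loss for a single preference pair, where the policy is pushed to favor the chosen response over the rejected one relative to a frozen reference model:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    logp_w / logp_l: policy log-probs of the chosen (w) and rejected (l)
    responses; ref_logp_w / ref_logp_l: the same under the frozen
    reference model. beta scales the implicit KL penalty.
    """
    # Margin between the chosen and rejected implicit rewards.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin: small when the policy
    # clearly prefers the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With a zero margin the loss is ln 2; it shrinks as the policy assigns relatively more probability to the chosen response.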
🔬 Pillar 9: Embodied Foundation Models (8 papers)
🔬 Pillar 3: Perception & Semantics (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | ETS: Open Vocabulary Electroencephalography-To-Text Decoding and Sentiment Classification | ETS: an open-vocabulary EEG-to-text decoding and sentiment-classification framework combining EEG and eye-tracking data | open-vocabulary | | |