cs.LG(2024-10-18)
📊 22 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (12 🔗1)
Pillar 2: RL & Architecture (9 🔗1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 9: Embodied Foundation Models (12 papers)
🔬 Pillar 2: RL & Architecture (9 papers)
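| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning | Proposes the CARD framework, which uses LLM-driven reward function design with dynamic feedback to improve reinforcement learning performance. | reinforcement learning, reward design, large language model | | |
| 14 | DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents | Proposes DistRL, an asynchronous distributed reinforcement learning framework for on-device control agents that improves training efficiency. | reinforcement learning, large language model, multimodal | | |
| 15 | Inverse Reinforcement Learning from Non-Stationary Learning Agents | Proposes an inverse reinforcement learning method based on Bundle Behavior Cloning to learn reward functions from non-stationary learning agents. | reinforcement learning, behavior cloning, inverse reinforcement learning | | |
| 16 | Streaming Deep Reinforcement Learning Finally Works | Proposes the Stream-x algorithm, which overcomes the barriers to streaming deep reinforcement learning and achieves efficient, stable learning. | reinforcement learning, deep reinforcement learning | | |
| 17 | How to Evaluate Reward Models for RLHF | Proposes Preference Proxy Evaluations (PPE) for efficiently evaluating RLHF reward models. | reinforcement learning, RLHF, predictive model | ✅ | |
| 18 | Self-supervised contrastive learning performs non-linear system identification | Proposes dynamic contrastive learning, which performs non-linear system identification through self-supervised learning. | representation learning, contrastive learning | | |
| 19 | Online Reinforcement Learning with Passive Memory | Proposes an online reinforcement learning algorithm that uses passive memory, improving performance while guaranteeing near-optimal regret. | reinforcement learning | | |
| 20 | Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification | Proposes ClusterText, which uses cluster-refined negative sampling to address over-clustering in graph contrastive learning and improve semi-supervised text classification. | contrastive learning | | |
| 21 | Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping | Proposes a transfer reinforcement learning method based on subgoal mapping to address policy transfer across heterogeneous action spaces. | reinforcement learning | | |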
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | PLMTrajRec: A Scalable and Generalizable Trajectory Recovery Method with Pre-trained Language Models | PLMTrajRec: a scalable and generalizable trajectory recovery method based on pre-trained language models. | spatiotemporal | | |