cs.LG (2024-06-27)
📊 19 papers in total | 🔗 4 with code
🎯 Navigation by interest area
🔬 Pillar 9: Embodied Foundation Models (10 papers)
🔬 Pillar 2: RL Algorithms and Architecture (9 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | OmniJARVIS: unified vision-language-action tokenization enables open-world instruction-following agents | imitation learning vision-language-action VLA | ||
| 12 | From Efficient Multimodal Models to World Models: A Survey | Surveys multimodal large models: key technologies and challenges on the path to artificial general intelligence and world models | world model large language model multimodal | ||
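| 13 | Curriculum Learning with Quality-Driven Data Selection | Proposes a curriculum learning method based on quality-driven data selection to improve the performance of multimodal large language models | curriculum learning large language model multimodal | ||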
| 14 | Efficient World Models with Context-Aware Tokenization | Proposes Δ-IRIS, which builds efficient world models through context-aware tokenization and sets a new state of the art on the Crafter benchmark | reinforcement learning deep reinforcement learning world model | ✅ | |
| 15 | Averaging log-likelihoods in direct alignment | Proposes a length-invariant direct alignment method that optimizes the agreement between LLMs and human judgments | reinforcement learning RLHF large language model | ||
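| 16 | Instance Temperature Knowledge Distillation | Proposes a reinforcement-learning-based instance temperature knowledge distillation method to improve student network performance | reinforcement learning distillation | ✅ | |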
| 17 | Leveraging Contrastive Learning for Enhanced Node Representations in Tokenized Graph Transformers | GCFormer: uses contrastive learning to enhance node representations in tokenized graph Transformers, improving node classification performance | contrastive learning | ||
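| 18 | Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion | Proposes Contrastive Policy Gradient (CoPG) for aligning LLMs under sequence-level rewards while remaining compatible with supervised learning | reinforcement learning large language model | ||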
| 19 | Decoding-Time Language Model Alignment with Multiple Objectives | Proposes the multi-objective decoding (MOD) algorithm, which aligns language models at decoding time to optimize multiple objectives | PPO DPO |