cs.CL(2026-02-09)
📊 26 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (17 🔗2)
Pillar 2: RL & Architecture (8 🔗1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (17 papers)
🔬 Pillar 2: RL & Architecture (8 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Improving Data and Reward Design for Scientific Reasoning in Large Language Models | Proposes the Dr. SCI framework to improve LLM performance on open-ended scientific reasoning tasks | reinforcement learning, reward design, large language model | | |
| 19 | VocalNet-MDM: Accelerating Streaming Speech LLM via Self-Distilled Masked Diffusion Modeling | VocalNet-MDM: accelerates streaming speech LLMs via self-distilled masked diffusion modeling | distillation, MDM, large language model | | |
| 20 | Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning | Proposes a reinforcement-learning-based dynamic long-context reasoning framework that addresses efficiency and information forgetting in long-context processing | reinforcement learning, large language model | | |
| 21 | Beyond Scalar Scores: Reinforcement Learning for Error-Aware Quality Estimation of Machine Translation | Proposes the ALOPE-RL framework, which uses reinforcement learning and error-aware rewards to improve machine translation quality estimation | reinforcement learning, large language model | | |
| 22 | Document Reconstruction Unlocks Scalable Long-Context RLVR | Proposes an unsupervised RLVR method based on document reconstruction to improve LLM long-context capability | reinforcement learning, reward design, large language model | | |
| 23 | WildReward: Learning Reward Models from In-the-Wild Human Interactions | WildReward: learns reward models from real user interactions to improve LLM performance | DPO, large language model | ✅ | |
| 24 | GISA: A Benchmark for General Information-Seeking Assistant | GISA: a benchmark for general information-seeking assistants that addresses the unnaturalness and data contamination of existing benchmarks | imitation learning, large language model | | |
| 25 | New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR | Proposes a probabilistic framework to explain the emergence of reasoning ability in RLVR | reinforcement learning, large language model | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | ViGoEmotions: A Benchmark Dataset For Fine-grained Emotion Detection on Vietnamese Texts | Introduces ViGoEmotions, a Vietnamese dataset for fine-grained emotion detection, and evaluates a range of pretrained models | motion prediction, large language model | | |