cs.CL (2025-07-03)
📊 21 papers total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (13)
Pillar 2: RL Algorithms & Architecture (7 🔗1)
Pillar 1: Robot Control (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (13 papers)
🔬 Pillar 2: RL Algorithms & Architecture (7 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Multimodal Mathematical Reasoning with Diverse Solving Perspective | Proposes the MathV-DP dataset and the Qwen-VL-DP model to improve multimodal mathematical reasoning | reinforcement learning, large language model, multimodal | | |
| 15 | Self-Correction Bench: Uncovering and Addressing the Self-Correction Blind Spot in Large Language Models | Uncovers and addresses the self-correction blind spot in large language models, improving reliability for safety-critical applications | reinforcement learning, large language model | | |
| 16 | Generalizing Verifiable Instruction Following | Proposes the IFBench benchmark to address generalization in instruction following | reinforcement learning, instruction following | | |
| 17 | RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents | Proposes the RLVER framework, which uses verifiable emotion rewards to improve the empathetic ability of LLMs | reinforcement learning, PPO, large language model | | |
| 18 | ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization | ARF-RLHF: achieves adaptive reward following through emotion-driven self-supervision and trace-biased dynamic optimization | PPO, RLHF, DPO | | |
| 19 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Proposes MOTIF, which fine-tunes LLMs with reinforcement learning to enable modular thinking that overcomes context-length limits | reinforcement learning, large language model | ✅ | |
| 20 | Rewrite-to-Rank: Optimizing Ad Visibility via Retrieval-Aware Text Rewriting | Proposes the Rewrite-to-Rank framework, which rewrites ad text to optimize its visibility in retrieval systems | reinforcement learning, PPO | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Adversarial Manipulation of Reasoning Models using Internal Representations | Adversarially manipulates reasoning models via internal representations, discovering and exploiting a "caution" direction for jailbreak attacks | manipulation, chain-of-thought | ✅ | |