cs.AI (2026-02-27)
📊 27 papers in total | 🔗 2 with code
🎯 Navigation by Area of Interest
🔬 Pillar 9: Embodied Foundation Models (16 papers)
🔬 Pillar 2: RL Algorithms & Architecture (11 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models | Proposes the EMO-R3 framework to improve the reasoning of multimodal large language models in visual emotion understanding. | reinforcement learning, large language model, multimodal | | |
| 18 | Pessimistic Auxiliary Policy for Offline Reinforcement Learning | Proposes a pessimistic auxiliary policy to mitigate overestimation in offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning | | |
| 19 | DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science | DARE-bench: a benchmark for evaluating the modeling and instruction-following abilities of LLMs in data science. | reinforcement learning, large language model, instruction following | | |
| 20 | ProductResearch: Training E-Commerce Deep Research Agents via Multi-Agent Synthetic Trajectory Distillation | Proposes the ProductResearch framework, which trains e-commerce deep research agents via multi-agent synthetic trajectory distillation. | distillation, large language model | | |
| 21 | RF-Agent: Automated Reward Function Design via Language Agent Tree Search | Proposes RF-Agent, which uses language agent tree search to automatically design reward functions for reinforcement learning. | reward design, large language model | ✅ | |
| 22 | Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem | Proposes RL-CMSA, a reinforcement-learning-based Construct-Merge-Solve-Adapt algorithm for the min-max multiple traveling salesman problem. | reinforcement learning | | |
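| 23 | Learning Flexible Job Shop Scheduling under Limited Buffers and Material Kitting Constraints | Proposes a DRL method based on heterogeneous graph networks for flexible job shop scheduling under limited-buffer and material-kitting constraints. | reinforcement learning, deep reinforcement learning, DRL | | |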
| 24 | Portfolio Reinforcement Learning with Scenario-Context Rollout | Proposes an RL method using macro-conditioned scenario-context rollouts to make portfolios more robust to abrupt market shifts. | reinforcement learning | | |
| 25 | The Auton Agentic AI Framework | Auton: a general-purpose AI framework for building, executing, and governing autonomous agent systems. | reinforcement learning, large language model | | |
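| 26 | Green or Fast? Learning to Balance Cold Starts and Idle Carbon in Serverless Computing | LACE-RL: a reinforcement-learning-based management framework that balances cold starts against carbon emissions in serverless computing. | reinforcement learning, deep reinforcement learning | | |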
| 27 | RUMAD: Reinforcement-Unifying Multi-Agent Debate | RUMAD: proposes a reinforcement-learning-based multi-agent debate framework that improves efficiency and generalization. | reinforcement learning, PPO | | |