| 8 |
Learning the Value Systems of Agents with Preference-based and Inverse Reinforcement Learning |
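Proposes a method for learning agents' value systems using preference-based and inverse reinforcement learning, addressing the value alignment problem in human-machine collaboration. |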
reinforcement learning, inverse reinforcement learning |
|
|
| 9 |
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning |
Proposes WideSeek-R1, which scales LLM width via multi-agent reinforcement learning to address broad information seeking. |
reinforcement learning, large language model |
|
|
| 10 |
Dual Mind World Model Inspired Network Digital Twin for Access Scheduling |
Proposes a network digital twin framework for access scheduling based on a dual-mind world model. |
reinforcement learning, world model |
|
|
| 11 |
Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning |
Agent-Omit: trains efficient LLM agents via agentic reinforcement learning to adaptively omit thoughts and observations. |
reinforcement learning |
✅ |
|
| 12 |
Steering LLMs via Scalable Interactive Oversight |
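Proposes a scalable interactive oversight framework that addresses the difficulty of human guidance for large language models on complex tasks. |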
reinforcement learning, large language model |
|
|
| 13 |
InterPReT: Interactive Policy Restructuring and Training Enable Effective Imitation Learning from Laypersons |
InterPReT: interactive policy restructuring and training that enable effective imitation learning from laypersons. |
imitation learning |
|
|