| 10 |
HARLF: Hierarchical Reinforcement Learning and Lightweight LLM-Driven Sentiment Integration for Financial Portfolio Optimization |
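Proposes the HARLF framework, which combines a lightweight LLM with hierarchical reinforcement learning to optimize financial portfolios. |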
reinforcement learning, deep reinforcement learning, DRL |
|
|
| 11 |
Revisiting LLM Reasoning via Information Bottleneck |
Proposes an information-bottleneck-based framework for optimizing LLM reasoning, improving mathematical reasoning ability. |
reinforcement learning, large language model, chain-of-thought |
|
|
| 12 |
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law |
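Proposes the SafeLadder framework, under which SafeWork-R1 coevolves safety and capability, significantly improving the safety of multimodal reasoning models. |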
reinforcement learning, RLHF, multimodal |
|
|
| 13 |
DxHF: Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition |
DxHF: provides high-quality human feedback for LLM alignment through interactive decomposition. |
reinforcement learning, RLHF, large language model |
|
|
| 14 |
Simulation-Driven Reinforcement Learning in Queuing Network Routing Optimization |
Proposes a simulation-driven Dyna-DDPG algorithm to optimize routing decisions in queuing networks. |
reinforcement learning, predictive model |
|
|
| 15 |
Optimising Call Centre Operations using Reinforcement Learning: Value Iteration versus Proximal Policy Optimisation |
Uses reinforcement learning to optimise call centre operations, comparing value iteration with proximal policy optimisation. |
reinforcement learning, PPO |
|
|