| 9 |
Structured Latent Dynamics in Wireless CSI via Homomorphic World Models |
Proposes a framework based on homomorphic world models for learning structured latent dynamics in wireless CSI |
world model latent dynamics scene understanding |
|
|
| 10 |
FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment |
Proposes FedPDPO to address personalized preference alignment of large language models in federated learning |
reinforcement learning RLHF DPO |
|
|
| 11 |
What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time |
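Proposes SCRL, which uses selective-complementary reinforcement learning to address label noise under weak consensus in test-time reasoning |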
reinforcement learning large language model |
✅ |
|
| 12 |
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization |
FIPO: elicits deep reasoning in large language models through policy optimization influenced by future KL divergence |
reinforcement learning large language model chain-of-thought |
|
|
| 13 |
DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management |
DeepStock: optimizes inventory management through reinforcement learning with policy regularization |
reinforcement learning deep reinforcement learning DRL |
|
|
| 14 |
Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States |
Breaks through the capability ceiling of LLM post-training by reintroducing Markov states |
reinforcement learning large language model |
|
|
| 15 |
Learning to Bet for Horizon-Aware Anytime-Valid Testing |
Proposes a horizon-aware testing method based on deep reinforcement learning to optimize betting strategies |
reinforcement learning deep reinforcement learning |
|
|