| 1 |
A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners |
Proposes interpretability experiments to improve the planning ability of large language models |
world model world models large language model |
|
|
| 2 |
Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories |
Proposes a sleep mechanism to address long-term memory and self-improvement |
reinforcement learning imitation learning distillation |
|
|
| 3 |
A Quantitative Approximation Framework for Flow Distillation in Diffusion Models |
Proposes a quantitative approximation framework to address flow distillation in diffusion models |
distillation multimodal |
|
|
| 4 |
Tool-Aware Optimization with Entropy Guidance for Efficient Agentic Reinforcement Learning |
Proposes the TAO-RL framework to address reinforcement learning instability caused by tool use |
reinforcement learning large language model |
|
|
| 5 |
Exploiting Verification-Generation Gap: Test-Time Reinforcement Learning with Confidence-Conditioned Verification |
Proposes TTRL-CoCoV to address Pass@k optimization in label-free reinforcement learning |
reinforcement learning large language model |
✅ |
|
| 6 |
Physics-Guided Policy Optimization with Self-Distillation |
Proposes a physics-guided policy optimization method to address the instability of self-distillation training |
distillation privileged information |
|
|
| 7 |
Post-Hoc Robustness for Model-Based Reinforcement Learning |
Proposes a post-hoc robustness method to strengthen model-based reinforcement learning |
reinforcement learning model-based RL |
|
|
| 8 |
Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning |
Induces diverse behaviour through reward uncertainty to address reinforcement learning problems |
reinforcement learning |
|
|
| 9 |
Dynamic Short Convolutions Improve Transformers |
Proposes dynamic short convolutions to improve Transformer performance |
Mamba large language model |
|
|
| 10 |
Easy-to-Use Shielding for Reinforcement Learning |
Proposes an easy-to-use shielding technique to address safe exploration in reinforcement learning |
reinforcement learning |
|
|
| 11 |
Multi$^2$: Hierarchical Multi-Agent Decision-Making with LLM-Based Agents in Interactive Environments |
Proposes the Multi$^2$ framework to address goal drift in long-horizon decision-making |
reinforcement learning large language model |
|
|
| 12 |
Mitigating False Credit Propagation: Probabilistic Graphical Reward Aggregation for Rubric-Based Reinforcement Learning |
Proposes a graphical event aggregation method to address false credit propagation |
reinforcement learning |
✅ |
|
| 13 |
Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions |
Proposes Gaussian-reshaped trust region optimization to address PPO's shortcomings in non-stationary environments |
reinforcement learning PPO |
|
|