| 1 |
Stable and Efficient Single-Rollout RL for Multimodal Reasoning |
Proposes MSSR, a stable and efficient single-rollout reinforcement learning method for multimodal large language model reasoning. |
reinforcement learning large language model multimodal |
|
|
| 2 |
Conscious Data Contribution via Community-Driven Chain-of-Thought Distillation |
Proposes a community-driven chain-of-thought distillation method that improves users' autonomy over their data. |
distillation chain-of-thought |
|
|
| 3 |
Trustworthy and Explainable Deep Reinforcement Learning for Safe and Energy-Efficient Process Control: A Use Case in Industrial Compressed Air Systems |
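Proposes a trustworthy and explainable deep reinforcement learning method for safe, energy-efficient control of industrial compressed air systems. |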
reinforcement learning deep reinforcement learning physically plausible |
|
|
| 4 |
Emotion-Inspired Learning Signals (EILS): A Homeostatic Framework for Adaptive Autonomous Agents |
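Proposes the Emotion-Inspired Learning Signals (EILS) framework to improve the adaptability of autonomous agents in non-stationary environments. |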
reinforcement learning deep reinforcement learning DRL |
|
|
| 5 |
On the Universality of Transformer Architectures; How Much Attention Is Enough? |
Surveys the universality of Transformer architectures and examines how much attention is sufficient. |
reinforcement learning large language model |
|
|
| 6 |
Learning Tennis Strategy Through Curriculum-Based Dueling Double Deep Q-Networks |
Proposes a curriculum-learning-based Dueling Double DQN reinforcement learning framework for tennis strategy optimization. |
reinforcement learning curriculum learning reward design |
|
|
| 7 |
Embedded Safety-Aligned Intelligence via Differentiable Internal Alignment Embeddings |
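Proposes an embedded safety-aligned intelligence framework that addresses safety alignment in multi-agent reinforcement learning through differentiable internal alignment embeddings. |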
reinforcement learning reward shaping |
|
|