| 1 |
On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs |
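Reveals the fragility of RL-finetuned vision-language models in reasoning consistency and robustness, and proposes directions for improvement. |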
reinforcement learning large language model multimodal |
|
|
| 2 |
Constraint-Rectified Training for Efficient Chain-of-Thought |
Proposes Constraint-Rectified Training (CRT) to improve chain-of-thought (CoT) reasoning efficiency and control reasoning length. |
reinforcement learning reward design large language model |
|
|
| 3 |
Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models |
Proposes Amortized Reasoning Tree Search (ARTS), which decouples the proposal and decision processes in large language models. |
reinforcement learning flow matching large language model |
|
|
| 4 |
Flow-Factory: A Unified Framework for Reinforcement Learning in Flow-Matching Models |
Flow-Factory: a unified reinforcement learning framework that accelerates the alignment of flow-matching models with human preferences. |
reinforcement learning flow matching |
✅ |
|
| 5 |
Order Matters in Retrosynthesis: Structure-aware Generation via Reaction-Center-Guided Discrete Flow Matching |
Proposes RetroDiT, a reaction-center-guided discrete flow matching method for structure-aware retrosynthesis generation. |
flow matching foundation model |
|
|
| 6 |
Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings |
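Proposes a multi-agent model-based reinforcement learning framework built on joint state-action learned embeddings to improve cooperation efficiency. |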
reinforcement learning world model representation learning |
|
|
| 7 |
X-VORTEX: Spatio-Temporal Contrastive Learning for Wake Vortex Trajectory Forecasting |
X-VORTEX: spatio-temporal contrastive learning for wake vortex trajectory forecasting. |
contrastive learning |
|
|
| 8 |
SLA2: Sparse-Linear Attention with Learnable Routing and QAT |
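SLA2: sparse-linear attention that combines learnable routing with quantization-aware training (QAT) to accelerate video diffusion models. |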
linear attention |
|
|
| 9 |
Look Inward to Explore Outward: Learning Temperature Policy from LLM Internal States via Hierarchical RL |
Proposes Introspective LLM, which uses hierarchical reinforcement learning to learn a temperature policy from the LLM's internal states. |
reinforcement learning large language model |
|
|
| 10 |
Flow Matching from Viewpoint of Proximal Operators |
Reformulates conditional flow matching from the viewpoint of proximal operators to improve generative model performance. |
flow matching |
|
|
| 11 |
VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction |
VI-CuRL: stabilizes verifier-independent RL reasoning through confidence-guided variance reduction. |
reinforcement learning large language model |
|
|
| 12 |
FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching |
FLAC: maximum entropy reinforcement learning via kinetic-energy-regularized bridge matching. |
reinforcement learning flow matching |
|
|