| 1 |
DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models |
DeepThinkVLA improves the reasoning ability of VLA models through a hybrid attention mechanism and two-stage training |
reinforcement learning vision-language-action VLA |
|
|
| 2 |
Iterative Foundation Model Fine-Tuning on Multiple Rewards |
Proposes an iterative foundation model fine-tuning method based on multiple rewards to improve performance on generation tasks |
reinforcement learning foundation model |
|
|
| 3 |
A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control |
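Proposes HeraldLight, a dual-LLM architecture for parallel fine-grained traffic signal control that significantly reduces average travel time and queue length. |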
reinforcement learning large language model |
✅ |
|
| 4 |
MVeLMA: Multimodal Vegetation Loss Modeling Architecture for Predicting Post-fire Vegetation Loss |
MVeLMA: a multimodal vegetation loss modeling architecture for predicting post-fire vegetation loss |
predictive model multimodal |
|
|
| 5 |
MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data |
MedM2T: a time-aware multimodal modeling framework for electronic health record and electrocardiogram data |
MAE multimodal |
✅ |
|
| 6 |
When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making |
Pits a reinforcement learning-based market maker against medium-frequency traders, revealing the adverse selection mechanism |
reinforcement learning PPO imitation learning |
|
|
| 7 |
Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning |
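Proposes a deep reinforcement learning-based supply chain finance decision-making model that improves the accuracy of enterprise economic performance prediction. |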
reinforcement learning deep reinforcement learning |
|
|
| 8 |
Higher-order Linear Attention |
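Proposes a higher-order linear attention mechanism to address the quadratic complexity of long-text processing in autoregressive language models |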
SSM state space model linear attention |
✅ |
|
| 9 |
Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning |
Proposes a domain-informed reinforcement learning method that improves the robustness of chaotic convective flow control |
reinforcement learning reward design |
|
|
| 10 |
LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers |
LC-Opt: a benchmark for data center liquid cooling optimization that uses reinforcement learning and agentic AI for end-to-end control. |
reinforcement learning distillation |
|
|
| 11 |
Soft Task-Aware Routing of Experts for Equivariant Representation Learning |
Proposes Soft Task-Aware Routing (STAR) of experts to improve the efficiency of equivariant representation learning. |
representation learning |
✅ |
|
| 12 |
Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems |
Studies the credit assignment problem in open multi-agent systems and reveals how openness affects performance |
reinforcement learning |
|
|
| 13 |
Simplex-to-Euclidean Bijections for Categorical Flow Matching |
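Proposes a categorical flow matching method based on simplex-to-Euclidean bijections for learning probability distributions on the simplex. |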
flow matching |
|
|
| 14 |
Reasoning Models Sometimes Output Illegible Chains of Thought |
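Reasoning models trained with reinforcement learning produce less legible chains of thought (CoT), which hampers intent understanding and the detection of malicious behavior. |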
reinforcement learning chain-of-thought |
|
|
| 15 |
Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation |
Proposes IWD, an influence function-based dataset distillation method that improves model performance. |
distillation |
|
|
| 16 |
Towards Understanding Self-play for LLM Reasoning |
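Analyzes the self-play training mechanism for improving LLM reasoning, revealing its differences from RLVR and SFT as well as its limitations. |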
reinforcement learning large language model |
|
|