| # | Title | Summary | Keywords | ✅ |
|---|-------|---------|----------|----|
| 1 | OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning | Stimulates generalized multimodal reasoning capabilities via dynamic reinforcement learning. | reinforcement learning, large language model, multimodal | |
| 2 | Active management of battery degradation in wireless sensor network using deep reinforcement learning for group battery replacement | Proposes a deep-reinforcement-learning-based method for actively managing battery degradation in wireless sensor networks, enabling group battery replacement. | reinforcement learning, deep reinforcement learning, DRL | |
| 3 | Advances in Protein Representation Learning: Methods, Applications, and Future Directions | Surveys advances in protein representation learning, offering new perspectives for molecular biology, medical research, and drug discovery. | representation learning, multimodal | |
| 4 | Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't | Uses reinforcement learning to improve reasoning in small-scale LLMs, balancing effectiveness and cost. | reinforcement learning, large language model | ✅ |
| 5 | Network-wide Freeway Traffic Estimation Using Sparse Sensor Data: A Dirichlet Graph Auto-Encoder Approach | Proposes DGAE, which estimates network-wide freeway traffic states from sparse sensor data and improves cross-city transferability. | representation learning, sparse sensors | |
| 6 | Utilizing Reinforcement Learning for Bottom-Up part-wise Reconstruction of 2D Wire-Frame Projections | Proposes a reinforcement-learning-based, bottom-up, part-wise method for reconstructing 2D wire-frame projections. | reinforcement learning, curriculum learning | |
| 7 | Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning | Proposes nonparametric Bellman mappings for value iteration in distributed reinforcement learning. | reinforcement learning, DRL | |
| 8 | InCo-DPO: Balancing Distribution Shift and Data Quality for Enhanced Preference Optimization | InCo-DPO balances distribution shift and data quality to enhance preference optimization. | DPO, direct preference optimization | |
| 9 | Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement | Proposes an efficient ANN-guided SNN distillation training framework based on hybrid block-wise replacement. | distillation | |
| 10 | Denoising-based Contractive Imitation Learning | Proposes denoising-based contractive imitation learning to address covariate shift. | imitation learning | |
| 11 | Bezier Distillation | Proposes Bezier distillation, combining multi-teacher knowledge distillation with Bezier curves to address error accumulation in Rectified Flow. | distillation | |
| 12 | Disentangling Uncertainties by Learning Compressed Data Representation | Proposes CDRM, a compressed data representation model for disentangling uncertainties in learned system dynamics models. | reinforcement learning, multimodal | ✅ |
| 13 | Whenever, Wherever: Towards Orchestrating Crowd Simulations with Spatio-Temporal Spawn Dynamics | Proposes nTPP-GMM for modeling and orchestrating spatio-temporal spawn dynamics in crowd simulations. | reinforcement learning, deep reinforcement learning | |