| 1 |
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better |
Proposes knowledge-insulated VLA models that speed up training and inference and improve generalization |
flow matching vision-language-action VLA |
|
|
| 2 |
Differential Information Distribution: A Bayesian Perspective on Direct Preference Optimization |
Proposes the differential information distribution to improve direct preference learning |
DPO direct preference optimization reward design |
|
|
| 3 |
Composite Reward Design in PPO-Driven Adaptive Filtering |
Proposes a PPO-based adaptive filtering framework with a composite reward to address denoising in dynamic environments |
reinforcement learning PPO reward design |
|
|
| 4 |
Information Structure in Mappings: An Approach to Learning, Representation, and Generalisation |
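Proposes a method for quantifying the structure of mappings, used to understand the representations and generalization of deep learning models and the effects of design decisions |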
reinforcement learning large language model |
|
|