| 1 |
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better |
Proposes a knowledge insulation technique to improve the training and inference efficiency of vision-language-action models |
flow matching vision-language-action VLA |
|
|
| 2 |
Differential Information Distribution: A Bayesian Perspective on Direct Preference Optimization |
Proposes a differential information distribution method to optimize direct preference learning |
DPO direct preference optimization reward design |
|
|
| 3 |
Composite Reward Design in PPO-Driven Adaptive Filtering |
Proposes a PPO-based composite reward design to address adaptive filtering problems |
reinforcement learning PPO reward design |
|
|
| 4 |
Information Structure in Mappings: An Approach to Learning, Representation, and Generalisation |
Proposes a quantitative method for analyzing the representational structure of neural networks |
reinforcement learning large language model |
|
|