| 1 |
A multimodal vision foundation model for generalizable knee pathology |
OrthoFoundation: a multimodal vision foundation model for generalizable knee pathology |
contrastive learning foundation model multimodal |
|
|
| 2 |
Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting |
Splat-Portrait: a generalizable talking-head generation method based on Gaussian splatting |
distillation gaussian splatting splatting |
✅ |
|
| 3 |
GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning |
Proposes GenAgent to address the high cost of multimodal generation and understanding |
reinforcement learning multimodal |
✅ |
|
| 4 |
HomoFM: Deep Homography Estimation with Flow Matching |
HomoFM: deep homography estimation with flow matching, improving accuracy and robustness |
flow matching multimodal |
✅ |
|
| 5 |
QualiRAG: Retrieval-Augmented Generation for Visual Quality Understanding |
Proposes QualiRAG, a training-free retrieval-augmented generation framework for visual quality understanding. |
reinforcement learning spatiotemporal multimodal |
✅ |
|
| 6 |
Low Cost, High Efficiency: LiDAR Place Recognition in Vineyards with Matryoshka Representation Learning |
Proposes MinkUNeXt-VINE, which uses Matryoshka representation learning for efficient localization in vineyards with low-cost LiDAR |
representation learning |
|
|
| 7 |
NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation |
NaVIDA: a vision-language navigation framework with inverse dynamics augmentation, improving navigation stability and generalization. |
policy learning VLN |
|
|