| 1 |
VOCAL: Visual Odometry via ContrAstive Learning |
Proposes the VOCAL framework to address the interpretability problem in visual odometry |
representation learning, contrastive learning, visual odometry |
|
|
| 2 |
Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking |
Proposes Mamba-FETrack V2 to address efficiency in multimodal visual object tracking |
Mamba, state space model, multimodal |
✅ |
|
| 3 |
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching |
Proposes JAM-Flow to address audio and facial motion synthesis |
flow matching, motion synthesis |
|
|
| 4 |
Embedding-based Retrieval in Multimodal Content Moderation |
Proposes an embedding-based retrieval method to improve the efficiency of short-video content moderation |
contrastive learning, multimodal |
|
|
| 5 |
Towards foundational LiDAR world models with efficient latent flow matching |
Proposes a LiDAR world model based on latent conditional flow matching to address domain transfer |
flow matching, world model |
|
|
| 6 |
Dataset Distillation via Vision-Language Category Prototype |
Proposes a distillation method based on vision-language category prototypes to improve dataset distillation performance |
distillation, large language model |
✅ |
|
| 7 |
LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching |
Proposes LLM-enhanced action-aware multimodal prompt tuning to address image-text matching |
representation learning, spatial relationship, large language model |
|
|
| 8 |
NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments |
Proposes NavMorph to address environment adaptation in vision-and-language navigation |
world model, VLN |
✅ |
|
| 9 |
CS-VLM: Compressed Sensing Attention for Efficient Vision-Language Representation Learning |
Proposes a compressed sensing attention mechanism to address the computational bottleneck of vision-language models |
representation learning, multimodal |
|
|
| 10 |
Room Scene Discovery and Grouping in Unstructured Vacation Rental Image Collections |
Proposes a room scene discovery and grouping method to address the lack of structure in vacation rental image collections |
contrastive learning, large language model |
|
|
| 11 |
FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation |
Proposes FADRM to address information vanishing in dataset distillation |
distillation |
✅ |
|
| 12 |
From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection |
Proposes an eye-tracking-based method for weakly supervised video salient object detection |
contrastive learning, spatiotemporal |
|
|
| 13 |
When Test-Time Adaptation Meets Self-Supervised Models |
Proposes a self-supervised test-time adaptation protocol to improve model performance |
contrastive learning, distillation |
|
|