| 1 |
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone |
Proposes Dream-VL and Dream-VLA, built on a diffusion language model backbone, for vision-language understanding and robot control. |
vision-language-action VLA large language model |
|
|
| 2 |
Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains |
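Proposes the SR-MCR framework, which uses a self-rewarding mechanism to improve the coherence and accuracy of multimodal LLM reasoning across visual domains. |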
multimodal visual grounding |
|
|
| 3 |
Multimodal Diffeomorphic Registration with Neural ODEs and Structural Descriptors |
Proposes a multimodal diffeomorphic registration method based on neural ODEs and structural descriptors
multimodal |
|
|
| 4 |
SCAFusion: A Multimodal 3D Detection Framework for Small Object Detection in Lunar Surface Exploration |
SCAFusion: a multimodal 3D detection framework for small object detection on the lunar surface
multimodal |
|
|
| 5 |
CritiFusion: Semantic Critique and Spectral Alignment for Faithful Text-to-Image Generation |
CritiFusion: achieves high-quality text-to-image generation through semantic critique and spectral alignment
large language model multimodal |
|
|
| 6 |
Rethinking Memory Design in SAM-Based Visual Object Tracking |
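Proposes a unified hybrid memory framework for SAM-based tracking, improving robustness under long-term occlusion and in complex scenes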
foundation model |
✅ |
|
| 7 |
DreamOmni3: Scribble-based Editing and Generation |
DreamOmni3: proposes a scribble-based image editing and generation framework that addresses the insufficiency of text prompts. |
multimodal |
|
|
| 8 |
Unified Review and Benchmark of Deep Segmentation Architectures for Cardiac Ultrasound on CAMUS |
Unified evaluation and benchmarking of deep learning architectures for cardiac ultrasound image segmentation
multimodal |
|
|