| 1 | VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | VARGPT-v1.1: improving a visual autoregressive large unified model through iterative instruction tuning and reinforcement learning | reinforcement learning DPO direct preference optimization | ✅ | |
| 2 | ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation | ConsDreamer improves multi-view consistency in zero-shot text-to-3D generation by decoupling view bias and geometric consistency. | dreamer distillation 3D gaussian splatting | | |
| 3 | Refining CLIP's Spatial Awareness: A Visual-Centric Perspective | Proposes a spatial correlation distillation framework that improves CLIP's spatial awareness in dense prediction tasks. | distillation open-vocabulary open vocabulary | | |
| 4 | Agglomerating Large Vision Encoders via Distillation for VFSS Segmentation | Proposes a knowledge-distillation-based method for agglomerating vision encoders to improve medical image segmentation performance. | distillation foundation model | | |
| 5 | Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments | Morpheus: evaluating the physical reasoning of video generative models through real physical experiments | world model physically plausible foundation model | | |
| 6 | All-day Depth Completion via Thermal-LiDAR Fusion | Proposes COPS, a framework based on contrastive learning and pseudo-supervision, for all-day thermal-LiDAR depth completion. | contrastive learning monocular depth foundation model | | |
| 7 | Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation | Proposes a video turbulence mitigation method based on selective state space models, improving long-range imaging quality. | Mamba state space model | | |
| 8 | SelfMedHPM: Self Pre-training With Hard Patches Mining Masked Autoencoders For Medical Image Segmentation | SelfMedHPM: self-supervised pre-training with hard-patch-mining masked autoencoders for medical image segmentation | masked autoencoder MAE | | |
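| 9 | Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | Proposes the AVIGATE model, which uses a gated attention mechanism and an adaptive contrastive loss to improve audio-guided video-text retrieval performance. | representation learning multimodal | | |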