| 1 |
MM-IFEngine: Towards Multimodal Instruction Following |
Proposes MM-IFEngine for generating high-quality multimodal instruction-following data, and builds an evaluation benchmark. |
DPO direct preference optimization large language model |
✅ |
|
| 2 |
ContrastiveGaussian: High-Fidelity 3D Generation with Contrastive Learning and Gaussian Splatting |
ContrastiveGaussian: high-fidelity 3D generation using contrastive learning and Gaussian splatting |
contrastive learning distillation gaussian splatting |
|
|
| 3 |
Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction |
Proposes a retrieval-augmented multimodal LLM method for radiology report generation based on key phrase extraction |
contrastive learning large language model multimodal |
|
|
| 4 |
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation |
GLUS: an MLLM that unifies global and local reasoning for video segmentation, achieving a new SOTA on RefVOS |
contrastive learning large language model |
✅ |
|
| 5 |
Benchmarking Image Embeddings for E-Commerce: Evaluating Off-the Shelf Foundation Models, Fine-Tuning Strategies and Practical Trade-offs |
E-commerce image embedding benchmark: evaluates pretrained models, fine-tuning strategies, and practical trade-offs |
contrastive learning foundation model |
|
|
| 6 |
Perception-R1: Pioneering Perception Policy with Reinforcement Learning |
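Perception-R1: uses reinforcement learning to improve the perception policy of multimodal large language models, significantly improving performance on visual perception tasks. |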
reinforcement learning policy learning reward design |
|
|
| 7 |
Kimi-VL Technical Report |
Kimi-VL: an efficient open-source MoE vision-language model that excels at long-text understanding and high-resolution visual input |
reinforcement learning multimodal chain-of-thought |
✅ |
|
| 8 |
Heart Failure Prediction using Modal Decomposition and Masked Autoencoders for Scarce Echocardiography Databases |
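Proposes a heart failure prediction method based on modal decomposition and masked autoencoders, suited to scarce echocardiography databases. |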
masked autoencoder MAE |
✅ |
|
| 9 |
BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation |
BoxDreamer: generalizable object pose estimation by predicting object bounding-box corners |
dreamer |
|
|
| 10 |
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement |
ThinkLite-VL: uses MCTS-guided sample selection for data-efficient visual reasoning self-improvement |
distillation multimodal |
|
|
| 11 |
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model |
VLM-R1: a stable, generalizable large vision-language model based on rule-based rewards |
reinforcement learning large language model |
✅ |
|
| 12 |
DGFamba: Learning Flow Factorized State Space for Visual Domain Generalization |
Proposes DG-Famba, which learns domain-generalizable visual representations through a flow-factorized state space |
Mamba state space model |
|
|