FreeRet: MLLMs as Training-Free Retrievers

Authors: Yuhan Zhu, Xiangyu Zeng, Chenting Wang, Xinhao Li, Yicheng Xu, Ziang Yan, Yi Wang, Limin Wang

Category: cs.CV

Published: 2025-09-29

💡 One-Sentence Summary

Proposes FreeRet, a framework that turns MLLMs into multimodal retrievers without any training.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

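Keywords: multimodal retrieval, large language models, training-free framework, semantic embeddings, reranking, information retrieval, intelligent question answering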

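📋 Key Points

Existing multimodal retrieval methods typically require heavy post-hoc training, which limits their flexibility and efficiency.
FreeRet extracts semantic embeddings directly from an MLLM, avoiding a complex training process while keeping candidate search fast.
On the MMEB and MMEB-V2 benchmarks, FreeRet substantially outperforms models trained on millions of pairs.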

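📝 Abstract (translated from Chinese)

Multimodal large language models (MLLMs) are a foundation for mixed-modality retrieval, but they usually need heavy post-hoc training to be converted into contrastive encoders. This paper proposes FreeRet, a plug-and-play framework that turns any MLLM directly into a two-stage retriever. FreeRet first derives semantic embeddings directly from the model for fast candidate search, then uses the model's reasoning ability for precise reranking. The framework performs strongly on multiple benchmarks, is model-agnostic, supports multiple modality combinations, and unifies retrieval, reranking, and generation in one pipeline. The study shows that a pretrained MLLM, when carefully harnessed, can serve as a strong retrieval engine without training, closing a key gap in its role as a generalist model.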

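🔬 Method Details

Problem definition: The paper addresses the dependence of existing multimodal retrieval methods on heavy post-hoc training, which limits their flexibility and efficiency in practice.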

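Core idea: FreeRet uses a pretrained MLLM to produce semantic embeddings directly, avoiding a complex training process, and then uses the same model's reasoning ability for precise reranking.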
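As a rough illustration of reading an embedding straight from a pretrained model's hidden states, the sketch below takes the last-token vector from one layer and L2-normalizes it. The layer index, the last-token pooling, and the toy `hidden_states` structure are all assumptions made for illustration; the text above does not say which layer or pooling FreeRet uses.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def embed_from_hidden_states(hidden_states, layer=-2):
    """Pool one layer's hidden states into a single embedding.

    hidden_states: list over layers; each layer is a list of per-token vectors.
    layer: which layer to read (assumption: an intermediate layer, not the output head).
    """
    last_token = hidden_states[layer][-1]  # assumption: last-token pooling
    return l2_normalize(last_token)

# Toy input: 3 layers, 2 tokens, 2-dimensional states (hypothetical values).
toy_states = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [3.0, 4.0]],
    [[0.0, 5.0], [1.0, 1.0]],
]
embedding = embed_from_hidden_states(toy_states)
print(embedding)
```

With a real model, `hidden_states` would come from a forward pass that returns per-layer activations; normalizing the result lets a plain dot product act as cosine similarity during candidate search.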

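Architecture: FreeRet has two main stages. The first is fast candidate search over semantic embeddings produced by the MLLM. The second is reranking, in which the model's reasoning ability refines the order of the shortlisted candidates.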
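The two stages can be sketched as a generic retrieve-then-rerank loop. This is a minimal sketch under stated assumptions, not FreeRet's implementation: the function name `retrieve_then_rerank`, the `top_k` parameter, and the toy reranker scores (standing in for the MLLM's judgment) are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_then_rerank(query_emb, corpus_embs, rerank_score, top_k=2):
    """Stage 1: rank every candidate by cosine similarity and keep the top_k.
    Stage 2: reorder that shortlist with a slower, more precise scorer."""
    ranked = sorted(
        range(len(corpus_embs)),
        key=lambda i: cosine(query_emb, corpus_embs[i]),
        reverse=True,
    )
    shortlist = ranked[:top_k]
    return sorted(shortlist, key=rerank_score, reverse=True)

# Toy corpus and query embeddings (hypothetical values).
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.05]
# Hypothetical per-candidate scores standing in for the reranking model.
toy_rerank_scores = {0: 0.3, 1: 0.9, 2: 0.1}

result = retrieve_then_rerank(query, corpus, lambda i: toy_rerank_scores[i], top_k=2)
print(result)
```

The split keeps the expensive scorer off most of the corpus: only the `top_k` candidates that survive the embedding search are passed to the second stage.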

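Key innovations: FreeRet contributes three advances: bypassing the lexical alignment layers to obtain semantically faithful embeddings, conditioning representation generation on explicit priors, and mitigating the framing effect in reranking through neutral choice framing. The essential difference from existing methods is that no post-hoc training is required.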
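One way to picture neutral choice framing is a reranking prompt whose answer options carry neutral labels, scored by comparing the model's logits for the two option tokens. The prompt wording, the option labels, and the two-way softmax scoring below are illustrative assumptions; the text above does not give FreeRet's actual prompt or scoring rule.

```python
import math

def build_neutral_prompt(query, candidate):
    """Hypothetical reranking prompt with neutrally labelled options."""
    return (
        f"Query: {query}\n"
        f"Candidate: {candidate}\n"
        "Which statement holds?\n"
        "(A) The candidate matches the query.\n"
        "(B) The candidate does not match the query.\n"
        "Answer:"
    )

def relevance_from_option_logits(logit_a, logit_b):
    """Two-way softmax over the option-token logits; returns P(option A)."""
    m = max(logit_a, logit_b)  # subtract the max for numerical stability
    ea = math.exp(logit_a - m)
    eb = math.exp(logit_b - m)
    return ea / (ea + eb)

prompt = build_neutral_prompt("a red bicycle", "photo of a crimson bike")
# Hypothetical logits that a model might assign to the tokens "A" and "B".
score = relevance_from_option_logits(2.0, 0.0)
print(prompt)
print(score)
```

The resulting probability can serve as the second-stage score for each shortlisted candidate.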

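Key design: FreeRet's design choices aim to keep the extracted embeddings semantically consistent and to frame the choices presented during reranking neutrally, which improves the accuracy of the final ranking. Because the framework is training-free, no training loss is involved.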

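📊 Experimental Highlights

On the MMEB and MMEB-V2 benchmarks, which span 46 datasets, FreeRet substantially outperforms models trained on millions of pairs, supporting its effectiveness as a training-free retrieval engine.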

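🎯 Application Scenarios

FreeRet applies to multimodal tasks that need efficient retrieval, such as information retrieval, recommender systems, and intelligent question answering. Because it needs no training, it can be deployed quickly and at lower development cost. It may also encourage further work on multimodal retrieval and on combining different modalities.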

📄 Abstract (Original)

Multimodal large language models (MLLMs) are emerging as versatile foundations for mixed-modality retrieval. Yet, they often require heavy post-hoc training to convert them into contrastive encoders for retrieval. This work asks: Can off-the-shelf MLLMs serve as powerful retrievers without additional training? We present FreeRet, a plug-and-play framework that turns any MLLM into a two-stage retriever. FreeRet first derives semantically grounded embeddings directly from the model for fast candidate search, and then exploits its reasoning ability for precise reranking. The framework contributes three advances: bypassing lexical alignment layers to obtain semantically faithful embeddings, conditioning representation generation with explicit priors, and mitigating framing effect in reranking via neutral choice framing. On the MMEB and MMEB-V2 benchmarks spanning 46 datasets, FreeRet substantially outperforms models trained on millions of pairs. Beyond benchmarks, FreeRet is model-agnostic and scales seamlessly across MLLM families and sizes, preserves their generative abilities, supports arbitrary modality combinations, and unifies retrieval, reranking, and generation into end-to-end RAG within a single model. Our findings demonstrate that pretrained MLLMs, when carefully harnessed, can serve as strong retrieval engines without training, closing a critical gap in their role as generalists.
