| 1 |
FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model |
Proposes FOCA, a frequency-oriented multimodal large language model for cross-domain forgery detection, localization, and explanation. |
large language model multimodal |
|
|
| 2 |
SCHEMA for Gemini 3 Pro Image: A Structured Methodology for Controlled AI Image Generation on Google's Native Multimodal Model |
SCHEMA: a structured methodology for controllable AI image generation, designed for Gemini 3 Pro Image |
multimodal |
|
|
| 3 |
A high-resolution nationwide urban village mapping product for 342 Chinese cities based on foundation models |
Proposes GeoLink-UV, a high-resolution urban village mapping product for 342 Chinese cities, built on foundation models. |
foundation model |
|
|
| 4 |
Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement |
Proposes the EditedID framework to address the challenge of facial ID consistency in multimodal editing |
multimodal |
✅ |
|
| 5 |
Benchmarking Computational Pathology Foundation Models For Semantic Segmentation |
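Proposes a computational pathology segmentation benchmark that evaluates and combines multiple foundation models to improve semantic segmentation of histopathology images. |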
foundation model |
|
|
| 6 |
MIRROR: Multimodal Iterative Reasoning via Reflection on Visual Regions |
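Proposes the MIRROR framework, which performs multimodal iterative reasoning via reflection on visual regions, improving the correctness of vision-language models and reducing hallucinations. |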
multimodal |
|
|
| 7 |
Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code |
Proposes the GeoCode dataset, which achieves visual alignment through code prediction and improves multimodal geometric reasoning. |
multimodal |
✅ |
|
| 8 |
Echoes of Ownership: Adversarial-Guided Dual Injection for Copyright Protection in MLLMs |
Proposes an adversarial-guided dual injection method for copyright protection of multimodal large language models |
large language model multimodal |
|
|
| 9 |
HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing |
Proposes HIME, which mitigates object hallucinations in LVLMs via hallucination-insensitivity model editing |
large language model multimodal |
|
|
| 10 |
LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency |
LaS-Comp: a zero-shot 3D completion method that exploits latent-spatial consistency |
foundation model |
✅ |
|
| 11 |
Frame2Freq: Spectral Adapters for Fine-Grained Video Understanding |
Proposes Frame2Freq, which improves fine-grained video understanding through spectral adapters |
foundation model |
✅ |
|
| 12 |
TIACam: Text-Anchored Invariant Feature Learning with Auto-Augmentation for Camera-Robust Zero-Watermarking |
TIACam proposes a text-anchored invariant feature learning framework for zero-watermarking that is robust to camera capture. |
multimodal |
|
|
| 13 |
Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding |
Proposes Video-TwG, which improves long video understanding through curriculum reinforced reasoning and video grounding |
multimodal |
|
|