cs.CV (2025-09-25)

📊 16 papers in total | 🔗 4 with code

🎯 Research Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗1)
Pillar 2: RL & Architecture (5 🔗1)
Pillar 3: Perception & Semantics (2 🔗1)
Pillar 8: Physics-based Animation (1 🔗1)
Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

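# | Title | One-line summary | Tags | 🔗
1 | Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Governance | Proposes reasoning-enhanced, domain-adaptive pretraining of multimodal LLMs for short-video content governance | large language model, multimodal, chain-of-thought |
2 | Instruction-tuned Self-Questioning Framework for Multimodal Reasoning | Proposes SQ-InstructBLIP, an instruction-tuned self-questioning framework that strengthens multimodal reasoning | large language model, multimodal |
3 | X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning | Proposes X-CoT, which uses LLM chain-of-thought reasoning for explainable text-to-video retrieval | chain-of-thought |
4 | VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding | VideoJudge uses bootstrapping to scale supervision for MLLMs that act as judges of video understanding | large language model, multimodal, chain-of-thought |
5 | A Sentinel-3 foundation model for ocean colour | Proposes a Sentinel-3-based foundation model for ocean colour that improves performance on ocean observation tasks | foundation model |
6 | Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations | Decipher-MR is a vision-language foundation model for 3D MRI representations | foundation model |
7 | CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models | Proposes CompareBench for evaluating visual comparison reasoning in vision-language models | multimodal |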

🔬 Pillar 2: RL & Architecture (5 papers)

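# | Title | One-line summary | Tags | 🔗
8 | MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources | MMR1 improves multimodal reasoning through variance-aware sampling and open resources | reinforcement learning, multimodal, chain-of-thought |
9 | SlideMamba: Entropy-Based Adaptive Fusion of GNN and Mamba for Enhanced Representation Learning in Digital Pathology | SlideMamba is an entropy-based adaptive fusion framework that combines a GNN with Mamba to improve representation learning in digital pathology | predictive model, Mamba, representation learning |
10 | FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction | FantasyWorld achieves geometry-consistent world modeling through unified video and 3D prediction | world model, foundation model |
11 | X-Streamer: Unified Human World Modeling with Audiovisual Interaction | X-Streamer proposes a unified human world-modeling framework based on audiovisual interaction, enabling real-time interaction with digital humans | world model, multimodal |
12 | Enhancing Contrastive Learning for Geolocalization by Discovering Hard Negatives on Semivariograms | Proposes a semivariogram-based contrastive learning method that improves image geolocalization accuracy | contrastive learning |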

🔬 Pillar 3: Perception & Semantics (2 papers)

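# | Title | One-line summary | Tags | 🔗
13 | Dense Semantic Matching with VGGT Prior | Proposes a dense semantic matching method built on a VGGT prior, improving geometric awareness and matching reliability | VGGT, foundation model |
14 | Quantized Visual Geometry Grounded Transformer | Proposes QuantVGGT, which addresses the difficulty of quantizing VGGT and enables efficient 3D reconstruction in resource-constrained settings | VGGT |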

🔬 Pillar 8: Physics-based Animation (1 paper)

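# | Title | One-line summary | Tags | 🔗
15 | MORPH: PDE Foundation Models with Arbitrary Data Modality | Proposes MORPH, a model for handling multimodal partial differential equation (PDE) data | spatiotemporal, foundation model, multimodal |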

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags | 🔗
16 | Does FLUX Already Know How to Perform Physically Plausible Image Composition? | Proposes SHINE, a training-free framework for physically plausible image composition | physically plausible |
