cs.CV (2025-10-12)

📊 25 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗2) · Pillar 2: RL Algorithms & Architecture (8 🔗1) · Pillar 3: Spatial Perception & Semantics (4 🔗1) · Pillar 6: Video Extraction & Matching (2) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

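🔬 Pillar 9: Embodied Foundation Models (9 papers)

#	Title	Key Point	Tags	🔗	⭐
1	Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey	The first survey of image-to-video transfer learning based on pretrained image-language models	foundation model multimodal
2	Post-TIPS Prediction via Multimodal Interaction: A Multi-Center Dataset and Framework for Survival, Complication, and Portal Pressure Assessment	Proposes the MultiTIPS dataset and a multimodal interaction framework for assessing post-TIPS survival, complications, and portal pressure	foundation model multimodal
3	A Simple and Better Baseline for Visual Grounding	Proposes FSVG, a feature-selection-based visual grounding baseline that improves accuracy and efficiency	visual grounding	✅
4	Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection	Proposes the MoFE module and a dynamic Mixup strategy to improve vision foundation models on OOD detection	foundation model
5	GLOFNet -- A Multimodal Dataset for GLOF Monitoring and Prediction	GLOFNet: a multimodal dataset for monitoring and predicting glacial lake outburst floods	multimodal
6	VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning	VR-Thinker: strengthens video reward models through image-based reasoning, improving preference judgments on long videos	multimodal chain-of-thought
7	Towards Self-Refinement of Vision-Language Models with Triangular Consistency	Proposes a self-refinement framework based on triangular consistency to improve vision-language model performance	large language model	✅
8	When Images Speak Louder: Mitigating Language Bias-induced Hallucinations in VLMs through Cross-Modal Guidance	Proposes Cross-Modal Guidance (CMG) to mitigate hallucinations in vision-language models caused by language bias	multimodal
9	Towards Cybersickness Severity Classification from VR Gameplay Videos Using Transfer Learning and Temporal Modeling	Proposes a method for classifying cybersickness severity from VR gameplay videos using transfer learning and temporal modeling	multimodal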

🔬 Pillar 2: RL Algorithms & Architecture (8 papers)

#	Title	Key Point	Tags	🔗	⭐
10	DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis	DEMO: disentangled motion latent flow matching for fine-grained controllable talking portrait synthesis	flow matching motion latent
11	EGD-YOLO: A Lightweight Multimodal Framework for Robust Drone-Bird Discrimination via Ghost-Enhanced YOLOv8n and EMA Attention under Adverse Condition	EGD-YOLO: a lightweight multimodal framework for robust drone-bird discrimination under adverse conditions, using Ghost-enhanced YOLOv8n and EMA attention	VIP multimodal
12	Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection	Proposes FS-VFM, which uses self-supervised learning to improve generalization on face security tasks	distillation foundation model	✅
13	MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition	Proposes MSF-Mamba, which uses motion-aware state fusion to improve Mamba's efficiency and accuracy in micro-gesture recognition	Mamba SSM state space model
14	Unified Open-World Segmentation with Multi-Modal Prompts	COSINE: a unified open-world segmentation model with multi-modal prompts	representation learning open-vocabulary open vocabulary
15	Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans	Proposes a multi-label abnormality analysis method for 3D CT scans based on structured spectral graph representation learning	representation learning spatial relationship
16	OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment	OmniQuality-R: improves reward models through all-encompassing quality assessment	reinforcement learning chain-of-thought
17	Mesh-Gait: A Unified Framework for Gait Recognition Through Multi-Modal Representation Learning from 2D Silhouettes	Mesh-Gait: a unified gait recognition framework based on multi-modal representation learning from 2D silhouettes	representation learning

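🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

#	Title	Key Point	Tags	🔗	⭐
18	Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos	Proposes a dynamic Gaussian splatting framework for novel view synthesis from defocused and motion-blurred videos	gaussian splatting splatting	✅
19	Taming a Retrieval Framework to Read Images in Humanlike Manner for Augmenting Generation of MLLMs	Proposes the HuLiRAG framework, which augments multimodal large language model generation by mimicking how humans process visual information	open-vocabulary open vocabulary large language model
20	Injecting Frame-Event Complementary Fusion into Diffusion for Optical Flow in Challenging Scenes	Proposes Diff-ABFlow, which fuses complementary frame and event information for optical flow estimation in challenging scenes	optical flow feature matching
21	Towards Efficient 3D Gaussian Human Avatar Compression: A Prior-Guided Framework	Proposes a prior-guided framework for efficient compression of 3D Gaussian human avatars, targeting high quality at ultra-low bitrates for metaverse applications	3D gaussian splatting gaussian splatting splatting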

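🔬 Pillar 6: Video Extraction & Matching (2 papers)

#	Title	Key Point	Tags	🔗	⭐
22	Guided Image Feature Matching using Feature Spatial Order	Proposes an image feature matching method guided by the spatial order of features, improving matching efficiency and accuracy	feature matching
23	Combo-Gait: Unified Transformer Framework for Multi-Modal Gait Recognition and Attribute Analysis	Proposes Combo-Gait, a unified Transformer framework for multi-modal gait recognition and attribute analysis	SMPL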

🔬 Pillar 1: Robot Control (1 paper)

#	Title	Key Point	Tags	🔗	⭐
24	ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling	Proposes imHead, a large-scale implicit morphable model for localized head modeling	manipulation

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	Key Point	Tags	🔗	⭐
25	UltraScatter: Ray-Based Simulation of Ultrasound Scattering	UltraScatter: a fast ray-tracing-based method for simulating ultrasound scattering	PULSE
