cs.CV (2025-10-12)

📊 25 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗2) · Pillar 2: RL & Architecture (8 🔗1) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 6: Video Extraction (2) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

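| # | Title | Key point | Tags |
|---|---|---|---|
| 1 | Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey | First survey of image-to-video transfer learning based on image-language pretrained models | foundation model, multimodal |
| 2 | Post-TIPS Prediction via Multimodal Interaction: A Multi-Center Dataset and Framework for Survival, Complication, and Portal Pressure Assessment | Proposes the MultiTIPS dataset and a multimodal interaction framework for assessing post-TIPS survival, complications, and portal pressure | foundation model, multimodal |
| 3 | A Simple and Better Baseline for Visual Grounding | Proposes FSVG, a feature-selection-based visual grounding baseline that improves accuracy and efficiency | visual grounding |
| 4 | Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection | Proposes the MoFE module and a dynamic Mixup strategy to improve vision foundation models on OOD detection | foundation model |
| 5 | GLOFNet -- A Multimodal Dataset for GLOF Monitoring and Prediction | GLOFNet: a multimodal dataset for monitoring and predicting glacial lake outburst floods | multimodal |
| 6 | VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning | VR-Thinker: strengthens video reward models through image reasoning, improving preference judgments on long videos | multimodal, chain-of-thought |
| 7 | Towards Self-Refinement of Vision-Language Models with Triangular Consistency | Proposes a self-refinement framework based on triangular consistency to improve vision-language model performance | large language model |
| 8 | When Images Speak Louder: Mitigating Language Bias-induced Hallucinations in VLMs through Cross-Modal Guidance | Proposes Cross-Modal Guidance (CMG) to mitigate hallucinations caused by language bias in vision-language models | multimodal |
| 9 | Towards Cybersickness Severity Classification from VR Gameplay Videos Using Transfer Learning and Temporal Modeling | Proposes a method for classifying cybersickness severity from VR gameplay videos using transfer learning and temporal modeling | multimodal |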

🔬 Pillar 2: RL & Architecture (8 papers)

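| # | Title | Key point | Tags |
|---|---|---|---|
| 10 | DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis | DEMO: disentangled motion latent flow matching for fine-grained controllable talking portrait synthesis | flow matching, motion latent |
| 11 | EGD-YOLO: A Lightweight Multimodal Framework for Robust Drone-Bird Discrimination via Ghost-Enhanced YOLOv8n and EMA Attention under Adverse Condition | EGD-YOLO: a lightweight multimodal framework that uses Ghost-enhanced YOLOv8n and EMA attention for robust drone-bird discrimination under adverse conditions | VIP, multimodal |
| 12 | Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection | Proposes FS-VFM, which uses self-supervised learning to improve generalization on face security tasks | distillation, foundation model |
| 13 | MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition | Proposes MSF-Mamba, which uses motion-aware state fusion to improve Mamba's efficiency and accuracy in micro-gesture recognition | Mamba, SSM, state space model |
| 14 | Unified Open-World Segmentation with Multi-Modal Prompts | COSINE: a unified open-world segmentation model driven by multi-modal prompts | representation learning, open-vocabulary, open vocabulary |
| 15 | Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans | Proposes a multi-label abnormality analysis method for 3D CT based on structured spectral graph representation learning | representation learning, spatial relationship |
| 16 | OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment | OmniQuality-R: improves reward model performance through all-encompassing quality assessment | reinforcement learning, chain-of-thought |
| 17 | Mesh-Gait: A Unified Framework for Gait Recognition Through Multi-Modal Representation Learning from 2D Silhouettes | Mesh-Gait: a unified gait recognition framework based on multi-modal representation learning from 2D silhouettes | representation learning |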

🔬 Pillar 3: Perception & Semantics (4 papers)

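| # | Title | Key point | Tags |
|---|---|---|---|
| 18 | Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos | Proposes a dynamic Gaussian splatting framework for novel view synthesis from defocused and motion-blurred videos | gaussian splatting, splatting |
| 19 | Taming a Retrieval Framework to Read Images in Humanlike Manner for Augmenting Generation of MLLMs | Proposes the HuLiRAG framework, which enhances the generation ability of multimodal large language models by mimicking human visual processing | open-vocabulary, open vocabulary, large language model |
| 20 | Injecting Frame-Event Complementary Fusion into Diffusion for Optical Flow in Challenging Scenes | Proposes Diff-ABFlow, which fuses complementary frame and event information for optical flow estimation in challenging scenes | optical flow, feature matching |
| 21 | Towards Efficient 3D Gaussian Human Avatar Compression: A Prior-Guided Framework | Proposes a prior-guided framework for efficient compression of 3D Gaussian human avatars, targeting high-quality metaverse applications at ultra-low bitrates | 3D gaussian splatting, gaussian splatting, splatting |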

🔬 Pillar 6: Video Extraction (2 papers)

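| # | Title | Key point | Tags |
|---|---|---|---|
| 22 | Guided Image Feature Matching using Feature Spatial Order | Proposes an image feature matching method guided by feature spatial order, improving matching efficiency and accuracy | feature matching |
| 23 | Combo-Gait: Unified Transformer Framework for Multi-Modal Gait Recognition and Attribute Analysis | Proposes Combo-Gait, a unified Transformer framework for multi-modal gait recognition and attribute analysis | SMPL |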

🔬 Pillar 1: Robot Control (1 paper)

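| # | Title | Key point | Tags |
|---|---|---|---|
| 24 | ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling | Proposes imHead, a large-scale implicit morphable model for localized head modeling | manipulation |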

🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | Key point | Tags |
|---|---|---|---|
| 25 | UltraScatter: Ray-Based Simulation of Ultrasound Scattering | UltraScatter: a fast ray-tracing-based method for simulating ultrasound scattering | PULSE |
