cs.CV (2025-07-30)

📊 30 papers in total | 🔗 11 with code

🎯 Interest area navigation

Pillar 9: Embodied Foundation Models (14 🔗6)
Pillar 3: Spatial Perception & Semantics (7 🔗2)
Pillar 2: RL Algorithms & Architecture (6 🔗3)
Pillar 1: Robot Control (2)
Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 9: Embodied Foundation Models (14 papers)

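# | Title | One-line summary | Tags | 🔗
1 | Reference-Guided Diffusion Inpainting For Multimodal Counterfactual Generation | Proposes MObI and AnydoorMed for reference-image-guided multimodal diffusion inpainting and generation. | foundation model, multimodal
2 | A Large Language Model Powered Integrated Circuit Footprint Geometry Understanding | Proposes the LLM4-IC8K framework, which uses large language models to tackle understanding of integrated circuit package footprint geometry. | large language model, multimodal
3 | Zero-Shot Image Anomaly Detection Using Generative Foundation Models | Uses generative pretrained models for zero-shot image anomaly detection. | foundation model
4 | Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards | Proposes U3-Attack, a universal, input-agnostic multimodal adversarial attack that bypasses the safety safeguards of text-to-image models. | multimodal
5 | Gems: Group Emotion Profiling Through Multimodal Situational Understanding | GEMS: group emotion analysis through multimodal situational understanding. | multimodal
6 | DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception | DeltaVLM: interactive remote sensing image change analysis via instruction-guided difference perception. | large language model, multimodal, instruction following
7 | What is Beneath Misogyny: Misogynous Memes Classification and Explanation | Proposes the MM-Misogyny model to detect, classify, and explain misogynistic memes online. | large language model, multimodal
8 | Goal-Based Vision-Language Driving | NovaDrive: a single-branch autonomous driving architecture built on a vision-language model, improving safety and efficiency. | embodied AI
9 | Vocabulary-free Fine-grained Visual Recognition via Enriched Contextually Grounded Vision-Language Model | Proposes E-FineR, a vocabulary-free fine-grained image recognition method based on a contextually enriched vision-language model. | large language model
10 | Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation | Proposes the OmniAVS dataset and the OISA model for referring audio-visual segmentation with multimodal fusion. | multimodal
11 | MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention | Proposes MoCHA to reduce the training and inference cost of vision-language models. | large language model
12 | Advancing Fetal Ultrasound Image Quality Assessment in Low-Resource Settings | Proposes FetalCLIP$_{CLS}$, which leverages a fetal ultrasound foundation model to improve image quality assessment in low-resource settings. | foundation model
13 | Segment Anything for Video: A Comprehensive Review of Video Object Segmentation and Tracking from Past to Future | Surveys SAM-based video object segmentation and tracking methods and outlines future directions. | foundation model
14 | A Linear N-Point Solver for Structure and Motion from Asynchronous Tracks | Proposes a linear N-point solver for structure and motion estimation from asynchronous tracks. | TAMP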

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

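# | Title | One-line summary | Tags | 🔗
15 | Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction | Proposes the REUrbanGS framework for robust, efficient city-scale 3D Gaussian reconstruction and real-time rendering. | 3D gaussian splatting, gaussian splatting, splatting
16 | UFV-Splatter: Pose-Free Feed-Forward 3D Gaussian Splatting Adapted to Unfavorable Views | UFV-Splatter: a fast feed-forward 3D Gaussian Splatting method for unfavorable views. | 3D gaussian splatting, 3DGS, gaussian splatting
17 | Details Matter for Indoor Open-vocabulary 3D Instance Segmentation | Proposes detail-enhancing refinements for indoor open-vocabulary 3D instance segmentation, significantly improving performance. | open-vocabulary, open vocabulary
18 | Adaptive Time-step Training for Enhancing Spike-Based Neural Radiance Fields | Proposes PATA, an adaptive time-step training method for spike-based NeRF that improves rendering efficiency in resource-constrained settings. | NeRF, neural radiance field
19 | DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion | DepR: depth-guided single-view scene reconstruction with an instance-level diffusion model. | scene reconstruction
20 | A Dual-Feature Extractor Framework for Accurate Back Depth and Spine Morphology Estimation from Monocular RGB Images | Proposes GAMA-Net, a dual-feature extractor framework for accurate spine morphology assessment from monocular RGB images. | depth estimation
21 | Estimating 2D Camera Motion with Hybrid Motion Basis | CamFlow: estimates 2D camera motion with a hybrid motion basis, improving robustness in complex scenes. | optical flow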

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

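# | Title | One-line summary | Tags | 🔗
22 | VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning | VL-Cogito: improves multimodal reasoning via progressive curriculum reinforcement learning. | reinforcement learning, large language model, multimodal
23 | ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents | ScreenCoder: improves visual-to-code generation for front-end automation with modular multimodal agents. | reinforcement learning, large language model, multimodal
24 | LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks | Proposes LIDAR, a lightweight adaptive cue-aware Vision Mamba network for multimodal segmentation of structural cracks. | Mamba, multimodal
25 | Bridging the Gap in Missing Modalities: Leveraging Knowledge Distillation and Style Matching for Brain Tumor Segmentation | MST-KDNet: uses knowledge distillation and style matching for brain tumor segmentation with missing modalities. | distillation, feature matching
26 | GVD: Guiding Video Diffusion Model for Scalable Video Distillation | Proposes GVD, a guided video diffusion model for scalable video dataset distillation. | distillation
27 | MINR: Implicit Neural Representations with Masked Image Modelling | Proposes the MINR framework, which combines implicit neural representations with masked image modeling to improve the robustness and generalization of image reconstruction. | masked autoencoder, MAE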

🔬 Pillar 1: Robot Control (2 papers)

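# | Title | One-line summary | Tags | 🔗
28 | MRpro - open PyTorch-based MR reconstruction and processing package | MRpro: an open-source, PyTorch-based MR reconstruction and processing package that supports research collaboration and reproducibility. | manipulation
29 | Bi-Level Optimization for Self-Supervised AI-Generated Face Detection | Proposes a self-supervised AI-generated face detection method based on bi-level optimization, improving generalization to unseen generators. | manipulation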

🔬 Pillar 6: Video Extraction & Matching (1 paper)

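# | Title | One-line summary | Tags | 🔗
30 | Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques | A survey of modality-aware feature matching that reviews both single- and cross-modality techniques. | feature matching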
