cs.CV (2025-06-29)

📊 27 papers in total | 🔗 7 with code

🎯 Navigation by interest area

Pillar 3: Perception & Semantics (8 🔗3) · Pillar 9: Embodied Foundation Models (8 🔗3) · Pillar 2: RL & Architecture (6 🔗1) · Pillar 4: Generative Motion (2) · Pillar 7: Motion Retargeting (1) · Pillar 6: Video Extraction (1) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Perception & Semantics (8 papers)

| # | Title | One-line summary | Tags | 🔗 |
| 1 | Endo-4DGX: Robust Endoscopic Scene Reconstruction and Illumination Correction with Gaussian Splatting | Proposes Endo-4DGX to address uneven illumination in endoscopic scenes | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting | Proposes SurgTPGS to address 3D surgical scene understanding | gaussian splatting, splatting, scene reconstruction | |
| 3 | STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene | Proposes the STD-GS framework to address spatiotemporal feature mismatch in high-dynamic scene reconstruction | gaussian splatting, splatting, scene reconstruction | |
| 4 | Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation | Proposes the R²S framework to address complex spatial reasoning | scene understanding, large language model, multimodal | |
| 5 | TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints | Proposes TVG-SLAM to address the robustness of RGB-only SLAM systems | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 6 | IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering | Proposes IR3D-Bench to address insufficient scene understanding in vision-language models | scene understanding, VLA | |
| 7 | MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation | Proposes MEMFOF to address memory efficiency in high-resolution optical flow estimation | optical flow | |
| 8 | Dynamic View Synthesis from Small Camera Motion Videos | Proposes distribution-based depth regularization to address dynamic view synthesis under small camera motion | NeRF | |

🔬 Pillar 9: Embodied Foundation Models (8 papers)

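| # | Title | One-line summary | Tags | 🔗 |
| 9 | Token Activation Map to Visually Explain Multimodal LLMs | Proposes Token Activation Map to address the interpretability of multimodal LLMs | large language model, multimodal | |
| 10 | MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation | Proposes MedRegion-CT to address region-level feature capture in CT report generation | large language model, multimodal | |
| 11 | Multimodal image registration for effective thermographic fever screening | Proposes a multimodal image registration method to improve the accuracy of thermographic fever screening | multimodal | |
| 12 | OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions | Proposes OmniVCus to address multi-subject video customization | multimodal | |
| 13 | UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding | Proposes UrbanLLaVA to address multimodal data processing in urban intelligence | large language model | |
| 14 | GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields | Proposes GeoProg3D to address interaction with city-scale 3D language scenes | large language model | |
| 15 | DEL: Dense Event Localization for Multi-modal Audio-Visual Understanding | Proposes the DEL framework to address dense event localization in multimodal videos | multimodal | |
| 16 | Dare to Plagiarize? Plagiarized Painting Recognition and Retrieval | Proposes an artwork plagiarism detection method to protect copyright | foundation model | |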

🔬 Pillar 2: RL & Architecture (6 papers)

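| # | Title | One-line summary | Tags | 🔗 |
| 17 | MoMa: Modulating Mamba for Adapting Image Foundation Models to Video Recognition | Proposes the MoMa framework to address spatiotemporal modeling in video understanding | Mamba, state space model, foundation model | |
| 18 | FA-Seg: A Fast and Accurate Diffusion-Based Method for Open-Vocabulary Segmentation | Proposes FA-Seg to address accuracy and efficiency in open-vocabulary semantic segmentation | contrastive learning, open-vocabulary, open vocabulary | |
| 19 | MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings | Proposes MoCa to address key limitations of multimodal embedding models | contrastive learning, multimodal | |
| 20 | RoboScape: Physics-informed Embodied World Model | Proposes RoboScape to address the lack of physical awareness in existing robot video generation | world model, geometric consistency | |
| 21 | Self-Supervised Contrastive Learning for Multi-Label Images | Proposes a self-supervised contrastive learning method to address representation learning for multi-label images | representation learning, contrastive learning | |
| 22 | Competitive Distillation: A Simple Learning Strategy for Improving Visual Classification | Proposes a competitive distillation strategy to improve visual classification performance | distillation | |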

🔬 Pillar 4: Generative Motion (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
| 23 | VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions | Proposes VolumetricSMPL to address efficient human-machine interaction | motion synthesis, human-object interaction, egocentric | |
| 24 | BridgeShape: Latent Diffusion Schrödinger Bridge for 3D Shape Completion | Proposes BridgeShape to address global transport path modeling in 3D shape completion | VQ-VAE | |

🔬 Pillar 7: Motion Retargeting (1 paper)

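| # | Title | One-line summary | Tags | 🔗 |
| 25 | Why Settle for Mid: A Probabilistic Viewpoint to Spatial Relationship Alignment in Text-to-image Models | Proposes a probabilistic framework to address spatial relationship alignment in text-to-image models | spatial relationship | |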

🔬 Pillar 6: Video Extraction (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
| 26 | Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis | Proposes Causal-VidSyn to address causality in traffic accident video synthesis | egocentric | |

🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
| 27 | Trident: Detecting Face Forgeries with Adversarial Triplet Learning | Proposes the Trident framework to address face forgery detection | manipulation | |
