cs.CV (2025-07-19)

📊 19 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8 🔗2) · Pillar 2: RL Algorithms & Architecture (5 🔗1) · Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

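# | Title | One-sentence summary | Tags | 🔗
1 | DiSCO-3D : Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF | DiSCO-3D: proposes a NeRF-based method for open-vocabulary sub-concept discovery and segmentation | NeRF, scene understanding, open-vocabulary |
2 | Adaptive 3D Gaussian Splatting Video Streaming: Visual Saliency-Aware Tiling and Meta-Learning-Based Bitrate Adaptation | Proposes a 3D Gaussian splatting video streaming scheme based on saliency-adaptive tiling and meta-learning-based bitrate adaptation | 3D gaussian splatting, 3DGS, gaussian splatting |
3 | Adaptive 3D Gaussian Splatting Video Streaming | Proposes an adaptive 3D Gaussian splatting video streaming scheme based on Gaussian deformation fields to optimize transmission quality | 3D gaussian splatting, 3DGS, gaussian splatting |
4 | Descrip3D: Enhancing Large Language Model-based 3D Scene Understanding with Object-Level Text Descriptions | Descrip3D: uses object-level text descriptions to enhance large language models' understanding of 3D scenes | scene understanding, large language model |
5 | Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey | Surveys feed-forward 3D reconstruction and view synthesis techniques that address the limitations of traditional methods | 3D gaussian splatting, 3DGS, gaussian splatting |
6 | CRAFT: A Neuro-Symbolic Framework for Visual Functional Affordance Grounding | CRAFT: a neuro-symbolic framework for visual functional affordance grounding | scene understanding, affordance |
7 | DCHM: Depth-Consistent Human Modeling for Multiview Detection | Proposes the DCHM framework for depth-consistent human modeling in multiview pedestrian detection | depth estimation, gaussian splatting, splatting |
8 | Motion Segmentation and Egomotion Estimation from Event-Based Normal Flow | Proposes a motion segmentation and egomotion estimation framework based on event-based normal flow, suited to neuromorphic vision sensors | depth estimation, optical flow |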

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

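# | Title | One-sentence summary | Tags | 🔗
9 | From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Grounded Open-vocabulary Situation Recognition | Proposes a multimodal interactive prompt distillation method to improve open-vocabulary situation recognition | distillation, open-vocabulary, open vocabulary |
10 | MultiRetNet: A Multimodal Vision Model and Deferral System for Staging Diabetic Retinopathy | MultiRetNet: a diabetic retinopathy staging system that combines multimodal information with clinical decision-making | contrastive learning, multimodal |
11 | BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM | BusterX++: proposes an MLLM-based unified framework for cross-modal AI-generated content detection and explanation | reinforcement learning, large language model, multimodal |
12 | Multispectral State-Space Feature Fusion: Bridging Shared and Cross-Parametric Interactions for Object Detection | Proposes MS2Fusion, a multispectral feature fusion framework based on state space models, to improve object detection performance | SSM, state space model |
13 | Towards a Proactive Autoscaling Framework for Data Stream Processing at the Edge using GRU and Transfer Learning | Proposes a proactive autoscaling framework for edge data stream processing based on GRU and transfer learning | reinforcement learning, predictive model |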

🔬 Pillar 9: Embodied Foundation Models (5 papers)

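# | Title | One-sentence summary | Tags | 🔗
14 | Docopilot: Improving Multimodal Models for Document-Level Understanding | Proposes Docopilot, a multimodal model for document-level understanding, and builds the high-quality Doc-750K dataset | large language model, multimodal |
15 | Multimodal AI for Gastrointestinal Diagnostics: Tackling VQA in MEDVQA-GI 2025 | Uses the Florence multimodal AI model to tackle VQA on gastrointestinal endoscopy images | foundation model, multimodal |
16 | Text2VR: Automated instruction Generation in Virtual Reality using Large language Models for Assembly Task | Proposes Text2VR, which uses large language models to automatically generate teaching instructions for VR assembly tasks | large language model |
17 | ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding | Proposes ArtiMuse to address both quantification and understanding in image aesthetics assessment | large language model, multimodal |
18 | Efficient Whole Slide Pathology VQA via Token Compression | Proposes TCP-LLaVA, which achieves efficient whole-slide pathology image VQA through token compression | large language model, multimodal |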

🔬 Pillar 7: Motion Retargeting (1 paper)

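# | Title | One-sentence summary | Tags | 🔗
19 | GPI-Net: Gestalt-Guided Parallel Interaction Network via Orthogonal Geometric Consistency for Robust Point Cloud Registration | Proposes GPI-Net, a Gestalt-guided parallel interaction network that achieves robust point cloud registration through orthogonal geometric consistency | spatial relationship, geometric consistency |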
