cs.CV (2024-10-02)

📊 16 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 2: RL Algorithms & Architecture (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

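| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection | 3DGS-DET: enhances 3D Gaussian Splatting with boundary guidance and box-focused sampling for 3D object detection. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | Multi-view-regulated gaussian splatting for novel view synthesis | Proposes multi-view-regulated Gaussian splatting, improving novel view synthesis quality and geometric accuracy. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Depth Pro: generates sharp monocular metric depth maps in under a second. | depth estimation, monocular depth, metric depth | |
| 4 | Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking | Proposes Open3DTrack for open-vocabulary 3D multi-object tracking, improving environment perception for autonomous driving. | open-vocabulary, open vocabulary | |
| 5 | SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images | Proposes SegEarth-OV for training-free open-vocabulary segmentation of remote sensing images. | open-vocabulary, open vocabulary | |
| 6 | EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis | Proposes EVER, an exact volumetric ellipsoid rendering method for real-time view synthesis. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 7 | Neural Eulerian Scene Flow Fields | EulerFlow: scene flow estimation as a continuous space-time ODE with a neural prior. | scene flow | |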

🔬 Pillar 9: Embodied Foundation Models (7 papers)

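| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 8 | UlcerGPT: A Multimodal Approach Leveraging Large Language and Vision Models for Diabetic Foot Ulcer Image Transcription | UlcerGPT: a multimodal approach that uses large language and vision models to transcribe diabetic foot ulcer images. | large language model, multimodal | |
| 9 | Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition | Proposes a semi-supervised fine-tuning method based on content-style decomposition, improving vision foundation model performance when labeled data is scarce. | foundation model | |
| 10 | Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval | Quantifies the gap between translated text and native perception in multimodal, multilingual retrieval and proposes data augmentation strategies. | multimodal | |
| 11 | EMMA: Efficient Visual Alignment in Multi-Modal LLMs | Proposes EMMA, an efficient visual alignment method for multimodal LLMs that improves task adaptability. | large language model, foundation model | |
| 12 | Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks | Leopard: a vision language model for text-rich multi-image tasks. | large language model, multimodal | |
| 13 | Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description | Emo3D: proposes a benchmark dataset and a new evaluation metric for 3D facial expression generation. | large language model | |
| 14 | Robust Modality-incomplete Anomaly Detection: A Modality-instructive Framework with Benchmark | Proposes the RADAR framework for robust industrial anomaly detection when modalities are missing. | multimodal | |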

🔬 Pillar 2: RL Algorithms & Architecture (1 paper)

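| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 15 | LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion | LaGeM: proposes a large geometry model for 3D representation learning and diffusion, addressing the challenges of large-scale datasets and generative modeling. | representation learning | |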

🔬 Pillar 6: Video Extraction & Matching (1 paper)

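| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 16 | Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker | Proposes a framework for identifying children's screen time based on a multi-view vision language model, improving monitoring accuracy in natural settings. | egocentric | |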
