cs.CV (2024-07-18)

📊 40 papers in total | 🔗 14 with code

🎯 Navigation by Area of Interest

Pillar 3: Spatial Perception & Semantics (14 papers, 🔗 4 with code)
Pillar 9: Embodied Foundation Models (12 papers, 🔗 4 with code)
Pillar 2: RL Algorithms & Architecture (11 papers, 🔗 5 with code)
Pillar 7: Motion Retargeting (2 papers)
Pillar 6: Video Extraction & Matching (1 paper, 🔗 1 with code)

🔬 Pillar 3: Spatial Perception & Semantics (14 papers)

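# | Title | One-line Summary | Tags | 🔗
1 | EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting | EaDeblur-GS: uses event-camera data and Gaussian Splatting for 3D reconstruction of motion-blurred scenes. | 3D gaussian splatting, 3DGS, gaussian splatting |
2 | Which objects help me to act effectively? Reasoning about physically-grounded affordances | Proposes an embodied affordance-reasoning method built on a dialogue between an LLM and a VLM, making robot-environment interaction more effective. | open-vocabulary, open vocabulary, affordance |
3 | Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models | Proposes Diff2Scene, which uses text-to-image diffusion models for open-vocabulary 3D semantic segmentation. | open-vocabulary, open vocabulary, visual grounding |
4 | Attenuation-Aware Weighted Optical Flow with Medium Transmission Map for Learning-based Visual Odometry in Underwater terrain | Proposes wflow-TartanVO, which uses attenuation-aware weighted optical flow to improve the accuracy of underwater visual odometry. | visual odometry, optical flow, TartanVO |
5 | SegPoint: Segment Any Point Cloud via Large Language Model | SegPoint: uses a large language model to segment any point cloud, unifying multiple segmentation tasks in a single framework. | open-vocabulary, open vocabulary, large language model |
6 | GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields | GeometrySticker: enables ownership claims over recolorized NeRF models. | NeRF, neural radiance field |
7 | KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter | KFD-NeRF: a Kalman-filter-based dynamic NeRF for efficient, high-quality motion reconstruction. | NeRF, neural radiance field |
8 | Lightweight Uncertainty Quantification with Simplex Semantic Segmentation for Terrain Traversability | Proposes a lightweight uncertainty-quantification module that improves semantic segmentation for terrain traversability. | traversability |
9 | Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction | Proposes a knowledge-guided vision-language model with efficient error correction to improve robots' affordance perception. | affordance |
10 | Many Perception Tasks are Highly Redundant Functions of their Input Data | Shows that perception tasks are highly redundant functions of their input data, pointing to new directions for designing efficient perception algorithms. | depth estimation, optical flow |
11 | Training-Free Model Merging for Multi-target Domain Adaptation | Proposes a training-free model-merging method for multi-target domain adaptation that works under restricted data access. | scene understanding |
12 | Shape of Motion: 4D Reconstruction from a Single Video | Proposes a "shape of motion" approach to 4D reconstruction from a single video that explicitly models scene motion trajectories. | monocular depth |
13 | General Geometry-aware Weakly Supervised 3D Object Detection | Proposes a general geometry-aware, weakly supervised 3D object detection method to reduce the burden of 3D annotation. | scene understanding |
14 | Long-Term 3D Point Tracking By Cost Volume Fusion | Proposes a deep learning framework based on cost volume fusion for long-term 3D point tracking. | scene flow |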

🔬 Pillar 9: Embodied Foundation Models (12 papers)

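# | Title | One-line Summary | Tags | 🔗
15 | Similarity over Factuality: Are we making progress on multimodal out-of-context misinformation detection? | Proposes MUSE, a multimodal-similarity-based model for detecting out-of-context multimodal misinformation that matches or outperforms SOTA methods. | large language model, foundation model, multimodal |
16 | Learning Visual Grounding from Generative Vision and Language Model | Uses a generative vision-language model to produce visual grounding data at scale, improving grounding performance. | visual grounding, zero-shot transfer |
17 | ViLLa: Video Reasoning Segmentation with Large Language Model | ViLLa: uses a large language model for video reasoning segmentation, tackling localization and tracking in complex scenes. | large language model, multimodal |
18 | EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing | Proposes EarthMarker, a visual-prompting multimodal large language model for remote sensing image understanding. | large language model, instruction following |
19 | Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition | Qalam: a multimodal LLM for Arabic OCR and handwriting recognition. | foundation model, multimodal |
20 | OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation | OE-BevSeg: an object-informed, environment-aware BEV semantic segmentation framework for autonomous driving. | multimodal |
21 | PG-Attack: A Precision-Guided Adversarial Attack Framework Against Vision Foundation Models for Autonomous Driving | PG-Attack: a precision-guided adversarial attack framework against vision foundation models for autonomous driving. | foundation model |
22 | Evaluating and Enhancing Trustworthiness of LLMs in Perception Tasks | Evaluates and improves the trustworthiness of LLMs in perception tasks, using pedestrian detection as a case study. | large language model, multimodal |
23 | Restore Anything Model via Efficient Degradation Adaptation | Proposes RAM, which achieves general-purpose image restoration through efficient degradation adaptation while substantially reducing model complexity and computation. | large language model, multimodal |
24 | Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data | Proposes a video-text retrieval evaluation based on counterfactually augmented data and uses LLMs to improve model performance. | large language model, foundation model |
25 | VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance | VLG-CBM: trains concept bottleneck models with vision-language guidance, improving both interpretability and performance. | large language model |
26 | BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models | Proposes the BEAF dataset and evaluation metrics for assessing hallucination in vision-language models when the scene changes. | large language model |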

🔬 Pillar 2: RL Algorithms & Architecture (11 papers)

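# | Title | One-line Summary | Tags | 🔗
27 | Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | Proposes a geometry-guided self-distillation method that improves open-vocabulary 3D scene understanding. | representation learning, distillation, scene understanding |
28 | Multimodal Label Relevance Ranking via Reinforcement Learning | Proposes LR²PPO, which tackles multimodal label relevance ranking with reinforcement learning. | reinforcement learning, PPO, multimodal |
29 | Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation | Proposes guided consistency sampling and brightness-equalized generation to improve text-to-3D generation quality. | distillation, 3D gaussian splatting, 3DGS |
30 | GroupMamba: Efficient Group-Based Visual State Space Model | Proposes GroupMamba, an efficient group-based visual state space model that improves image recognition performance. | Mamba, SSM, state space model |
31 | X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs | Proposes X-Former, which unifies contrastive and reconstruction learning to improve the visual representations of MLLMs. | representation learning, MAE, contrastive learning |
32 | QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View | The QuIIL team's solution for automating life-saving intervention procedures from a first-person view. | distillation, first-person view |
33 | Continual Distillation Learning: Knowledge Distillation in Prompt-based Continual Learning | Proposes a prompt-based knowledge distillation method (KDP) that improves small ViT models in prompt-based continual learning. | distillation |
34 | On the Discriminability of Self-Supervised Representation Learning | Proposes a Dynamic Semantic Adjuster (DSA) to address feature crowding in self-supervised learning and improve discriminability. | representation learning |
35 | Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation | Proposes low-confidence pseudo-label distillation to improve detection of small objects and hard examples in source-free domain adaptive object detection. | distillation |
36 | Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation | Proposes Label Assisted Distillation (LAD), which improves knowledge distillation with a lightweight teacher model for semantic segmentation. | distillation |
37 | DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection | Proposes DFMSD, dual feature masking stage-wise knowledge distillation for object detection, improving distillation between heterogeneous networks. | distillation |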

🔬 Pillar 7: Motion Retargeting (2 papers)

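# | Title | One-line Summary | Tags | 🔗
38 | Configural processing as an optimized strategy for robust object recognition in neural networks | Uses configural cues to make object recognition in neural networks more robust to geometric transformations. | spatial relationship |
39 | OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction | Proposes OAT, an object-level attention Transformer for gaze scanpath prediction. | spatial relationship |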

🔬 Pillar 6: Video Extraction & Matching (1 paper)

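# | Title | One-line Summary | Tags | 🔗
40 | SCAPE: A Simple and Strong Category-Agnostic Pose Estimator | SCAPE: a simple yet strong category-agnostic pose estimator that improves both accuracy and efficiency. | feature matching |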
