cs.CV (2024-07-18)

📊 40 papers in total | 🔗 14 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (14 🔗4) Pillar 9: Embodied Foundation Models (12 🔗4) Pillar 2: RL Algorithms & Architecture (11 🔗5) Pillar 7: Motion Retargeting (2) Pillar 6: Video Extraction & Matching (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (14 papers)

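#	Title	One-line summary	Tags	🔗	⭐
1	EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting	EaDeblur-GS: uses event camera data and Gaussian splatting for 3D reconstruction of motion-blurred scenes	3D gaussian splatting 3DGS gaussian splatting
2	Which objects help me to act effectively? Reasoning about physically-grounded affordances	Proposes an embodied affordance reasoning method based on LLM-VLM dialogue to make robot-environment interaction more effective.	open-vocabulary open vocabulary affordance
3	Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models	Proposes Diff2Scene, which uses text-to-image diffusion models for open-vocabulary 3D semantic segmentation.	open-vocabulary open vocabulary visual grounding
4	Attenuation-Aware Weighted Optical Flow with Medium Transmission Map for Learning-based Visual Odometry in Underwater terrain	Proposes wflow-TartanVO, based on attenuation-aware weighted optical flow, to improve underwater visual odometry accuracy	visual odometry optical flow TartanVO	✅
5	SegPoint: Segment Any Point Cloud via Large Language Model	SegPoint: uses a large language model to segment any point cloud in a unified multi-task framework	open-vocabulary open vocabulary large language model
6	GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields	GeometrySticker: enables ownership claims on recolorized NeRF models	NeRF neural radiance field	✅
7	KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter	KFD-NeRF: proposes a Kalman-filter-based dynamic NeRF for efficient, high-quality motion reconstruction.	NeRF neural radiance field
8	Lightweight Uncertainty Quantification with Simplex Semantic Segmentation for Terrain Traversability	Proposes a lightweight uncertainty quantification module to improve semantic segmentation for terrain traversability.	traversability
9	Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction	Proposes a knowledge-guided vision-language model with efficient error correction to improve robots' affordance perception.	affordance
10	Many Perception Tasks are Highly Redundant Functions of their Input Data	Shows that perception tasks are highly redundant functions of their input data, suggesting new directions for efficient perception algorithm design	depth estimation optical flow
11	Training-Free Model Merging for Multi-target Domain Adaptation	Proposes a training-free model merging method for multi-target domain adaptation that addresses data access restrictions.	scene understanding	✅
12	Shape of Motion: 4D Reconstruction from a Single Video	Proposes a single-video 4D reconstruction method based on the shape of motion that explicitly models scene motion trajectories.	monocular depth
13	General Geometry-aware Weakly Supervised 3D Object Detection	Proposes a general geometry-aware weakly supervised 3D object detection method to address annotation difficulty	scene understanding	✅
14	Long-Term 3D Point Tracking By Cost Volume Fusion	Proposes a deep learning framework based on cost volume fusion for long-term 3D point tracking	scene flow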

🔬 Pillar 9: Embodied Foundation Models (12 papers)

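#	Title	One-line summary	Tags	🔗	⭐
15	Similarity over Factuality: Are we making progress on multimodal out-of-context misinformation detection?	Proposes MUSE, a multimodal-similarity-based model for detecting multimodal out-of-context misinformation, matching or even surpassing SOTA methods.	large language model foundation model multimodal	✅
16	Learning Visual Grounding from Generative Vision and Language Model	Uses generative vision-language models to generate visual grounding data at scale and improve grounding performance.	visual grounding zero-shot transfer
17	ViLLa: Video Reasoning Segmentation with Large Language Model	ViLLa: uses a large language model for video reasoning segmentation, addressing localization and tracking in complex scenes	large language model multimodal	✅
18	EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing	Proposes EarthMarker, a visual-prompting multimodal large language model for remote sensing image understanding	large language model instruction following	✅
19	Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition	Qalam: a multimodal LLM for Arabic OCR and handwriting recognition	foundation model multimodal
20	OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation	OE-BevSeg: an object-informed and environment-aware BEV semantic segmentation framework for autonomous driving	multimodal
21	PG-Attack: A Precision-Guided Adversarial Attack Framework Against Vision Foundation Models for Autonomous Driving	PG-Attack: a precision-guided adversarial attack framework against vision foundation models for autonomous driving	foundation model	✅
22	Evaluating and Enhancing Trustworthiness of LLMs in Perception Tasks	Evaluates and enhances the trustworthiness of LLMs in perception tasks, using pedestrian detection as a case study	large language model multimodal
23	Restore Anything Model via Efficient Degradation Adaptation	Proposes RAM, which achieves general image restoration via efficient degradation adaptation, significantly reducing model complexity and computation.	large language model multimodal
24	Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data	Proposes a video-text retrieval evaluation method based on counterfactually augmented data and uses LLMs to improve model performance.	large language model foundation model
25	VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance	VLG-CBM: proposes a vision-language-guided concept bottleneck model that improves interpretability and performance.	large language model
26	BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models	Proposes the BEAF dataset and evaluation metrics for assessing hallucination in vision-language models under scene changes	large language model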

🔬 Pillar 2: RL Algorithms & Architecture (11 papers)

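#	Title	One-line summary	Tags	🔗	⭐
27	Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation	Proposes a geometry-guided self-distillation method to improve open-vocabulary 3D scene understanding	representation learning distillation scene understanding
28	Multimodal Label Relevance Ranking via Reinforcement Learning	Proposes LR²PPO, which uses reinforcement learning to solve multimodal label relevance ranking	reinforcement learning PPO multimodal	✅
29	Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation	Proposes guided consistency sampling and brightness-equalized generation to improve text-to-3D generation quality	distillation 3D gaussian splatting 3DGS	✅
30	GroupMamba: Efficient Group-Based Visual State Space Model	Proposes GroupMamba, an efficient group-based visual state space model that improves image recognition performance.	Mamba SSM state space model	✅
31	X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs	Proposes X-Former, which unifies contrastive and reconstruction learning to strengthen the visual representations of MLLMs.	representation learning MAE contrastive learning
32	QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View	The QuIIL team presents a solution for automating life-saving intervention procedures from a first-person view	distillation first-person view
33	Continual Distillation Learning: Knowledge Distillation in Prompt-based Continual Learning	Proposes a prompt-based knowledge distillation method (KDP) to improve small ViT models in prompt-based continual learning.	distillation
34	On the Discriminability of Self-Supervised Representation Learning	Proposes a Dynamic Semantic Adjuster (DSA) to address feature crowding in self-supervised learning and improve discriminability.	representation learning
35	Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation	Proposes low-confidence pseudo-label distillation to improve detection of small objects and hard examples in source-free domain adaptive object detection.	distillation	✅
36	Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation	Proposes Label Assisted Distillation (LAD) to improve knowledge distillation with a lightweight teacher model in semantic segmentation.	distillation	✅
37	DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection	Proposes DFMSD, dual feature masking stage-wise knowledge distillation for object detection, improving distillation across heterogeneous networks.	distillation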

🔬 Pillar 7: Motion Retargeting (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
38	Configural processing as an optimized strategy for robust object recognition in neural networks	Uses configural cues to make neural networks more robust to geometric transformations in object recognition	spatial relationship
39	OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction	Proposes OAT, an object-level attention Transformer for gaze scanpath prediction	spatial relationship

🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
40	SCAPE: A Simple and Strong Category-Agnostic Pose Estimator	SCAPE: a simple and strong category-agnostic pose estimator that improves accuracy and efficiency.	feature matching	✅
