cs.CV (2024-09-26)

📊 33 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (12) · Pillar 2: RL Algorithms & Architecture (9 🔗3) · Pillar 9: Embodied Foundation Models (8 🔗3) · Pillar 4: Generative Motion (2 🔗1) · Pillar 6: Video Extraction & Matching (2)

🔬 Pillar 3: Spatial Perception & Semantics (12 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
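1	Event-based Stereo Depth Estimation: A Survey	A survey of event-camera stereo depth estimation: a comprehensive review and future outlook.	depth estimation stereo depth
2	Self-supervised Monocular Depth Estimation with Large Kernel Attention	Proposes a self-supervised monocular depth estimation network based on large kernel attention that improves depth detail.	depth estimation monocular depth
3	ViewpointDepth: A New Dataset for Monocular Depth Estimation Under Viewpoint Shifts	Proposes the ViewpointDepth dataset for evaluating the robustness of monocular depth estimation models under viewpoint shifts.	depth estimation monocular depth
4	Neural Implicit Representation for Highly Dynamic LiDAR Mapping and Odometry	Proposes dynamic LiDAR SLAM based on a neural implicit representation, improving mapping and localization accuracy in dynamic environments.	NeRF neural radiance field implicit representation
5	TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene	Proposes TFS-NeRF for semantic 3D reconstruction of dynamic scenes; it is template-free and more efficient.	NeRF scene reconstruction optical flow
6	Scene Understanding in Pick-and-Place Tasks: Analyzing Transformations Between Initial and Final Scenes	For pick-and-place tasks, proposes a CNN-based scene understanding method that improves task detection accuracy.	scene understanding spatial relationship
7	Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions	Proposes Deblur e-NeRF to address NeRF reconstruction from motion-blurred events under high-speed or low-light conditions.	NeRF neural radiance field
8	LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness	LLaVA-3D: a simple and effective way to give LMMs 3D awareness.	scene understanding multimodal
9	Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval	Proposes SearchDet, which performs training-free long-tail object detection via web-image retrieval.	open-vocabulary open vocabulary
10	Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation	Omni6D: a large-vocabulary 3D object dataset for category-level 6D object pose estimation.	6D pose estimation
11	AI-Powered Augmented Reality for Satellite Assembly, Integration and Test	Proposes an AI-powered augmented reality system to improve the efficiency of satellite assembly, integration, and test.	6D pose estimation
12	Neural Light Spheres for Implicit Image Stitching and View Synthesis	Proposes a neural light sphere model for implicit panoramic image stitching and view synthesis.	scene reconstruction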

🔬 Pillar 2: RL Algorithms & Architecture (9 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
13	Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification	Proposes M3CoL, which captures shared relations via multimodal Mixup contrastive learning to improve multimodal classification performance.	contrastive learning multimodal	✅
14	SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion	SimVG: a simple visual grounding framework with decoupled multimodal fusion.	distillation multimodal visual grounding	✅
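15	CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor Defense in Contrastive Learning	Proposes TA-Cleaner, which strengthens CLIP's backdoor defense in contrastive learning through fine-grained counterfactual semantic augmentation.	contrastive learning multimodal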
16	Enhancing Logits Distillation with Plug&Play Kendall's $\tau$ Ranking Loss	Proposes a plug-and-play method that enhances logits distillation with a Kendall's τ ranking loss.	teacher-student distillation	✅
17	EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation	Proposes EM-Net, which uses Mamba to efficiently learn channel and frequency information for 3D medical image segmentation.	Mamba state space model
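18	Good Data Is All Imitation Learning Needs	CF-Driver: augments imitation learning with counterfactual explanations to improve the robustness of autonomous driving systems in rare scenarios.	imitation learning teacher-student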
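19	LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field	LightAvatar: a real-time, efficient head avatar model based on dynamic neural light fields.	distillation NeRF neural radiance field
20	P4Q: Learning to Prompt for Quantization in Visual-language Models	Proposes P4Q, a prompt learning method for quantizing vision-language models that improves low-bit quantization performance.	distillation multimodal
21	Self-Distilled Depth Refinement with Noisy Poisson Fusion	Proposes SDDR, a self-distilled depth refinement framework that addresses noise interference and blurred edges in depth refinement.	distillation depth estimation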

🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
22	Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing	Reviews and empirically tests multimodal large language models for object detection in transportation.	large language model multimodal
23	LLM4Brain: Training a Large Language Model for Brain Video Understanding	LLM4Brain: trains a large language model for brain video understanding, reconstructing semantic information from fMRI signals.	large language model multimodal
24	Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE	Uni-Med: a unified medical generalist foundation model for multi-task learning via Connector-MoE.	large language model foundation model	✅
25	Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction	Lotus: a diffusion-based visual foundation model for high-quality dense prediction.	foundation model	✅
26	Find Rhinos without Finding Rhinos: Active Learning with Multimodal Imagery of South African Rhino Habitats	Proposes MultimodAL, an active learning system that uses multimodal remote sensing imagery to efficiently identify rhino middens in support of rhino conservation.	multimodal
27	CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches	CadVLM: the first vision-language model for parametric CAD sketch generation, improving CAD design efficiency.	large language model foundation model multimodal
28	EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions	EMOVA: empowers language models with visual, auditory, and speech interaction that conveys vivid emotions.	large language model foundation model
29	Evaluation of Security of ML-based Watermarking: Copy and Removal Attacks	Evaluates the security of ML-based watermarking through an analysis of copy and removal attacks.	foundation model	✅

🔬 Pillar 4: Generative Motion (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
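30	EgoLM: Multi-Modal Language Model of Egocentric Motions	EgoLM: proposes a framework for egocentric motion understanding based on a multimodal large language model.	motion generation egocentric motion tracking
31	MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling	MoGenTS: a motion generation method based on spatial-temporal joint modeling that improves motion generation quality.	motion generation VQ-VAE spatial relationship	✅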

🔬 Pillar 6: Video Extraction & Matching (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
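32	EAGLE: Egocentric AGgregated Language-video Engine	EAGLE: an aggregated language-video engine and large-scale dataset for egocentric video understanding.	egocentric large language model multimodal
33	Hand-object reconstruction via interaction-aware graph attention mechanism	Proposes an interaction-aware graph attention mechanism for hand-object reconstruction that improves physical plausibility.	hand-object reconstruction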
