cs.CV (2024-09-26)

📊 33 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

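Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (12) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 🔗3) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (8 🔗3) · Pillar 4: Generative Motion (2 🔗1) · Pillar 6: Video Extraction and Matching (Video Extraction) (2)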

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (12 papers)

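# | Title | One-line summary | Tags | 🔗
1 | Event-based Stereo Depth Estimation: A Survey | A survey of stereo depth estimation with event cameras: comprehensive review and future outlook | depth estimation stereo depth
2 | Self-supervised Monocular Depth Estimation with Large Kernel Attention | Proposes a self-supervised monocular depth estimation network based on large kernel attention, improving depth detail | depth estimation monocular depth
3 | ViewpointDepth: A New Dataset for Monocular Depth Estimation Under Viewpoint Shifts | Proposes the ViewpointDepth dataset for evaluating the robustness of monocular depth estimation models under viewpoint shifts | depth estimation monocular depth
4 | Neural Implicit Representation for Highly Dynamic LiDAR Mapping and Odometry | Proposes dynamic LiDAR SLAM based on a neural implicit representation, improving mapping and localization accuracy in dynamic environments | NeRF neural radiance field implicit representation
5 | TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene | Proposes TFS-NeRF for semantic 3D reconstruction of dynamic scenes; template-free and more efficient | NeRF scene reconstruction optical flow
6 | Scene Understanding in Pick-and-Place Tasks: Analyzing Transformations Between Initial and Final Scenes | For pick-and-place tasks, proposes a CNN-based scene understanding method that improves task detection accuracy | scene understanding spatial relationship
7 | Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions | Proposes Deblur e-NeRF for NeRF reconstruction from motion-blurred events under high-speed or low-light conditions | NeRF neural radiance field
8 | LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | LLaVA-3D: a simple and effective way to give LMMs 3D awareness | scene understanding multimodal
9 | Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval | Proposes SearchDet, which performs training-free long-tail object detection via web image retrieval | open-vocabulary open vocabulary
10 | Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation | Omni6D: a large-vocabulary 3D object dataset for category-level 6D object pose estimation | 6D pose estimation
11 | AI-Powered Augmented Reality for Satellite Assembly, Integration and Test | Proposes an AI-powered augmented reality system to improve the efficiency of satellite assembly, integration and test | 6D pose estimation
12 | Neural Light Spheres for Implicit Image Stitching and View Synthesis | Proposes a neural light sphere model for implicit panoramic image stitching and view synthesis | scene reconstruction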

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 papers)

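# | Title | One-line summary | Tags | 🔗
13 | Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification | Proposes M3CoL, which captures shared relations via multimodal Mixup contrastive learning to improve multimodal classification performance | contrastive learning multimodal
14 | SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | SimVG: a simple visual grounding framework with decoupled multimodal fusion | distillation multimodal visual grounding
15 | CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor Defense in Contrastive Learning | Proposes TA-Cleaner, which strengthens CLIP's backdoor defense in contrastive learning through fine-grained counterfactual semantic augmentation | contrastive learning multimodal
16 | Enhancing Logits Distillation with Plug&Play Kendall's τ Ranking Loss | Proposes a plug-and-play method that enhances logits distillation with a Kendall's τ ranking loss | teacher-student distillation
17 | EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation | Proposes EM-Net, which uses Mamba to learn channel and frequency information efficiently for 3D medical image segmentation | Mamba state space model
18 | Good Data Is All Imitation Learning Needs | CF-Driver: augments imitation learning with counterfactual explanations to improve the robustness of autonomous driving systems in rare scenarios | imitation learning teacher-student
19 | LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field | LightAvatar: a real-time, efficient head avatar model based on a dynamic neural light field | distillation NeRF neural radiance field
20 | P4Q: Learning to Prompt for Quantization in Visual-language Models | Proposes P4Q, a prompt learning method for quantizing vision-language models that improves low-bit quantization performance | distillation multimodal
21 | Self-Distilled Depth Refinement with Noisy Poisson Fusion | Proposes SDDR, a self-distilled depth refinement framework that addresses noise interference and blurred edges in depth refinement | distillation depth estimation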

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (8 papers)

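# | Title | One-line summary | Tags | 🔗
22 | Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing | Reviews and empirically tests multimodal large language models for object detection in transportation | large language model multimodal
23 | LLM4Brain: Training a Large Language Model for Brain Video Understanding | LLM4Brain: trains a large language model for brain video understanding, reconstructing semantic information from fMRI signals | large language model multimodal
24 | Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Uni-Med: a unified medical generalist foundation model for multi-task learning via Connector-MoE | large language model foundation model
25 | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Lotus: a diffusion-based visual foundation model for high-quality dense prediction | foundation model
26 | Find Rhinos without Finding Rhinos: Active Learning with Multimodal Imagery of South African Rhino Habitats | Proposes MultimodAL, an active learning system that uses multimodal remote sensing imagery to identify rhino middens efficiently, supporting rhino conservation | multimodal
27 | CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches | CadVLM: the first vision-language model for parametric CAD sketch generation, improving CAD design efficiency | large language model foundation model multimodal
28 | EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions | EMOVA: empowers language models with visual, auditory and speech interaction that conveys vivid emotions | large language model foundation model
29 | Evaluation of Security of ML-based Watermarking: Copy and Removal Attacks | Evaluates the security of ML-based watermarking through an analysis of copy and removal attacks | foundation model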

🔬 Pillar 4: Generative Motion (2 papers)

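# | Title | One-line summary | Tags | 🔗
30 | EgoLM: Multi-Modal Language Model of Egocentric Motions | EgoLM: proposes an egocentric motion understanding framework based on a multimodal large language model | motion generation egocentric motion tracking
31 | MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling | MoGenTS: a motion generation method based on spatial-temporal joint modeling that improves motion generation quality | motion generation VQ-VAE spatial relationship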

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (2 papers)

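# | Title | One-line summary | Tags | 🔗
32 | EAGLE: Egocentric AGgregated Language-video Engine | EAGLE: an aggregated language-video engine and large-scale dataset for egocentric video understanding | egocentric large language model multimodal
33 | Hand-object reconstruction via interaction-aware graph attention mechanism | Proposes an interaction-aware graph attention mechanism for hand-object reconstruction that improves physical plausibility | hand-object reconstruction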
