cs.CV (2024-04-04)

📊 36 papers | 🔗 18 with code

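🎯 Topic navigation

Pillar 3: Spatial Perception & Semantics (15 papers, 🔗 8 with code)
Pillar 9: Embodied Foundation Models (11 papers, 🔗 5 with code)
Pillar 2: RL Algorithms & Architecture (5 papers, 🔗 2 with code)
Pillar 7: Motion Retargeting (2 papers, 🔗 1 with code)
Pillar 1: Robot Control (2 papers, 🔗 1 with code)
Pillar 6: Video Extraction & Matching (1 paper, 🔗 1 with code)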

🔬 Pillar 3: Spatial Perception & Semantics (15 papers)

# | Title | One-line summary | Tags
1 | Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting | Proposes a per-Gaussian embedding-based deformation method for dynamic scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting
2 | OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting | Proposes OmniGS to overcome the limitations of conventional 3D Gaussian splatting reconstruction | 3D gaussian splatting, gaussian splatting, splatting
3 | Adaptive Discrete Disparity Volume for Self-supervised Monocular Depth Estimation | Proposes an adaptive discrete disparity volume for self-supervised monocular depth estimation | depth estimation, monocular depth
4 | WorDepth: Variational Language Prior for Monocular Depth Estimation | Proposes WorDepth to resolve ambiguity in monocular depth estimation | depth estimation, monocular depth
5 | GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis | Proposes GaSpCT for novel-view synthesis of CT projections | gaussian splatting, splatting
6 | Gen3DSR: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View | Proposes Gen3DSR for single-view 3D reconstruction of complex scenes | 3D reconstruction, scene reconstruction
7 | VF-NeRF: Viewshed Fields for Rigid NeRF Registration | Proposes VF-NeRF for rigid registration of NeRFs | NeRF, neural radiance field
8 | Is CLIP the main roadblock for fine-grained open-world perception? | Proposes improvements to CLIP for fine-grained open-world perception | open-vocabulary, open vocabulary, multimodal
9 | RaFE: Generative Radiance Fields Restoration | Proposes RaFE to restore NeRFs built from low-quality inputs | 3D reconstruction, NeRF, neural radiance field
10 | OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views | Proposes OpenNeRF for open-set 3D scene segmentation | NeRF, open-vocabulary, open vocabulary
11 | Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | Proposes KYN, which uses spatial vision-language reasoning to improve single-view reconstruction | depth estimation, scene reconstruction
12 | Learning Transferable Negative Prompts for Out-of-Distribution Detection | Proposes NegPrompt to reduce false positives in OOD detection | open-vocabulary, open vocabulary
13 | LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity | Proposes LeGrad to improve the explainability of Vision Transformers | open-vocabulary, open vocabulary
14 | The More You See in 2D, the More You Perceive in 3D | Proposes SAP3D for 3D reconstruction from unposed images | 3D reconstruction
15 | MonoCD: Monocular 3D Object Detection with Complementary Depths | Proposes MonoCD to improve depth estimation in monocular 3D object detection | depth estimation

🔬 Pillar 9: Embodied Foundation Models (11 papers)

# | Title | One-line summary | Tags
16 | MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Proposes MiniGPT4-Video to tackle multimodal challenges in video understanding | large language model, multimodal
17 | PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model | Proposes a reasoning-based 3D part segmentation method to overcome the limitations of existing systems | multimodal
18 | No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Analyzes the data requirements of multimodal models to examine zero-shot generalization | multimodal
19 | TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices | Proposes TinyVQA for visual question answering on resource-constrained devices | multimodal
20 | Scaling Up Video Summarization Pretraining with Large Language Models | Proposes an LLM-based video summarization approach to address the scarcity of datasets | large language model
21 | LongVLM: Efficient Long Video Understanding via Large Language Models | Proposes LongVLM to address the loss of local information in long-video understanding | large language model
22 | OW-VISCapTor: Abstractors for Open-World Video Instance Segmentation and Captioning | Proposes OW-VISCapTor for open-world video instance segmentation and captioning | large language model, foundation model
23 | SemGrasp: Semantic Grasp Generation via Language Aligned Discretization | Proposes SemGrasp to address grasp generation that lacks semantic information | large language model, multimodal
24 | Test Time Training for Industrial Anomaly Segmentation | Proposes a test-time training strategy for industrial anomaly segmentation | multimodal
25 | HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion | Proposes HAPNet to improve feature fusion in RGB-thermal scene parsing | foundation model
26 | iSeg: Interactive 3D Segmentation via Interactive Attention | Proposes iSeg for interactive segmentation of 3D shapes | foundation model

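🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

# | Title | One-line summary | Tags
27 | InsectMamba: Insect Pest Classification with State Space Model | Proposes InsectMamba for insect pest classification | Mamba, SSM, state space model
28 | SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer | Proposes the SC4D framework to decouple motion and appearance in video-to-4D generation | distillation, NeRF, motion prediction
29 | SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation | Proposes SDPose to address the underperformance of small Transformer models in pose estimation | distillation
30 | Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning | Proposes sparse concept bottleneck models to improve interpretable classification | contrastive learning
31 | FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification | Proposes the FACTUAL framework to improve adversarial robustness in SAR image classification | contrastive learning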

🔬 Pillar 7: Motion Retargeting (2 papers)

# | Title | One-line summary | Tags
32 | Towards more realistic human motion prediction with attention to motion coordination | Proposes a coordination attractor to capture motion coordination in human motion prediction | human motion, human motion prediction, motion prediction
33 | Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture | Proposes SeNeVA to quantify uncertainty in motion prediction for autonomous vehicles | motion prediction

🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-line summary | Tags
34 | You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects | Proposes a dynamic scene reconstruction pipeline for 6-DoF robotic grasping | manipulation, scene reconstruction, scene understanding
35 | BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes | Proposes the BioVL-QR dataset to support biochemical video understanding | manipulation, egocentric

🔬 Pillar 6: Video Extraction & Matching (1 paper)

# | Title | One-line summary | Tags
36 | AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales | Proposes AGL-NET to handle scale differences in cross-modal global localization | feature matching
