cs.CV (2024-04-03)

📊 19 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7 🔗3) · Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (4 🔗2) · Pillar 2: RL Algorithms & Architecture (2 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

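| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | ALOHa: A New Measure for Hallucination in Captioning Models | Proposes ALOHa to address hallucination in visual captioning models | open-vocabulary, open vocabulary, large language model | |
| 2 | TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving | Proposes TCLC-GS to address insufficient fusion of LiDAR and camera data | 3D gaussian splatting, 3D reconstruction, gaussian splatting | |
| 3 | Neural Radiance Fields with Torch Units | Proposes Torch-NeRF to address complex scene reconstruction | 3D reconstruction, NeRF, neural radiance field | |
| 4 | Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion | Proposes a new method for completing occluded surfaces in indoor 3D reconstruction | 3D reconstruction, scene reconstruction | |
| 5 | Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition | Proposes a frequency decomposition method for high-fidelity, transferable NeRF editing | NeRF | |
| 6 | LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis | Proposes LiDAR4D to address dynamic LiDAR view synthesis | NeRF, neural radiance field | |
| 7 | APC2Mesh: Bridging the gap from occluded building façades to full 3D models | Proposes APC2Mesh to address occlusion of building façades | 3D reconstruction | |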

🔬 Pillar 9: Embodied Foundation Models (6 papers)

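| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 8 | SalFoM: Dynamic Saliency Prediction with Video Foundation Models | Proposes SalFoM to address dynamic modeling in video saliency prediction | foundation model | |
| 9 | VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments | Proposes VIAssist to help users with visual impairments use multimodal large language models | large language model | |
| 10 | Cohort-Individual Cooperative Learning for Multimodal Cancer Survival Analysis | Proposes the CCL framework to address information heterogeneity in multimodal cancer survival analysis | multimodal | |
| 11 | LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models | Proposes LVLM-Interpret to improve the interpretability of large vision-language models | large language model | |
| 12 | DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | Proposes the DIBS framework to improve dense video captioning quality | large language model | |
| 13 | Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models | Proposes DiffExplainer to enable cross-modal global explanations | multimodal | |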

🔬 Pillar 1: Robot Control (4 papers)

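| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 14 | Text-driven Affordance Learning from Egocentric Vision | Proposes a text-driven affordance learning method to address robot interaction | manipulation, affordance, egocentric | |
| 15 | AWOL: Analysis WithOut synthesis using Language | Proposes a language-driven 3D shape generation method to address modeling difficulties | quadruped | |
| 16 | Deep Image Composition Meets Image Forgery | Proposes an automated data generation method to address the shortage of data for image forgery detection | manipulation | |
| 17 | 3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization | Proposes 3DStyleGLIP to address detail-level stylization of 3D objects | manipulation | |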

🔬 Pillar 2: RL Algorithms & Architecture (2 papers)

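| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 18 | RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation | Proposes RS3Mamba to address long-range modeling in semantic segmentation of remote sensing images | Mamba, state space model | |
| 19 | DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets | Proposes DeiT-LT to address ViT training on long-tailed datasets | distillation | |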
