cs.CV (2024-04-03)
📊 19 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (7 🔗3)
Pillar 9: Embodied Foundation Models (6)
Pillar 1: Robot Control (4 🔗2)
Pillar 2: RL Algorithms & Architecture (2 🔗1)
🔬 Pillar 3: Spatial Perception & Semantics (7 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | ALOHa: A New Measure for Hallucination in Captioning Models | Proposes ALOHa, a measure for hallucination in visual captioning models | open-vocabulary, open vocabulary, large language model | ✅ | |
| 2 | TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving | Proposes TCLC-GS to address insufficient fusion of LiDAR and camera data | 3D gaussian splatting, 3D reconstruction, gaussian splatting | | |
| 3 | Neural Radiance Fields with Torch Units | Proposes Torch-NeRF to address the reconstruction of complex scenes | 3D reconstruction, NeRF, neural radiance field | | |
| 4 | Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion | Proposes a new method for completing occluded surfaces in indoor 3D reconstruction | 3D reconstruction, scene reconstruction | | |
| 5 | Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition | Proposes a frequency decomposition method for high-fidelity, transferable NeRF editing | NeRF | ✅ | |
| 6 | LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis | Proposes LiDAR4D to address view synthesis for dynamic LiDAR scenes | NeRF, neural radiance field | ✅ | |
| 7 | APC2Mesh: Bridging the gap from occluded building façades to full 3D models | Proposes APC2Mesh to address occlusion of building façades | 3D reconstruction | | |
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | SalFoM: Dynamic Saliency Prediction with Video Foundation Models | Proposes SalFoM to address dynamic modeling in video saliency prediction | foundation model | | |
| 9 | VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments | Proposes VIAssist to help users with visual impairments use multimodal large language models | large language model | | |
| 10 | Cohort-Individual Cooperative Learning for Multimodal Cancer Survival Analysis | Proposes the CCL framework to address information heterogeneity in multimodal cancer survival analysis | multimodal | | |
| 11 | LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models | Proposes LVLM-Interpret to improve the interpretability of large vision-language models | large language model | | |
| 12 | DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | Proposes the DIBS framework to improve the quality of dense video captioning | large language model | | |
| 13 | Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models | Proposes DiffExplainer to produce cross-modal global explanations | multimodal | | |
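🔬 Pillar 1: Robot Control (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Text-driven Affordance Learning from Egocentric Vision | Proposes a text-driven affordance learning method to address robot interaction | manipulation, affordance, egocentric | | |
| 15 | AWOL: Analysis WithOut synthesis using Language | Proposes a language-driven 3D shape generation method to address modeling difficulties | quadruped | | |
| 16 | Deep Image Composition Meets Image Forgery | Proposes an automated data generation method to address the shortage of training data for image forgery detection | manipulation | ✅ | |
| 17 | 3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization | Proposes 3DStyleGLIP to address detail-level stylization of 3D objects | manipulation | ✅ | |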
🔬 Pillar 2: RL Algorithms & Architecture (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation | Proposes RS3Mamba to address long-range modeling in semantic segmentation of remote sensing images | Mamba, state space model | ✅ | |
| 19 | DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets | Proposes DeiT-LT to address ViT training on long-tailed datasets | distillation | | |