cs.CV (2023-12-26)
📊 13 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (5 🔗3)
Pillar 9: Embodied Foundation Models (5)
Pillar 2: RL Algorithms & Architecture (2 🔗1)
Pillar 7: Motion Retargeting (1 🔗1)
🔬 Pillar 3: Spatial Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | LangSplat: 3D Language Gaussian Splatting | LangSplat: proposes a 3D language field built on 3D Gaussian Splatting for efficient and accurate open-vocabulary querying. | gaussian splatting, splatting, NeRF | | |
| 2 | EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | EmbodiedScan: a holistic multi-modal 3D perception dataset and benchmark for embodied AI. | scene understanding, embodied AI | ✅ | |
| 3 | Pano-NeRF: Synthesizing High Dynamic Range Novel Views with Geometry from Sparse Low Dynamic Range Panoramic Images | Pano-NeRF: synthesizes high dynamic range novel views from sparse low dynamic range panoramic images and geometric information. | NeRF, neural radiance field | ✅ | |
| 4 | 2D-Guided 3D Gaussian Segmentation | Proposes a 3D Gaussian segmentation method guided by 2D segmentation, enabling fast multi-object segmentation. | NeRF, neural radiance field | | |
| 5 | Learning Deformable Hypothesis Sampling for Accurate PatchMatch Multi-View Stereo | Proposes a deformable hypothesis sampler that improves the accuracy of PatchMatch multi-view stereo reconstruction. | depth estimation | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | ChartBench: A Benchmark for Complex Visual Reasoning in Charts | Proposes the ChartBench benchmark for evaluating the complex visual reasoning ability of multimodal large language models on charts. | large language model, multimodal, chain-of-thought | | |
| 7 | Towards Robust Multimodal Prompting With Missing Modalities | Proposes an orthogonal multimodal prompting method to address robustness when modalities are missing. | multimodal | | |
| 8 | VirtualPainting: Addressing Sparsity with Virtual Points and Distance-Aware Data Augmentation for 3D Object Detection | VirtualPainting: uses virtual points and distance-aware data augmentation to address sparsity in 3D object detection. | multimodal | | |
| 9 | Chain of Generation: Multi-Modal Gesture Synthesis via Cascaded Conditional Control | Proposes a chain-of-generation method that uses speech-driven multimodal priors to improve 3D gesture synthesis quality. | multimodal | | |
| 10 | Semantic-aware SAM for Point-Prompted Instance Segmentation | Proposes SAPNet, which uses a semantic-aware SAM for point-prompted instance segmentation. | foundation model | | |
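🔬 Pillar 2: RL Algorithms & Architecture (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Cloud-Device Collaborative Learning for Multimodal Large Language Models | Proposes a cloud-device collaborative continual adaptation framework that improves the on-device generalization of compressed multimodal large models. | distillation, scene understanding, large language model | | |
| 12 | DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision | Proposes DL3DV-10K, a large-scale scene dataset that supports deep learning-based 3D vision research and generalizable NeRF learning. | representation learning, NeRF, neural radiance field | ✅ | |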
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception | Proposes DOPNet, which achieves accurate 360° panoramic layout estimation through orthogonal plane disentanglement and multi-view geometric consistency perception. | geometric consistency | ✅ | |