cs.CV (2023-12-26)

📊 13 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (5 🔗3) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (5) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (2 🔗1) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (5 papers)

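# | Title | One-sentence summary | Tags | 🔗
1 | LangSplat: 3D Language Gaussian Splatting | LangSplat: proposes a 3D language field built on 3D Gaussian splatting for efficient and accurate open-vocabulary queries. | gaussian splatting, splatting, NeRF
2 | EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | EmbodiedScan: a holistic multi-modal 3D perception dataset and benchmark for embodied AI. | scene understanding, embodied AI
3 | Pano-NeRF: Synthesizing High Dynamic Range Novel Views with Geometry from Sparse Low Dynamic Range Panoramic Images | Pano-NeRF: uses sparse low-dynamic-range panoramic images and geometric information to synthesize high-dynamic-range novel views. | NeRF, neural radiance field
4 | 2D-Guided 3D Gaussian Segmentation | Proposes a 3D Gaussian segmentation method guided by 2D segmentation, enabling fast multi-object segmentation. | NeRF, neural radiance field
5 | Learning Deformable Hypothesis Sampling for Accurate PatchMatch Multi-View Stereo | Proposes a deformable hypothesis sampler that improves the accuracy of PatchMatch multi-view stereo reconstruction. | depth estimation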

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (5 papers)

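# | Title | One-sentence summary | Tags | 🔗
6 | ChartBench: A Benchmark for Complex Visual Reasoning in Charts | Proposes the ChartBench benchmark for evaluating the complex visual reasoning ability of multimodal large language models on charts. | large language model, multimodal, chain-of-thought
7 | Towards Robust Multimodal Prompting With Missing Modalities | Proposes an orthogonal multimodal prompting method that addresses robustness when modalities are missing. | multimodal
8 | VirtualPainting: Addressing Sparsity with Virtual Points and Distance-Aware Data Augmentation for 3D Object Detection | VirtualPainting: uses virtual points and distance-aware data augmentation to address sparsity in 3D object detection. | multimodal
9 | Chain of Generation: Multi-Modal Gesture Synthesis via Cascaded Conditional Control | Proposes a chain-of-generation method that uses speech-driven multimodal priors to improve the quality of 3D gesture synthesis. | multimodal
10 | Semantic-aware SAM for Point-Prompted Instance Segmentation | Proposes SAPNet, which uses a semantic-aware SAM for point-prompted instance segmentation. | foundation model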

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (2 papers)

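# | Title | One-sentence summary | Tags | 🔗
11 | Cloud-Device Collaborative Learning for Multimodal Large Language Models | Proposes a cloud-device collaborative continual adaptation framework that improves the on-device generalization of compressed multimodal large language models. | distillation, scene understanding, large language model
12 | DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision | Proposes DL3DV-10K, a large-scale scene dataset that supports deep-learning-based 3D vision research and generalizable NeRF learning. | representation learning, NeRF, neural radiance field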

🔬 Pillar 7: Motion Retargeting (1 paper)

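# | Title | One-sentence summary | Tags | 🔗
13 | 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception | Proposes DOPNet, which achieves accurate 360° panoramic layout estimation through orthogonal-plane disentanglement and multi-view geometric consistency perception. | geometric consistency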
