cs.CV (2024-10-06)

📊 9 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 3: Perception & Semantics (2) · Pillar 2: RL & Architecture (2 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

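| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models | Proposes the VISTA dataset for interpreting the associations between visual and textual inputs in multimodal models | multimodal | |
| 2 | MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration | Proposes the MC-CoT framework to improve the performance of LLMs and MLLMs on zero-shot medical VQA | large language model, multimodal, chain-of-thought | |
| 3 | MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans? | MVP-Bench: evaluates the multi-level visual perception capabilities of large vision-language models | multimodal | |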

🔬 Pillar 3: Perception & Semantics (2 papers)

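| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 4 | Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering | Mode-GS: monocular-depth-guided anchored 3D Gaussian splatting for robust ground-view scene rendering | monocular depth, 3D gaussian splatting, gaussian splatting | |
| 5 | StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting | StreetSurfGS: proposes a planar-based Gaussian splatting method for scalable urban street surface reconstruction | gaussian splatting, splatting | |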

🔬 Pillar 2: RL & Architecture (2 papers)

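| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 6 | In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding | Proposes an in-place panoptic radiance field segmentation method with a perceptual prior for 3D scene understanding | distillation, neural radiance field, scene understanding | |
| 7 | CAPEEN: Image Captioning with Early Exits and Knowledge Distillation | Proposes CAPEEN, which uses early exits and knowledge distillation to speed up image captioning and improve robustness | distillation | |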

🔬 Pillar 1: Robot Control (1 paper)

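| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Deformable NeRF using Recursively Subdivided Tetrahedra | Proposes DeformRF, which uses recursively subdivided tetrahedra to build a deformable NeRF, improving controllability and rendering quality | manipulation, NeRF, neural radiance field | |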

🔬 Pillar 4: Generative Motion (1 paper)

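| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | UniMuMo: Unified Text, Music and Motion Generation | UniMuMo: a multimodal model that unifies text, music, and motion generation | motion generation, multimodal | |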
