cs.CV (2024-07-08)
📊 4 papers in total
🎯 Navigation by interest area
Pillar 9: Embodied Foundation Models (2)
Pillar 2: RL Algorithms & Architecture (1)
Pillar 3: Spatial Perception & Semantics (1)
🔬 Pillar 9: Embodied Foundation Models (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | VIMI: Grounding Video Generation through Multi-modal Instruction | VIMI: visual grounding for video generation through multimodal instructions. | multimodal, visual grounding | | |
| 2 | SOLO: A Single Transformer for Scalable Vision-Language Modeling | Proposes SOLO, a single-Transformer architecture for scalable vision-language modeling. | large language model | | |
🔬 Pillar 2: RL Algorithms & Architecture (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 3 | 4D Contrastive Superflows are Dense 3D Representation Learners | Proposes the SuperFlow framework, which uses spatiotemporal consistency for self-supervised 3D representation learning on LiDAR data. | representation learning, contrastive learning, spatiotemporal | | |
🔬 Pillar 3: Spatial Perception & Semantics (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields | GeoNLF: geometry-guided, pose-free neural LiDAR fields for large-scale point cloud reconstruction and view synthesis. | NeRF, neural radiance field, geometric consistency | | |