cs.CV (2024-12-10)
📊 5 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (3 🔗2)
Pillar 6: Video Extraction (1)
Pillar 3: Perception & Semantics (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
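| 1 | RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving | Proposes RoboTron-Drive, a general-purpose large multimodal model for autonomous driving | large language model multimodal zero-shot transfer | ✅ | |
| 2 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | LoRA3D: improves the performance of 3D geometric foundation models through low-rank self-calibration | foundation model | | |
| 3 | Maya: An Instruction Finetuned Multilingual Multimodal Model | Maya: an instruction-finetuned multilingual multimodal model that improves understanding of low-resource languages and cultures | multimodal | ✅ | |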
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Proposes the SAT dataset for training multimodal language models on dynamic spatial reasoning | egocentric spatial relationship multimodal | | |
🔬 Pillar 3: Perception & Semantics (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | GASP: Gaussian Avatars with Synthetic Priors | GASP: Gaussian avatars with synthetic priors, enabling high-quality 360-degree rendering driven by monocular video | gaussian splatting splatting | ✅ | |