cs.CV (2024-12-10)

📊 5 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (3 🔗2) · Pillar 6: Video Extraction (1) · Pillar 3: Perception & Semantics (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

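| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving | Proposes RoboTron-Drive, a general-purpose large multimodal model for autonomous driving. | large language model, multimodal, zero-shot transfer | |
| 2 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | LoRA3D: improves the performance of 3D geometric foundation models through low-rank self-calibration. | foundation model | |
| 3 | Maya: An Instruction Finetuned Multilingual Multimodal Model | Maya: an instruction-finetuned multilingual multimodal model that improves understanding of low-resource languages and cultures. | multimodal | |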

🔬 Pillar 6: Video Extraction (1 paper)

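| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 4 | SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Proposes the SAT dataset for training multimodal language models on dynamic spatial reasoning. | egocentric, spatial relationship, multimodal | |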

🔬 Pillar 3: Perception & Semantics (1 paper)

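| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 5 | GASP: Gaussian Avatars with Synthetic Priors | GASP: Gaussian avatars built with synthetic priors, enabling high-quality 360-degree rendering driven by monocular video. | gaussian splatting, splatting | |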
