cs.CV (2024-08-23)
📊 12 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (3)
Pillar 2: RL Algorithms & Architecture (3 🔗2)
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 1: Robot Control (2)
Pillar 4: Generative Motion (1 🔗1)
🔬 Pillar 3: Spatial Perception & Semantics (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | BiGS: Bidirectional Gaussian Primitives for Relightable 3D Gaussian Splatting | Proposes bidirectional Gaussian primitives (BiGS) for relightable 3D Gaussian Splatting under dynamic illumination | 3D gaussian splatting, gaussian splatting, splatting | | |
| 2 | SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting | Proposes Latent-SpecGS, which models the view-dependent appearance of 3D Gaussian Splatting with latent features to improve rendering quality | 3D gaussian splatting, gaussian splatting, splatting | | |
| 3 | Map-Free Visual Relocalization Enhanced by Instance Knowledge and Depth Knowledge | Proposes a map-free visual relocalization method that fuses instance knowledge and depth knowledge | metric depth | | |
🔬 Pillar 2: RL Algorithms & Architecture (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models | VFM-Det: high-performance vehicle detection built on large pre-trained models | contrastive learning, large language model, foundation model | ✅ | |
| 5 | Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Proposes MAEMI, a small instruction-tuned vision-language foundation model for analyzing semiconductor electron micrographs | distillation, multimodal, instruction following | | |
| 6 | SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning | Proposes Semantic Adversarial Augmentation (SeA) to improve the downstream performance of fixed deep features from unsupervised representation learning | representation learning | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | MME-RealWorld: a benchmark for evaluating multimodal LLMs on high-resolution real-world scenarios | large language model, multimodal | | |
| 8 | VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models | VALE: a multimodal visual and language explanation framework for image classifiers | multimodal | | |
| 9 | Online Zero-Shot Classification with CLIP | Proposes OnZeta, an online zero-shot classification method that exploits the target data distribution to improve CLIP performance | zero-shot transfer | ✅ | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | ShapeICP: Iterative Category-level Object Pose and Shape Estimation from Depth | ShapeICP: iterative category-level object pose and shape estimation from depth maps | manipulation | | |
| 11 | Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing | Proposes a task-oriented diffusion inversion method to improve the accuracy of image editing | manipulation | | |
🔬 Pillar 4: Generative Motion (1 paper)
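| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities | CustomCrafter: a new framework for customized video generation that needs no additional videos or fine-tuning while preserving motion and concept composition abilities | motion generation | ✅ | |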