cs.CV (2023-12-01)
📊 24 papers in total | 🔗 9 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (11 🔗5)
Pillar 2: RL Algorithms & Architecture (6 🔗2)
Pillar 9: Embodied Foundation Models (6 🔗2)
Pillar 1: Robot Control (1)
🔬 Pillar 3: Spatial Perception & Semantics (11 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Open-vocabulary object 6D pose estimation | Proposes open-vocabulary object 6D pose estimation to address the limitations of traditional methods | open-vocabulary open vocabulary 6D pose estimation | ✅ | |
| 2 | FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting | Proposes the FSGS framework for real-time few-shot view synthesis | monocular depth 3D gaussian splatting gaussian splatting | ✅ | |
| 3 | NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance | Proposes NeuSG to address insufficient detail in neural implicit surface reconstruction | 3D gaussian splatting gaussian splatting splatting | | |
| 4 | FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models | Proposes FreeZe for training-free zero-shot 6D pose estimation | 6D pose estimation foundation model | ✅ | |
| 5 | Gaussian Grouping: Segment and Edit Anything in 3D Scenes | Proposes Gaussian Grouping for fine-grained understanding of 3D scenes | gaussian splatting splatting NeRF | ✅ | |
| 6 | Enhancing Diffusion Models with 3D Perspective Geometry Constraints | Proposes geometric constraints to improve the perspective accuracy of diffusion models | depth estimation monocular depth zero-shot transfer | | |
| 7 | Grounding Everything: Emerging Localization Properties in Vision-Language Transformers | Proposes the GEM module for zero-shot open-vocabulary object localization | open-vocabulary open vocabulary foundation model | | |
| 8 | DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines | Proposes DISTWAR to accelerate atomic operations in rasterization-based rendering | 3D gaussian splatting gaussian splatting splatting | | |
| 9 | Segment Any 3D Gaussians | Proposes the SAGA method for efficient segmentation of 3D Gaussians | 3D gaussian splatting gaussian splatting splatting | | |
| 10 | Dense Optical Tracking: Connecting the Dots | Proposes the DOT method to address slow video point tracking | optical flow | ✅ | |
| 11 | MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video | Proposes MorpheuS for 360° surface reconstruction of dynamic scenes | scene reconstruction | | |
🔬 Pillar 2: RL Algorithms & Architecture (6 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | Proposes ViP-LLaVA for region-specific visual understanding | VIP multimodal | | |
| 13 | Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment | Proposes a generalized robot 3D vision-language model for scene understanding with scarce labels | contrastive learning distillation scene understanding | | |
| 14 | Adversarial Score Distillation: When score distillation meets GAN | Proposes adversarial score distillation to address the sensitivity of existing methods | distillation classifier-free guidance | ✅ | |
| 15 | Improve Supervised Representation Learning with Masked Image Modeling | Proposes a simple and effective masked image modeling approach to improve supervised representation learning | representation learning | | |
| 16 | EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | Proposes EfficientSAM to address the high computational cost of the SAM model | representation learning zero-shot transfer | | |
| 17 | Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement | Proposes a static-dynamic disentanglement framework for video distillation | distillation | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Dolphins: Multimodal Language Model for Driving | Proposes the Dolphins model for multimodal understanding in complex driving scenarios | multimodal chain-of-thought | | |
| 19 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | Proposes Omni-SMoLA to improve the generality and performance of multimodal models | multimodal | | |
| 20 | Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning | Proposes DPLNet to address low training efficiency in multimodal semantic segmentation | multimodal | | |
| 21 | Zero-Shot Video Question Answering with Procedural Programs | Proposes ProViQ for zero-shot video question answering | large language model multimodal | ✅ | |
| 22 | WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models | Proposes WeGeFT for efficient adaptation of large models | instruction following | ✅ | |
| 23 | RadioGalaxyNET: Dataset and Novel Computer Vision Algorithms for the Detection of Extended Radio Galaxies and Infrared Hosts | Proposes RadioGalaxyNET to automatically detect extended radio galaxies and their infrared hosts | multimodal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models | Proposes TrackDiffusion to address dynamics control in video generation | manipulation world model | | |