cs.CV (2023-12-01)

📊 24 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (11 🔗5) · Pillar 2: RL Algorithms & Architecture (6 🔗2) · Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Spatial Perception & Semantics (11 papers)

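| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 1 | Open-vocabulary object 6D pose estimation | Proposes open-vocabulary object 6D pose estimation to address the limitations of conventional methods | open-vocabulary, open vocabulary, 6D pose estimation |
| 2 | FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting | Proposes the FSGS framework for real-time few-shot view synthesis | monocular depth, 3D gaussian splatting, gaussian splatting |
| 3 | NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance | Proposes NeuSG to address insufficient detail in neural implicit surface reconstruction | 3D gaussian splatting, gaussian splatting, splatting |
| 4 | FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models | Proposes FreeZe for training-free zero-shot 6D pose estimation | 6D pose estimation, foundation model |
| 5 | Gaussian Grouping: Segment and Edit Anything in 3D Scenes | Proposes Gaussian Grouping for fine-grained understanding of 3D scenes | gaussian splatting, splatting, NeRF |
| 6 | Enhancing Diffusion Models with 3D Perspective Geometry Constraints | Proposes geometric constraints to improve the perspective accuracy of diffusion models | depth estimation, monocular depth, zero-shot transfer |
| 7 | Grounding Everything: Emerging Localization Properties in Vision-Language Transformers | Proposes the GEM module for zero-shot open-vocabulary object localization | open-vocabulary, open vocabulary, foundation model |
| 8 | DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines | Proposes DISTWAR to accelerate atomic operations in rasterization-based rendering | 3D gaussian splatting, gaussian splatting, splatting |
| 9 | Segment Any 3D Gaussians | Proposes the SAGA method for efficient segmentation of 3D Gaussians | 3D gaussian splatting, gaussian splatting, splatting |
| 10 | Dense Optical Tracking: Connecting the Dots | Proposes the DOT method to address slow point tracking in video | optical flow |
| 11 | MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video | Proposes MorpheuS for 360° surface reconstruction of dynamic scenes | scene reconstruction |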

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

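| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 12 | ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | Proposes ViP-LLaVA for region-specific visual understanding | VIP, multimodal |
| 13 | Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment | Proposes a generalized robot 3D vision-language model for scene understanding with scarce labels | contrastive learning, distillation, scene understanding |
| 14 | Adversarial Score Distillation: When score distillation meets GAN | Proposes adversarial score distillation to address the sensitivity of existing methods | distillation, classifier-free guidance |
| 15 | Improve Supervised Representation Learning with Masked Image Modeling | Proposes a simple and effective masked image modeling approach to improve supervised representation learning | representation learning |
| 16 | EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | Proposes EfficientSAM to address the high computational cost of SAM | representation learning, zero-shot transfer |
| 17 | Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement | Proposes a static-dynamic disentanglement framework for video distillation | distillation |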

🔬 Pillar 9: Embodied Foundation Models (6 papers)

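| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 18 | Dolphins: Multimodal Language Model for Driving | Proposes the Dolphins model for multimodal understanding in complex driving scenarios | multimodal, chain-of-thought |
| 19 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | Proposes Omni-SMoLA to improve the generality and performance of multimodal models | multimodal |
| 20 | Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning | Proposes DPLNet to address the low training efficiency of multimodal semantic segmentation | multimodal |
| 21 | Zero-Shot Video Question Answering with Procedural Programs | Proposes ProViQ for zero-shot video question answering | large language model, multimodal |
| 22 | WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models | Proposes WeGeFT for efficient adaptation of large models | instruction following |
| 23 | RadioGalaxyNET: Dataset and Novel Computer Vision Algorithms for the Detection of Extended Radio Galaxies and Infrared Hosts | Proposes RadioGalaxyNET to automatically detect extended radio galaxies and their infrared hosts | multimodal |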

🔬 Pillar 1: Robot Control (1 paper)

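| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 24 | TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models | Proposes TrackDiffusion to address dynamics control in video generation | manipulation, world model |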
