cs.CV(2024-11-07)

📊 21 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (8 🔗4) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 2: RL & Architecture (3) · Pillar 7: Motion Retargeting (2) · Pillar 6: Video Extraction (2 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Perception & Semantics (8 papers)

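| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | Discretized Gaussian Representation for Tomographic Reconstruction | Proposes a Discretized Gaussian Representation (DGR) for efficient, high-quality CT tomographic reconstruction. | 3D gaussian splatting, gaussian splatting, splatting | |
| 2 | VAIR: Visuo-Acoustic Implicit Representations for Low-Cost, Multi-Modal Transparent Surface Reconstruction in Indoor Scenes | Proposes VAIR, which uses visuo-acoustic implicit representations for low-cost reconstruction of transparent surfaces in indoor scenes. | implicit representation, latent optimization | |
| 3 | Planar Reflection-Aware Neural Radiance Fields | Proposes a reflection-aware neural radiance field that addresses NeRF's shortcomings in modeling planar reflections. | NeRF, neural radiance field | |
| 4 | MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views | MVSplat360: a feed-forward method for 360-degree scene synthesis from sparse views. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 5 | D$^3$epth: Self-Supervised Depth Estimation with Dynamic Mask in Dynamic Scenes | D$^3$epth: proposes a self-supervised depth estimation method with dynamic masks to handle depth estimation in dynamic scenes. | depth estimation, monocular depth | |
| 6 | Enhancing Bronchoscopy Depth Estimation through Synthetic-to-Real Domain Adaptation | Proposes a bronchoscopy depth estimation method based on synthetic-to-real domain adaptation. | depth estimation, monocular depth | |
| 7 | GANESH: Generalizable NeRF for Lensless Imaging | GANESH: a generalizable NeRF for lensless imaging that enables 3D reconstruction and refinement from multi-view images. | NeRF | |
| 8 | Breaking The Ice: Video Segmentation for Close-Range Ice-Covered Waters | Proposes the UPerFlow model for video segmentation of close-range ice-covered waters, improving robustness under complex ice conditions. | optical flow | |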

🔬 Pillar 9: Embodied Foundation Models (4 papers)

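| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 9 | VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | VideoGLaMM: a large multimodal model for pixel-level visual grounding in videos. | large language model, multimodal, visual grounding | |
| 10 | Explainable Search and Discovery of Visual Cultural Heritage Collections with Multimodal Large Language Models | Uses multimodal large language models for explainable search and discovery of visual cultural heritage collections. | large language model, multimodal | |
| 11 | CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM | CAD-MLLM: proposes a unified multimodality-conditioned CAD generation framework that uses an MLLM to generate CAD models from text, image, point cloud, and other multimodal inputs. | large language model, multimodal | |
| 12 | Dynamic Brightness Adaptation for Robust Multi-modal Image Fusion | Proposes the BA-Fusion framework to make multi-modal image fusion robust to dynamic brightness changes. | multimodal | |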

🔬 Pillar 2: RL & Architecture (3 papers)

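| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 13 | ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing | ProEdit: high-quality 3D scene editing via progressive diffusion distillation. | distillation, 3D gaussian splatting, 3DGS | |
| 14 | LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation | LLM2CLIP: uses large language models to enhance CLIP's visual representations. | contrastive learning, large language model, multimodal | |
| 15 | A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model | Proposes a reinforcement learning-based automatic video editing method built on a pre-trained vision-language model, aimed at general-scene video editing. | reinforcement learning | |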

🔬 Pillar 7: Motion Retargeting (2 papers)

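| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 16 | LidaRefer: Context-aware Outdoor 3D Visual Grounding for Autonomous Driving | LidaRefer: context-aware outdoor 3D visual grounding for autonomous driving. | spatial relationship, visual grounding | |
| 17 | ProGraph: Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction | ProGraph: temporally-alignable, probability-guided graph topological modeling for 3D human reconstruction, addressing occlusion and blur. | human motion | |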

🔬 Pillar 6: Video Extraction (2 papers)

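| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 18 | HourVideo: 1-Hour Video-Language Understanding | Proposes the HourVideo benchmark dataset for evaluating and advancing video-language understanding of 1-hour videos. | egocentric, Ego4D, multimodal | |
| 19 | Social EgoMesh Estimation | Proposes the SEE-ME framework, which uses social interaction cues to improve the accuracy of egocentric human mesh estimation. | egocentric | |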

🔬 Pillar 4: Generative Motion (1 paper)

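| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 20 | DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction | DanceFusion: a spatio-temporal skeleton diffusion Transformer for audio-driven dance motion reconstruction. | motion generation | |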

🔬 Pillar 1: Robot Control (1 paper)

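| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 21 | DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion | Proposes DimensionX to address the generation of 3D and 4D scenes from a single image. | manipulation | |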
