cs.CV (2024-11-27)
📊 49 papers in total | 🔗 13 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (13 🔗3)
Pillar 2: RL Algorithms and Architecture (11 🔗2)
Pillar 9: Embodied Foundation Models (10 🔗5)
Pillar 1: Robot Control (7 🔗2)
Pillar 4: Generative Motion (4)
Pillar 8: Physics-based Animation (2 🔗1)
Pillar 5: Interaction and Reaction (1)
Pillar 6: Video Extraction and Matching (1)
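🔬 Pillar 3: Spatial Perception and Semantics (13 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Point Cloud Unsupervised Pre-training via 3D Gaussian Splatting | Proposes the GS$^3$ framework, which accelerates unsupervised point cloud pre-training via 3D Gaussian splatting, improving efficiency and reducing memory usage. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 2 | HEMGS: A Hybrid Entropy Model for 3D Gaussian Splatting Data Compression | Proposes HEMGS, a hybrid entropy model for efficiently compressing 3D Gaussian splatting data, significantly reducing storage requirements. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 3 | GLS: Geometry-aware 3D Language Gaussian Splatting | GLS: geometry-aware 3D language Gaussian splatting, a unified framework for surface reconstruction and open-vocabulary segmentation. | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 4 | Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation | Proposes the Helvipad dataset for omnidirectional stereo depth estimation and improves model performance. | depth estimation, stereo depth | | |
| 5 | From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects | Proposes OWEL and MSCAL, giving open-vocabulary object detection models open-world object detection capability. | open-vocabulary, open vocabulary | | |
| 6 | Textured Gaussians for Enhanced 3D Scene Appearance Modeling | Proposes textured Gaussians to enhance 3D scene appearance modeling. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 7 | GaussianSpeech: Audio-Driven Gaussian Avatars | GaussianSpeech: proposes audio-driven, high-fidelity head avatars based on 3D Gaussian splatting. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 8 | Reconstructing Animals and the Wild | Proposes the RAW framework, which uses large language model priors to reconstruct wild animals and their natural scenes. | scene understanding, large language model | | |
| 9 | CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models | CAT4D: creates arbitrary 4D scenes using multi-view video diffusion models. | scene reconstruction, TAMP | ✅ | |
| 10 | SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images | SmileSplat: proposes a generalizable Gaussian splatting method for 3D reconstruction from unconstrained sparse images. | gaussian splatting, splatting | ✅ | |
| 11 | An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition | Proposes a two-stream network based on RGB flow and representation flow for end-to-end human action recognition, reducing computational cost. | optical flow, egocentric | | |
| 12 | MotionCharacter: Fine-Grained Motion Controllable Human Video Generation | MotionCharacter: proposes a framework for fine-grained motion-controllable human video generation, addressing the difficulty of controlling motion intensity. | optical flow | | |
| 13 | RoMo: Robust Motion Segmentation Improves Structure from Motion | RoMo: robust motion segmentation improves SfM camera calibration in dynamic scenes. | optical flow | | |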
🔬 Pillar 2: RL Algorithms and Architecture (11 papers)
🔬 Pillar 9: Embodied Foundation Models (10 papers)
🔬 Pillar 1: Robot Control (7 papers)
🔬 Pillar 4: Generative Motion (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 42 | OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains | Proposes the OOD-HOI framework to address out-of-domain generalization in text-driven 3D whole-body human-object interaction generation. | physically plausible, human-object interaction, HOI | | |
| 43 | Lifting Motion to the 3D World via 2D Diffusion | MVLift: uses 2D diffusion models to learn 3D motion estimation from 2D data alone. | motion diffusion model, motion diffusion, human-object interaction | | |
| 44 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Proposes XR-MBT, which uses self-supervised learning and depth point cloud registration for multi-modal full-body tracking on XR devices. | motion synthesis, egocentric | | |
| 45 | PersonaCraft: Personalized and Controllable Full-Body Multi-Human Scene Generation Using Occlusion-Aware 3D-Conditioned Diffusion | PersonaCraft: personalized full-body multi-human scene generation based on occlusion-aware 3D-conditioned diffusion. | classifier-free guidance, SMPL-X | | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 46 | Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling | Proposes Spatiotemporal Skip Guidance (STG), which improves the sampling quality of video diffusion models without additional training. | spatiotemporal | ✅ | |
| 47 | EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond | EventCrab: an event-camera action recognition framework that fuses frame and point information. | spatiotemporal | | |
🔬 Pillar 5: Interaction and Reaction (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 48 | VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis | Proposes VLM-HOI, which uses vision language models for interpretable human-object interaction analysis. | human-object interaction, HOI | | |
🔬 Pillar 6: Video Extraction and Matching (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 49 | HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | HyperGLM: augments multimodal LLMs with hypergraphs for video scene graph generation and anticipation. | egocentric, spatial relationship, multimodal | | |