cs.CV(2024-05-30)
📊 46 papers total | 🔗 12 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (13 🔗3)
Pillar 3: Perception & Semantics (12 🔗4)
Pillar 2: RL & Architecture (9 🔗5)
Pillar 1: Robot Control (5)
Pillar 6: Video Extraction (5)
Pillar 4: Generative Motion (2)
🔬 Pillar 9: Embodied Foundation Models (13 papers)
🔬 Pillar 3: Perception & Semantics (12 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction | GaussianRoom: combines SDF guidance and monocular cues to improve 3D Gaussian Splatting for indoor scene reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting | ||
| 15 | $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving | Proposes a self-supervised street Gaussians method that decomposes dynamic and static elements of autonomous driving scenes without 3D annotations. | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 16 | OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation | Proposes OpenDAS, which improves 2D/3D segmentation performance through open-vocabulary domain adaptation. | open-vocabulary, open vocabulary | ✅ | |
| 17 | RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | RTGen: generates region-text pairs to improve open-vocabulary object detection. | open-vocabulary, open vocabulary | ||
| 18 | EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos | Proposes EMAG to address viewpoint dependence and poor generalization in hand motion forecasting from egocentric videos. | optical flow, egocentric, Ego4D | ✅ | |
| 19 | IReNe: Instant Recoloring of Neural Radiance Fields | IReNe: enables instant recoloring of neural radiance fields, improving editing efficiency and realism. | NeRF, neural radiance field, scene reconstruction | ||
| 20 | Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian | Proposes UGOT, which uses uncertainty-guided optimal transport to tackle sparse-view 3D Gaussian reconstruction. | depth estimation, monocular depth, 3D gaussian splatting | ||
| 21 | A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction | Proposes a hierarchical Splatter Image method that uses multiple Gaussians to better model occluded regions in single-view 3D reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting | ||
| 22 | View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields | Proposes a 3D-consistent hierarchical segmentation method based on ultrametric feature fields, addressing inconsistency across views. | NeRF, neural radiance field, foundation model | ✅ | |
| 23 | TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes | Proposes TetSphere Splatting, which uses tetrahedral meshes for high-quality 3D shape modeling. | splatting | ||
| 24 | Gated Fields: Learning Scene Reconstruction from Gated Videos | Proposes Gated Fields, which uses active gated video sequences for accurate 3D reconstruction of outdoor scenes. | scene reconstruction | ||
| 25 | CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets | CLAY: a controllable large-scale generative model for creating high-quality 3D assets. | implicit representation |
🔬 Pillar 2: RL & Architecture (9 papers)
🔬 Pillar 1: Robot Control (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
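| 35 | SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation | SAM-E: leverages a visual foundation model and sequence imitation for embodied manipulation. | manipulation, scene understanding, foundation model | ||
| 36 | May the Dance be with You: Dance Generation Framework for Non-Humanoids | Proposes a dance generation framework for non-humanoid agents that learns dance motions by associating visual rhythm with music. | humanoid, reinforcement learning, contrastive learning | ||
| 37 | Learning 3D Robotics Perception using Inductive Priors | Learns 3D robotics perception using inductive priors, improving generalization and reducing data dependence. | sim2real, scene understanding, semantic map | ||
| 38 | HINT: Learning Complete Human Neural Representations from Limited Viewpoints | HINT: proposes a NeRF-based method for learning human neural representations, addressing complete human modeling from limited viewpoints. | humanoid, NeRF | ||
| 39 | ParSEL: Parameterized Shape Editing with Language | ParSEL: proposes a language-based parameterized shape editing method for controllable editing of 3D assets. | manipulation |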
🔬 Pillar 6: Video Extraction (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 40 | MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Proposes MotionLLM for multimodal human behavior understanding. | SMPL, human motion, large language model | ||
| 41 | SMPLX-Lite: A Realistic and Drivable Avatar Benchmark with Rich Geometry and Texture Annotations | Proposes the SMPLX-Lite dataset and parametric model for driving realistic, controllable full-body avatars. | SMPL-X, human motion | ||
| 42 | Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera | Proposes a visual question answering dataset based on 360-degree egocentric video to assist people with visual impairments. | egocentric | ||
| 43 | OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer | OmniHands: achieves robust 4D hand mesh recovery with a versatile Transformer. | hand reconstruction | ||
| 44 | Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models | Proposes PlausiVL, which uses large video-language models to anticipate plausible action sequences. | Ego4D |
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
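| 45 | RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text | RapVerse: proposes a unified framework for generating coherent vocals and whole-body motions from text. | motion generation, human motion, multimodal | ||
| 46 | Stratified Avatar Generation from Sparse Observations | Proposes a stratified generation method for reconstructing full-body avatars from sparse observations. | VQ-VAE, SMPL |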