cs.CV (2024-10-10)
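📊 27 papers in total | 🔗 9 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (9 🔗5)
Pillar 9: Embodied Foundation Models (Embodied Foundation Models) (9 🔗1)
Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 🔗2)
Pillar 4: Generative Motion (Generative Motion) (2 🔗1)
Pillar 1: Robot Control (Robot Control) (1)
Pillar 7: Motion Retargeting (Motion Retargeting) (1)
Pillar 6: Video Extraction and Matching (Video Extraction) (1)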
🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (9 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting | MotionGS: proposes explicit motion guidance for deformable 3D Gaussian Splatting to reconstruct dynamic scenes | 3D gaussian splatting, gaussian splatting, splatting | ✅ | |
| 2 | Poison-splat: Computation Cost Attack on 3D Gaussian Splatting | Proposes the Poison-splat attack, exposing a computation-cost security vulnerability in 3D Gaussian Splatting training | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 3 | Fast Feedforward 3D Gaussian Splatting Compression | Proposes FCGS, a fast feedforward compression method for 3D Gaussian Splatting that needs no per-scene optimization | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 4 | IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera | IncEventGS: pose-free Gaussian Splatting reconstruction from a single event camera | visual odometry, 3D gaussian splatting, gaussian splatting | ✅ | |
| 5 | Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics | Proposes NeuMA, a Neural Material Adaptor for visual grounding of intrinsic dynamics | 3D gaussian splatting, gaussian splatting, splatting | | |
| 6 | DifFRelight: Diffusion-Based Facial Performance Relighting | Proposes a diffusion-based facial performance relighting framework with high-fidelity lighting control under free viewpoints | 3D gaussian splatting, gaussian splatting, splatting | | |
| 7 | A transition towards virtual representations of visual scenes | Proposes a visual scene understanding architecture oriented toward 3D virtual synthesis, enabling unified and flexible scene descriptions | scene understanding | | |
| 8 | Generalizable and Animatable Gaussian Head Avatar | Proposes GAGAvatar, which generates a generalizable and animatable Gaussian head avatar from a single image | neural radiance field | ✅ | |
| 9 | Test-Time Intensity Consistency Adaptation for Shadow Detection | Proposes the TICA framework to address consistency issues in shadow detection | scene understanding | | |
🔬 Pillar 9: Embodied Foundation Models (Embodied Foundation Models) (9 papers)
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Mono-InternVL: improves the performance of monolithic multimodal large language models through endogenous visual pre-training | visual pre-training, large language model, multimodal | ✅ | |
| 20 | SPA: 3D Spatial-Awareness Enables Effective Embodied Representation | SPA: strengthens representation learning for embodied AI through 3D spatial awareness | representation learning, embodied AI, language conditioned | ✅ | |
| 21 | LaB-CL: Localized and Balanced Contrastive Learning for improving parking slot detection | Proposes the LaB-CL framework, which improves parking slot detection through localized and balanced contrastive learning | contrastive learning | | |
| 22 | MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction | MGMapNet: multi-granularity representation learning for end-to-end vectorized HD map construction | representation learning | | |
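🔬 Pillar 4: Generative Motion (Generative Motion) (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | MMHead: Towards Fine-grained Multi-modal 3D Facial Animation | MMHead: builds a multi-modal 3D facial animation dataset and proposes a text-driven animation generation method | motion generation, VQ-VAE | | |
| 24 | Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos | Proposes a neural Kalman filter-based method for physics-based human motion capture, improving motion smoothness and physical plausibility | physically plausible | ✅ | |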
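🔬 Pillar 1: Robot Control (Robot Control) (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network | Proposes PGCN, a pyramid graph convolutional network for understanding spatio-temporal relations in human-object interaction, enabling action recognition and segmentation | bi-manual, human-object interaction | | |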
🔬 Pillar 7: Motion Retargeting (Motion Retargeting) (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | SeMv-3D: Towards Concurrency of Semantic and Multi-view Consistency in General Text-to-3D Generation | SeMv-3D: targets general text-to-3D generation, achieving semantic and multi-view consistency together | geometric consistency | | |
🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | ToMiE: Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars | ToMiE: proposes an explicit exoskeleton method for reconstructing complex 3D human avatars | SMPL | | |