cs.CV (2024-10-31)
📊 29 papers in total | 🔗 7 with code
🎯 Navigation by Interest Area
Pillar 9: Embodied Foundation Models (12)
Pillar 3: Perception & Semantics (8 🔗3)
Pillar 2: RL & Architecture (4 🔗1)
Pillar 4: Generative Motion (2 🔗2)
Pillar 5: Interaction & Reaction (1 🔗1)
Pillar 1: Robot Control (1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 9: Embodied Foundation Models (12 papers)
🔬 Pillar 3: Perception & Semantics (8 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting | Proposes GaussianMarker for copyright protection and invisible watermark embedding in 3D Gaussian Splatting models. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 14 | ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images | ImOV3D: learns open-vocabulary 3D point cloud object detection from 2D images only. | depth estimation monocular depth open-vocabulary | ✅ | |
| 15 | Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis | Proposes Self-Ensembling Gaussian Splatting (SE-GS) to address overfitting in few-shot novel view synthesis. | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 16 | GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering | GeoSplatting: physically based inverse rendering via geometry-guided Gaussian splatting. | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 17 | Optical Lens Attack on Monocular Depth Estimation for Autonomous Driving | Proposes LensAttack to expose security risks in monocular depth estimation for autonomous driving. | depth estimation monocular depth | | |
| 18 | Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes | Proposes Aquatic-GS, a hybrid 3D representation for underwater scenes that models both the water and the objects for high-quality rendering and restoration. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 19 | GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring | Proposes GS-Blur, a realistic image deblurring dataset built with 3D Gaussian Splatting. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 20 | XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM | XRDSLAM: a flexible, modular deep learning SLAM framework that is easy to extend and evaluate. | 3DGS NeRF | | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment | JEMA: a joint embedding framework for scalable co-learning with multimodal alignment. | contrastive learning multimodal | | |
| 22 | MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation | Proposes MLLA-UNet, which combines linear attention with a Mamba-like design for efficient medical image segmentation. | Mamba linear attention | ✅ | |
| 23 | NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs | NIMBA: robust and principled point cloud processing with SSMs. | Mamba SSM state space model | | |
| 24 | Semantic Knowledge Distillation for Onboard Satellite Earth Observation Image Classification | Proposes a dynamically weighted knowledge distillation framework for efficient classification of satellite remote sensing images under resource constraints. | distillation | | |
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | Fashion-VDM: Video Diffusion Model for Virtual Try-On | Fashion-VDM: a video diffusion model for generating virtual try-on videos. | classifier-free guidance | ✅ | |
| 26 | Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning | DEMO: enhances motion in text-to-video generation through decomposed encoding and conditioning. | motion synthesis | ✅ | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection | Proposes EZ-HOI, which adapts VLMs to zero-shot HOI detection via guided prompt learning. | human-object interaction HOI large language model | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | Language-guided Hierarchical Fine-grained Image Forgery Detection and Localization | Proposes HiFi-Net++, a language-guided, hierarchical, fine-grained method for image forgery detection and localization. | manipulation representation learning | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | DELTA: Dense Efficient Long-range 3D Tracking for any video | DELTA: an efficient method for dense, long-range 3D tracking in any video. | motion tracking | | |