cs.CV (2024-08-20)
📊 30 papers in total | 🔗 11 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (11 🔗3)
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 2: RL Algorithms & Architecture (6 🔗2)
Pillar 1: Robot Control (2 🔗1)
Pillar 5: Interaction & Reaction (1)
Pillar 6: Video Extraction & Matching (1 🔗1)
Pillar 4: Generative Motion (1 🔗1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 3: Spatial Perception & Semantics (11 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting | GS-CPR: efficient camera pose refinement using 3D Gaussian Splatting | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 2 | OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding | Proposes the OpenScan benchmark for generalized open-vocabulary 3D scene understanding | scene understanding open-vocabulary open vocabulary | | |
| 3 | Near, far: Patch-ordering enhances vision foundation models' scene understanding | Proposes the NeCo loss, which improves the scene understanding of vision foundation models through patch ordering | scene understanding foundation model | | |
| 4 | SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Enhances Emotion-LLaMA with Conv-Attention to improve multimodal emotion recognition performance | open-vocabulary open vocabulary multimodal | ✅ | |
| 5 | On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes | Evaluates the potential of open-vocabulary models for object detection in unusual street scenes and reveals their limitations in open-world settings | open-vocabulary open vocabulary | | |
| 6 | Lightweight Modular Parameter-Efficient Tuning for Open-Vocabulary Object Detection | Proposes UniProj-Det, a lightweight, modular, parameter-efficient framework for open-vocabulary object detection | open-vocabulary open vocabulary | | |
| 7 | TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks | TrackNeRF: bundle adjustment of NeRF via feature tracks, addressing reconstruction from sparse and noisy views | NeRF neural radiance field | | |
| 8 | DEGAS: Detailed Expressions on Full-Body Gaussian Avatars | Proposes DEGAS to model detailed expressions on full-body Gaussian avatars | 3D gaussian splatting 3DGS gaussian splatting | | |
| 9 | Open 3D World in Autonomous Driving | Proposes an open-vocabulary perception method for autonomous driving that fuses 3D point clouds with text information | open-vocabulary open vocabulary multimodal | | |
| 10 | Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant | Proposes PoVo, the first vocabulary-free 3D instance segmentation method, which uses a vision-language assistant for open scene understanding | open-vocabulary open vocabulary | ✅ | |
| 11 | PooDLe: Pooled and dense self-supervised learning from naturalistic videos | PooDLe: combines pooled and dense self-supervised learning to learn representations from naturalistic videos | optical flow | | |
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 2: RL Algorithms & Architecture (6 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining | Proposes SenPa-MAE for self-supervised pretraining on multi-satellite remote sensing imagery, addressing cross-sensor data fusion | masked autoencoder MAE foundation model | | |
| 20 | ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining | ShapeSplat: a large-scale dataset of Gaussian splats and their self-supervised pretraining | representation learning MAE 3D gaussian splatting | | |
| 21 | MambaEVT: Event Stream based Visual Object Tracking using State Space Model | Proposes MambaEVT, an event-stream visual object tracking framework based on the Mamba state space model | Mamba state space model | ✅ | |
| 22 | MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval | Proposes MUSE, an efficient Mamba-based multi-scale model for text-video retrieval | Mamba | | |
| 23 | Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers | Proposes an adaptive knowledge distillation method based on explainable Vision Transformers for hand image classification | distillation | | |
| 24 | Event Stream-based Sign Language Translation: A High-Definition Benchmark Dataset and A Novel Baseline | Proposes the Event-CSL event-stream sign language translation dataset and the EvSLT baseline model, addressing illumination and privacy issues | Mamba spatiotemporal | ✅ | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics | PartGS: proposes a self-supervised hybrid representation learning framework for part-level parsing and reconstruction of 3D scenes | manipulation NeRF | | |
| 26 | A Gray-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse | Proposes the Posterior Collapse Attack (PCA), which protects images from unauthorized LDM-based editing | manipulation | ✅ | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | A Review of Human-Object Interaction Detection | Reviews methods for human-object interaction detection in images, analyzing challenges and future trends | human-object interaction HOI | | |
🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | Multi-view Hand Reconstruction with a Point-Embedded Transformer | Proposes the POEM model, which uses a point-embedded Transformer for generalizable multi-view hand mesh reconstruction | HMR hand reconstruction | ✅ | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network | Proposes CrossFi, a Siamese-network-based cross-domain Wi-Fi sensing framework that addresses domain shift | penetration | ✅ | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 30 | A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning | Proposes a noncontact wave measurement technique based on thermal stereography and deep learning | spatiotemporal | | |