cs.CV (2024-09-21)
📊 14 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (4 🔗2)
Pillar 9: Embodied Foundation Models (4 🔗2)
Pillar 2: RL Algorithms and Architecture (3)
Pillar 6: Video Extraction and Matching (2)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 3: Spatial Perception and Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality | SplatLoc: a 3D Gaussian Splatting-based visual localization method for augmented reality | 3D gaussian splatting gaussian splatting splatting | ✅ | |
| 2 | MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors | MOSE: NeRF-lifted monocular semantic reconstruction for 3D scene understanding from monocular images | NeRF scene understanding | | |
| 3 | BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow | Proposes BurstM to address the alignment problem in multi-frame super-resolution | optical flow | ✅ | |
| 4 | Multilateral Cascading Network for Semantic Segmentation of Large-Scale Outdoor Point Clouds | Proposes MCNet, a multilateral cascading network for semantic segmentation of large-scale outdoor point clouds | scene understanding | | |
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Enhancing Advanced Visual Reasoning Ability of Large Language Models | Proposes CVR-LLM to strengthen large language models on complex visual reasoning tasks | large language model multimodal | | |
| 6 | Foundation Models for Amodal Video Instance Segmentation in Automated Driving | Proposes S-AModal, which uses a foundation model for amodal video instance segmentation in automated driving | foundation model | ✅ | |
| 7 | Vision-Language Models Assisted Unsupervised Video Anomaly Detection | Proposes VLAVAD, which uses vision-language models to assist unsupervised video anomaly detection and reports state-of-the-art results on the ShanghaiTech dataset | large language model | | |
| 8 | SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information | Proposes the SURf framework to improve how selectively large vision-language models use retrieved information | multimodal | ✅ | |
🔬 Pillar 2: RL Algorithms and Architecture (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | CUS3D: CLIP-based Unsupervised 3D Segmentation via Object-level Denoise | CUS3D: an unsupervised 3D semantic segmentation method based on CLIP and object-level denoising | distillation open-vocabulary open vocabulary | | |
| 10 | BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance | BrainDreamer: language-guided generation of reasoning-coherent, controllable images from EEG signals | dreamer contrastive learning | | |
| 11 | ECHO: Environmental Sound Classification with Hierarchical Ontology-guided Semi-Supervised Learning | ECHO: environmental sound classification via hierarchical ontology-guided semi-supervised learning | contrastive learning large language model | | |
🔬 Pillar 6: Video Extraction and Matching (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture | PoseAugment: a generative human pose augmentation method with physical constraints that improves IMU-based motion capture accuracy | IMU-based motion human motion | | |
| 13 | Egocentric zone-aware action recognition across environments | Proposes a zone-aware action recognition method that improves egocentric action recognition across environments | egocentric egocentric vision | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | ExFMan: Rendering 3D Dynamic Humans with Hybrid Monocular Blurry Frames and Events | ExFMan: renders dynamic 3D humans from hybrid monocular blurry frames and event camera data | human motion | | |