cs.CV (2024-08-09)
📊 20 papers in total
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (8)
Pillar 3: Spatial Perception & Semantics (5)
Pillar 2: RL Algorithms & Architecture (4)
Pillar 7: Motion Retargeting (1)
Pillar 8: Physics-based Animation (1)
Pillar 6: Video Extraction & Matching (1)
🔬 Pillar 9: Embodied Foundation Models (8 papers)
🔬 Pillar 3: Spatial Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Proposes lazy visual grounding for open-vocabulary semantic segmentation, requiring no additional training. | open-vocabulary open vocabulary visual grounding | | |
| 10 | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | Proposes ProxyCLIP to address open-vocabulary semantic segmentation. | open-vocabulary open vocabulary foundation model | | |
| 11 | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | Proposes the Spherical World-Locking (SWL) framework for multimodal audio-visual localization in egocentric videos. | scene understanding egocentric | | |
| 12 | AugGS: Self-augmented Gaussians with Structural Masks for Sparse-view 3D Reconstruction | AugGS: self-augmented Gaussians with structural masks for 3D reconstruction from sparse views. | gaussian splatting splatting | | |
| 13 | FewShotNeRF: Meta-Learning-based Novel View Synthesis for Rapid Scene-Specific Adaptation | FewShotNeRF: meta-learning-based novel view synthesis with rapid scene-specific adaptation. | NeRF neural radiance field | | |
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow | Proposes FlowDreamer to address over-smoothing in text-to-3D generation. | dreamer distillation 3D gaussian splatting | | |
| 15 | Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery | Proposes Surgical-VQLA++, which uses adversarial contrastive learning for calibrated, robust visual question-localized answering in robotic surgery. | contrastive learning multimodal | | |
| 16 | Clustering-friendly Representation Learning for Enhancing Salient Features | Proposes a clustering-friendly contrastive learning method that enhances salient feature representations for image clustering. | representation learning contrastive learning | | |
| 17 | UNIC: Universal Classification Models via Multi-teacher Distillation | Proposes UNIC, which learns universal classification models via multi-teacher distillation to improve cross-task generalization. | distillation | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description | Proposes LLaVA-VSD for classification, description, and open-ended description of visual spatial relationships. | spatial relationship large language model multimodal | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
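| 19 | A Recurrent YOLOv8-based framework for Event-Based Object Detection | Proposes ReYOLOv8, a recurrent YOLOv8-based framework for event-camera object detection that improves detection under high-speed motion and extreme lighting conditions. | spatiotemporal | | |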
🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
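| 20 | One Shot is Enough for Sequential Infrared Small Target Segmentation | Proposes a one-shot, training-free method for sequential infrared small target segmentation that leverages the generalization ability of SAM. | feature matching | | |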