cs.CV(2024-08-09)

📊 20 papers in total

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8) · Pillar 3: Perception & Semantics (5) · Pillar 2: RL & Architecture (4) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

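| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | VITA: the first open-source interactive omni-modal multimodal LLM, supporting simultaneous processing of and interaction with video, image, text, and audio. | large language model, multimodal |
| 2 | Instruction Tuning-free Visual Token Complement for Multimodal LLMs | Proposes an instruction-tuning-free visual token complement framework that improves how well multimodal LLMs use visual information. | large language model, multimodal |
| 3 | mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | mPLUG-Owl3: toward long image-sequence understanding in multimodal large language models. | large language model, multimodal |
| 4 | Weak-Annotation of HAR Datasets using Vision Foundation Models | Proposes a weakly supervised annotation method for HAR datasets based on vision foundation models, reducing manual labeling cost. | foundation model |
| 5 | TrajFM: A Vehicle Trajectory Foundation Model for Region and Task Transferability | Proposes TrajFM, a vehicle trajectory foundation model that transfers across regions and tasks. | foundation model |
| 6 | ChatGPT Meets Iris Biometrics | Iris recognition with ChatGPT: explores the potential of large language models in biometrics. | large language model, multimodal |
| 7 | Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation | Loc4Plan: locating before planning for outdoor vision-and-language navigation. | VLN |
| 8 | On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey | A systematic survey of element-wise representation and reasoning in zero-shot image recognition. | foundation model |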

🔬 Pillar 3: Perception & Semantics (5 papers)

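| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 9 | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Proposes lazy visual grounding for open-vocabulary semantic segmentation, requiring no additional training. | open-vocabulary, open vocabulary, visual grounding |
| 10 | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | Proposes ProxyCLIP to address open-vocabulary semantic segmentation. | open-vocabulary, open vocabulary, foundation model |
| 11 | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | Proposes the Spherical World-Locking (SWL) framework for multimodal audio-visual localization in egocentric videos. | scene understanding, egocentric |
| 12 | AugGS: Self-augmented Gaussians with Structural Masks for Sparse-view 3D Reconstruction | AugGS: self-augmented Gaussians with structural masks for sparse-view 3D reconstruction. | gaussian splatting, splatting |
| 13 | FewShotNeRF: Meta-Learning-based Novel View Synthesis for Rapid Scene-Specific Adaptation | FewShotNeRF: meta-learning-based novel view synthesis with rapid scene-specific adaptation. | NeRF, neural radiance field |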

🔬 Pillar 2: RL & Architecture (4 papers)

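| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 14 | FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow | Proposes FlowDreamer to address over-smoothing in text-to-3D generation. | dreamer, distillation, 3D gaussian splatting |
| 15 | Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery | Proposes Surgical-VQLA++, which uses adversarial contrastive learning to achieve calibrated, robust visual question-localized answering in robotic surgery. | contrastive learning, multimodal |
| 16 | Clustering-friendly Representation Learning for Enhancing Salient Features | Proposes a clustering-friendly contrastive learning method that enhances salient feature representations for image clustering. | representation learning, contrastive learning |
| 17 | UNIC: Universal Classification Models via Multi-teacher Distillation | Proposes UNIC, which learns universal classification models through multi-teacher distillation, improving cross-task generalization. | distillation |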

🔬 Pillar 7: Motion Retargeting (1 paper)

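| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 18 | LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description | Proposes LLaVA-VSD for classification, description, and open-ended description of visual spatial relationships. | spatial relationship, large language model, multimodal |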

🔬 Pillar 8: Physics-based Animation (1 paper)

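| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 19 | A Recurrent YOLOv8-based framework for Event-Based Object Detection | Proposes ReYOLOv8, a recurrent YOLOv8-based framework for event-camera object detection, improving detection under high-speed motion and extreme lighting conditions. | spatiotemporal |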

🔬 Pillar 6: Video Extraction (1 paper)

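| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 20 | One Shot is Enough for Sequential Infrared Small Target Segmentation | Proposes a one-shot, training-free method for sequential infrared small target segmentation that effectively leverages SAM's generalization ability. | feature matching |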
