cs.CV (2024-08-09)

📊 20 papers in total

🎯 Navigation by interest area

Pillar 9: Embodied Foundation Models (8) · Pillar 3: Perception & Semantics (5) · Pillar 2: RL & Architecture (4) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1) · Pillar 6: Video Extraction (1)

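🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	VITA: Towards Open-Source Interactive Omni Multimodal LLM	VITA: the first open-source interactive omni-modal multimodal LLM, supporting simultaneous processing of and interaction with video, image, text, and audio.	large language model multimodal
2	Instruction Tuning-free Visual Token Complement for Multimodal LLMs	Proposes an instruction-tuning-free visual token complement framework that improves how well multimodal LLMs use visual information.	large language model multimodal
3	mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models	mPLUG-Owl3: targets long image-sequence understanding in multimodal large language models.	large language model multimodal
4	Weak-Annotation of HAR Datasets using Vision Foundation Models	Proposes a weakly supervised annotation method for HAR datasets based on vision foundation models, reducing manual labeling cost.	foundation model
5	TrajFM: A Vehicle Trajectory Foundation Model for Region and Task Transferability	Proposes TrajFM, a vehicle trajectory foundation model that transfers across regions and tasks.	foundation model
6	ChatGPT Meets Iris Biometrics	Uses ChatGPT for iris recognition, exploring the potential of large language models in biometrics.	large language model multimodal
7	Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation	Loc4Plan: for outdoor vision-and-language navigation, locate before planning.	VLN
8	On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey	A systematic survey of element-wise representation and reasoning in zero-shot image recognition.	foundation model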

🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
9	In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation	Proposes lazy visual grounding for open-vocabulary semantic segmentation, with no additional training required.	open-vocabulary open vocabulary visual grounding
10	ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation	Proposes ProxyCLIP to address open-vocabulary semantic segmentation.	open-vocabulary open vocabulary foundation model
11	Spherical World-Locking for Audio-Visual Localization in Egocentric Videos	Proposes the Spherical World-Locking (SWL) framework for multimodal audio-visual localization in egocentric videos.	scene understanding egocentric
12	AugGS: Self-augmented Gaussians with Structural Masks for Sparse-view 3D Reconstruction	AugGS: self-augmented Gaussians with structural masks for sparse-view 3D reconstruction.	gaussian splatting splatting
13	FewShotNeRF: Meta-Learning-based Novel View Synthesis for Rapid Scene-Specific Adaptation	FewShotNeRF: meta-learning-based novel view synthesis with rapid scene-specific adaptation.	NeRF neural radiance field

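🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow	Proposes FlowDreamer to address over-smoothing in text-to-3D generation.	dreamer distillation 3D gaussian splatting
15	Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery	Proposes Surgical-VQLA++, which uses adversarial contrastive learning for calibrated, robust visual question-localized answering in robotic surgery.	contrastive learning multimodal
16	Clustering-friendly Representation Learning for Enhancing Salient Features	Proposes a clustering-friendly contrastive learning method that enhances salient feature representations for image clustering.	representation learning contrastive learning
17	UNIC: Universal Classification Models via Multi-teacher Distillation	Proposes UNIC, which learns universal classification models via multi-teacher distillation to improve cross-task generalization.	distillation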

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
18	LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description	Proposes LLaVA-VSD for classification, description, and open-ended description of visual spatial relationships.	spatial relationship large language model multimodal

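🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
19	A Recurrent YOLOv8-based framework for Event-Based Object Detection	Proposes ReYOLOv8, a recurrent YOLOv8-based framework for event-camera object detection, improving detection under high-speed motion and extreme lighting.	spatiotemporal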

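🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
20	One Shot is Enough for Sequential Infrared Small Target Segmentation	Proposes a one-shot, training-free method for sequential infrared small target segmentation that effectively exploits the generalization ability of SAM.	feature matching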
