cs.CV (2024-05-08)

📊 7 papers in total

🎯 Interest area navigation

Pillar 2: RL & Architecture (2) · Pillar 9: Embodied Foundation Models (2) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1) · Pillar 3: Perception & Semantics (1)

🔬 Pillar 2: RL & Architecture (2 papers)

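#	Title	One-sentence summary	Tags	🔗	⭐
1	OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies	Proposes OpenESS, which transfers image-text knowledge to enable open-vocabulary event-based semantic scene understanding.	distillation scene understanding open-vocabulary
2	GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields	Proposes a gradient-domain Gaussian splatting method for sparse representation of radiance fields, improving rendering efficiency.	distillation 3D gaussian splatting gaussian splatting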

🔬 Pillar 9: Embodied Foundation Models (2 papers)

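#	Title	One-sentence summary	Tags	🔗	⭐
3	All in One Framework for Multimodal Re-identification in the Wild	Proposes the AIO framework, which uses pretrained large models for unified multimodal ReID and addresses modality heterogeneity.	foundation model multimodal
4	VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context	Introduces the VisionGraph benchmark and designs a DPR chain to improve the reasoning ability of LMMs on visual graph theory problems.	multimodal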

🔬 Pillar 4: Generative Motion (1 paper)

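#	Title	One-sentence summary	Tags	🔗	⭐
5	Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches	Proposes a 3D human motion-language model based on motion patches and Vision Transformers, improving cross-modal retrieval performance.	text-to-motion motion retrieval human motion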

🔬 Pillar 1: Robot Control (1 paper)

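#	Title	One-sentence summary	Tags	🔗	⭐
6	Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving	Proposes the LaserMix++ framework, which leverages multimodal data to efficiently improve 3D scene understanding for autonomous driving.	manipulation distillation scene understanding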

🔬 Pillar 3: Perception & Semantics (1 paper)

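#	Title	One-sentence summary	Tags	🔗	⭐
7	SemiCD-VL: Visual-Language Model Guidance Makes Better Semi-supervised Change Detector	Proposes SemiCD-VL, which uses vision-language model guidance for semi-supervised change detection, improving performance when few labeled samples are available.	open-vocabulary open vocabulary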
