cs.CV (2025-04-09)

📊 7 papers in total | 🔗 2 with code

🎯 Navigation by interest area

Pillar 9: Embodied Foundation Models (3 🔗1) | Pillar 2: RL Algorithms & Architecture (2) | Pillar 3: Spatial Perception & Semantics (1 🔗1) | Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

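#	Title	One-line summary	Tags	🔗	⭐
1	Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging	Zeus: a zero-shot LLM instruction framework for union segmentation in multimodal medical imaging	large language model, multimodal
2	Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning	Face-LLaVA: a multimodal large language model that understands facial expressions and attributes through instruction tuning	large language model, multimodal
3	Are We Done with Object-Centric Learning?	Uses segmentation models for object-centric learning and proposes OCCAM to probe its generalization	foundation model	✅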

🔬 Pillar 2: RL Algorithms & Architecture (2 papers)

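#	Title	One-line summary	Tags	🔗	⭐
4	Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer	Proposes the PEKA framework, which uses parameter-efficient knowledge transfer to improve the accuracy of gene expression prediction from pathology images	distillation, foundation model
5	Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection	Proposes generalized semantic contrastive learning based on embedded side information for few-shot object detection	representation learning, contrastive learning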

🔬 Pillar 3: Spatial Perception & Semantics (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
6	FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution	FlashDepth: real-time streaming video depth estimation at 2K resolution	depth estimation	✅

🔬 Pillar 1: Robot Control (1 paper)

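#	Title	One-line summary	Tags	🔗	⭐
7	Perception in Reflection	Proposes the Reflective Perception (RePer) framework to improve the perception capabilities of large vision-language models (LVLMs)	manipulation, multimodal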
