cs.CV (2025-04-09)
📊 7 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 2: RL & Architecture (2)
Pillar 3: Perception & Semantics (1 🔗1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging | Zeus: a zero-shot LLM instruction framework for union segmentation in multimodal medical imaging | large language model, multimodal | | |
| 2 | Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | Face-LLaVA: a multimodal large language model that understands facial expressions and attributes through instruction tuning | large language model, multimodal | | |
| 3 | Are We Done with Object-Centric Learning? | Uses segmentation models to achieve object-centric learning and proposes OCCAM to probe its generalization ability | foundation model | ✅ | |
🔬 Pillar 2: RL & Architecture (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer | Proposes the PEKA framework, which improves the accuracy of gene expression prediction from pathology images through parameter-efficient knowledge transfer | distillation, foundation model | | |
| 5 | Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection | Proposes generalized semantic contrastive learning based on embedded side information for few-shot object detection | representation learning, contrastive learning | | |
🔬 Pillar 3: Perception & Semantics (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | FlashDepth: real-time streaming video depth estimation at 2K resolution | depth estimation | ✅ | |
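🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | Perception in Reflection | Proposes the Reflective Perception (RePer) framework to improve the perception ability of large vision-language models (LVLMs) | manipulation, multimodal | | |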