cs.CV (2024-07-27)

📊 8 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 2: RL & Architecture (1 🔗1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (4 papers)

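| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models | LLaVA-Read: enhances the reading ability of multimodal language models with dual visual encoders and a visual text encoder. | large language model, multimodal | |
| 2 | Data Processing Techniques for Modern Multimodal Models | Surveys data processing techniques for modern multimodal models, focusing on diffusion models and multimodal large language models. | large language model, multimodal | |
| 3 | Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification on the DAIC-WOZ | Proposes an LLM-integrated tri-modal BiLSTM architecture for automated depression classification on DAIC-WOZ. | large language model | |
| 4 | Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble | MEFormer: achieves robust multimodal 3D object detection via modality-agnostic decoding and a proximity-based modality ensemble. | multimodal | |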

🔬 Pillar 3: Perception & Semantics (2 papers)

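| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 5 | Revisit Self-supervised Depth Estimation with Local Structure-from-Motion | Proposes a self-supervised depth estimation method based on local SfM that improves both depth and correspondence models. | depth estimation, NeRF | |
| 6 | RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry | RePLAy: exploits epipolar geometry to remove artifacts from projected LiDAR depth maps, improving monocular depth estimation and 3D object detection. | monocular depth | |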

🔬 Pillar 2: RL & Architecture (1 paper)

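| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 7 | Mamba? Catch The Hype Or Rethink What Really Helps for Image Registration | Proposes a simplified registration method to address insufficient image registration accuracy. | Mamba | |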

🔬 Pillar 7: Motion Retargeting (1 paper)

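| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 8 | Symmetrical Joint Learning Support-query Prototypes for Few-shot Segmentation | Proposes Sym-Net, which addresses intra-class variation in few-shot segmentation through symmetrical joint learning of support and query prototypes. | spatial relationship | |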
