cs.CV (2024-07-06)
📊 9 papers in total | 🔗 1 with code
🎯 Topic Navigation
Pillar 9: Embodied Foundation Models (5 🔗1)
Pillar 3: Spatial Perception & Semantics (2)
Pillar 2: RL Algorithms & Architecture (1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
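| 1 | Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations | Proposes MEA, a model that fuses asynchronous multimodal video sequences by learning modality-exclusive and modality-agnostic representations. | multimodal | | |
| 2 | OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding | OmChat: a recipe for training multimodal language models with strong long-context and video understanding. | multimodal | | |
| 3 | Completed Feature Disentanglement Learning for Multimodal MRIs Analysis | Proposes completed feature disentanglement learning to improve multimodal MRI analysis. | multimodal | | |
| 4 | SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding | SHINE: proposes saliency-aware hierarchical negative ranking to improve generalization in compositional temporal grounding. | large language model | ✅ | |
| 5 | The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge | Proposes a joint prediction method that combines visual and textual prompts for zero-shot referring expression comprehension. | multimodal | | |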
🔬 Pillar 3: Spatial Perception & Semantics (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction | SurgicalGaussian: deformable 3D Gaussians for high-fidelity surgical scene reconstruction. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 7 | Test-time Contrastive Concepts for Open-world Semantic Segmentation with Vision-Language Models | Proposes test-time contrastive concepts to address single-concept segmentation with vision-language models in open-world semantic segmentation. | open-vocabulary, open vocabulary | | |
🔬 Pillar 2: RL Algorithms & Architecture (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
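| 8 | The Solution for Language-Enhanced Image New Category Discovery | Proposes pseudo-visual prompts that strengthen the visual representation of text labels for language-enhanced image new category discovery. | contrastive learning, large language model | | |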
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | CLIPVQA:Video Quality Assessment via CLIP | Proposes CLIPVQA, a CLIP-based Transformer model for video quality assessment. | spatiotemporal | | |