cs.CV (2025-01-24)
📊 17 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7 🔗1)
Pillar 9: Embodied Foundation Models (5)
Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 🔗2)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7 papers)
🔬 Pillar 9: Embodied Foundation Models (5 papers)
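| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Leveraging ChatGPT's Multimodal Vision Capabilities to Rank Satellite Images by Poverty Level: Advancing Tools for Social Science Research | Uses ChatGPT's multimodal vision capabilities to assess poverty levels from satellite images, advancing tools for social science research. | large language model multimodal | | |
| 9 | Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection | Proposes the MUSE model, which tackles multimodal fake news detection through triple-path enhanced neural architecture search. | multimodal | | |
| 10 | Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | Proposes a remote sensing image-text dataset generation method that incorporates map information to mitigate hallucinations. | large language model multimodal | | |
| 11 | Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models | Proposes the GSWA module, which dynamically allocates semantic weights to sub-images in high-resolution LVLMs to improve visual understanding. | multimodal | | |
| 12 | Dynamic Token Reduction during Generation for Vision Language Models | Proposes the Dynamic Rate (DyRate) method to address visual token redundancy during generation in vision-language models. | multimodal | | |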
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | HERMES: a unified self-driving world model for simultaneous 3D scene understanding and generation. | world model scene understanding large language model | ✅ | |
| 14 | Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual Augmentation | Proposes a multimodal entity linking method built on Jaccard distance-based conditional contrastive learning and contextual visual augmentation. | contrastive learning multimodal | | |
| 15 | Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation | Proposes Surface Vision Mamba for efficient spherical manifold representation and neurodevelopmental phenotype regression. | Mamba state space model | ✅ | |
| 16 | Dreamweaver: Learning Compositional World Models from Pixels | Dreamweaver: proposes a method for learning compositional world models from pixels, applied to video decomposition and future prediction. | world model | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | Proposes ReferDINO to address the visual grounding problem in video object segmentation. | spatiotemporal visual grounding | | |