cs.CV (2024-07-31)
📊 19 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7 🔗2)
Pillar 3: Perception & Semantics (6 🔗2)
Pillar 2: RL & Architecture (3 🔗2)
Pillar 7: Motion Retargeting (1)
Pillar 6: Video Extraction (1)
Pillar 1: Robot Control (1)
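🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models | Proposes ControlMLLM, which strengthens the referring ability of multimodal large language models through training-free visual prompt learning. | large language model, multimodal | | |
| 2 | Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM | Proposes Chat2Layout, which uses a multimodal LLM to generate 3D furniture layouts interactively. | large language model, multimodal | | |
| 3 | Learning Video Context as Interleaved Multimodal Sequences | Proposes MovieSeq, which learns video context as interleaved multimodal sequences to improve narrative video understanding. | multimodal | ✅ | |
| 4 | Design and Development of Laughter Recognition System Based on Multimodal Fusion and Deep Learning | Proposes a laughter recognition system based on multimodal fusion and deep learning, aimed at improving affective computing and human-computer interaction. | multimodal | | |
| 5 | Pathology Foundation Models | A survey of pathology foundation models: challenges and future integration with medical AI. | foundation model | | |
| 6 | Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Proposes PAI, a training-free method that alleviates hallucination in LVLMs by increasing attention to the image. | large language model | ✅ | |
| 7 | Segment Anything for Videos: A Systematic Survey | A systematic survey of SAM for video, filling a gap left by existing image-focused surveys. | foundation model | | |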
🔬 Pillar 3: Perception & Semantics (6 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Expressive Whole-Body 3D Gaussian Avatar | Proposes ExAvatar, a whole-body 3D Gaussian avatar with facial expressions and hand motions, learned from monocular video. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 9 | MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection | MarvelOVD: combines object recognition with vision-language models for robust open-vocabulary object detection. | open-vocabulary, open vocabulary | ✅ | |
| 10 | A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Proposes the CEFA module, which bridges the domain gap between generated and real data to improve rare HOI detection. | scene understanding, human-object interaction, HOI | ✅ | |
| 11 | SSRFlow: Semantic-aware Fusion with Spatial Temporal Re-embedding for Real-world Scene Flow | Proposes SSRFlow, which addresses real-world scene flow estimation through semantic-aware fusion and spatial-temporal re-embedding. | scene flow, spatiotemporal | | |
| 12 | EMatch: A Unified Framework for Event-based Optical Flow and Stereo Matching | EMatch: a unified framework for event-based optical flow and stereo matching that enables cross-task knowledge transfer. | optical flow | | |
| 13 | Leveraging Adaptive Implicit Representation Mapping for Ultra High-Resolution Image Segmentation | Proposes adaptive implicit representation mapping for ultra-high-resolution image segmentation. | implicit representation | | |
🔬 Pillar 2: RL & Architecture (3 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Localized Gaussian Splatting Editing with Contextual Awareness | Proposes a context-aware localized Gaussian splatting editing method that achieves lighting-consistent 3D scene editing. | distillation, 3D gaussian splatting, 3DGS | | |
| 15 | RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining | Proposes RainMamba, which uses an improved state space model to enhance locality learning for video deraining. | Mamba, SSM, state space model | ✅ | |
| 16 | ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images | Proposes ESIQAnet for assessing the perceptual quality of Vision Pro-based egocentric spatial images. | Mamba, egocentric | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification | Proposes an explainable transfer-learning method combining ViT and SVM for efficient identification of potato drought stress. | spatial relationship | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Analyzing the impact of semantic LoD3 building models on image-based vehicle localization | Uses semantic LoD3 building models to improve image-based vehicle localization accuracy, addressing weak GNSS signals in urban canyons. | feature matching | | |
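🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | PEAR: Phrase-Based Hand-Object Interaction Anticipation | Proposes the PEAR model, which jointly anticipates hand-object interaction intention and manipulation to advance embodied intelligence. | manipulation | | |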