cs.CV (2024-07-31)

📊 19 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 2: RL & Architecture (3 🔗2) · Pillar 7: Motion Retargeting (1) · Pillar 6: Video Extraction (1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

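# | Title | One-sentence summary | Tags | 🔗
1 | ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models | Proposes ControlMLLM, which strengthens the referring ability of multimodal large language models through training-free visual prompt learning. | large language model, multimodal
2 | Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM | Proposes Chat2Layout, which uses a multimodal LLM for interactive 3D furniture layout generation. | large language model, multimodal
3 | Learning Video Context as Interleaved Multimodal Sequences | Proposes MovieSeq, which learns video context as interleaved multimodal sequences to improve narrative video understanding. | multimodal
4 | Design and Development of Laughter Recognition System Based on Multimodal Fusion and Deep Learning | Proposes a laughter recognition system based on multimodal fusion and deep learning, aimed at improving affective computing and human-computer interaction. | multimodal
5 | Pathology Foundation Models | A survey of pathology foundation models: challenges and future integration with medical AI. | foundation model
6 | Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Proposes PAI, a training-free method that alleviates hallucination in LVLMs by increasing attention to the image. | large language model
7 | Segment Anything for Videos: A Systematic Survey | A systematic survey of SAM for video, filling a gap left by existing image-domain surveys. | foundation model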

🔬 Pillar 3: Perception & Semantics (6 papers)

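# | Title | One-sentence summary | Tags | 🔗
8 | Expressive Whole-Body 3D Gaussian Avatar | Proposes ExAvatar, a whole-body 3D Gaussian avatar with facial expressions and hand motions, learned from monocular video. | 3D gaussian splatting, 3DGS, gaussian splatting
9 | MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection | MarvelOVD: combines object recognition with vision-language models for robust open-vocabulary object detection. | open-vocabulary, open vocabulary
10 | A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Proposes the CEFA module, which bridges the domain gap between generated and real data to improve rare HOI detection. | scene understanding, human-object interaction, HOI
11 | SSRFlow: Semantic-aware Fusion with Spatial Temporal Re-embedding for Real-world Scene Flow | Proposes SSRFlow, which addresses real-world scene flow estimation through semantic-aware fusion and spatial-temporal re-embedding. | scene flow, spatiotemporal
12 | EMatch: A Unified Framework for Event-based Optical Flow and Stereo Matching | EMatch: a unified framework for event-camera optical flow and stereo matching that enables cross-task knowledge transfer. | optical flow
13 | Leveraging Adaptive Implicit Representation Mapping for Ultra High-Resolution Image Segmentation | Proposes adaptive implicit representation mapping for ultra-high-resolution image segmentation. | implicit representation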

🔬 Pillar 2: RL & Architecture (3 papers)

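# | Title | One-sentence summary | Tags | 🔗
14 | Localized Gaussian Splatting Editing with Contextual Awareness | Proposes a context-aware localized Gaussian splatting editing method for lighting-consistent 3D scene editing. | distillation, 3D gaussian splatting, 3DGS
15 | RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining | Proposes RainMamba, which uses an improved state space model to enhance locality learning for video deraining. | Mamba, SSM, state space model
16 | ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images | Proposes ESIQAnet for assessing the perceptual quality of Vision Pro-based egocentric spatial images. | Mamba, egocentric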

🔬 Pillar 7: Motion Retargeting (1 paper)

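# | Title | One-sentence summary | Tags | 🔗
17 | An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification | Proposes an explainable transfer-learning approach combining ViT with an SVM for efficient identification of drought stress in potatoes. | spatial relationship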

🔬 Pillar 6: Video Extraction (1 paper)

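# | Title | One-sentence summary | Tags | 🔗
18 | Analyzing the impact of semantic LoD3 building models on image-based vehicle localization | Uses semantic LoD3 building models to improve image-based vehicle localization accuracy, addressing weak GNSS signals in urban canyons. | feature matching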

🔬 Pillar 1: Robot Control (1 paper)

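# | Title | One-sentence summary | Tags | 🔗
19 | PEAR: Phrase-Based Hand-Object Interaction Anticipation | Proposes the PEAR model, which jointly predicts hand-object interaction intention and manipulation, advancing embodied intelligence. | manipulation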
