cs.CV (2024-06-18)

📊 12 papers in total | 🔗 3 with code

🎯 Areas of Interest

Pillar 9: Embodied Foundation Models (4 🔗1)
Pillar 3: Perception & Semantics (3)
Pillar 2: RL & Architecture (2 🔗1)
Pillar 7: Motion Retargeting (1 🔗1)
Pillar 8: Physics-based Animation (1)
Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (4 papers)

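# | Title | One-line summary | Tags | 🔗
1 | DrVideo: Document Retrieval Based Long Video Understanding | DrVideo: a document-retrieval-based framework for long video understanding that makes effective use of large language models. | large language model, chain-of-thought
2 | Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention | Proposes AGLA, which mitigates object hallucination in large vision-language models by assembling global and local attention. | multimodal, visual grounding
3 | Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | Proposes the MIRB benchmark for evaluating the perception, knowledge, reasoning, and multi-hop reasoning abilities of vision-language models on multi-image understanding. | large language model
4 | Disturbing Image Detection Using LMM-Elicited Emotion Embeddings | Detects disturbing images using emotion embeddings elicited from LMMs. | multimodal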

🔬 Pillar 3: Perception & Semantics (3 papers)

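# | Title | One-line summary | Tags | 🔗
5 | HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors | HumanSplat: generalizable single-image human Gaussian splatting guided by structure priors. | 3D gaussian splatting, gaussian splatting, splatting
6 | Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models | Proposes a fast method for generating 3D Gaussian scenes with latent diffusion models. | NeRF
7 | GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | GeoBench: a benchmark for monocular geometry estimation models that highlights the importance of data quality. | depth estimation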

🔬 Pillar 2: RL & Architecture (2 papers)

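# | Title | One-line summary | Tags | 🔗
8 | Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation | Proposes a 360-degree monocular depth estimation method based on perspective distillation and unlabeled data augmentation. | distillation, depth estimation, monocular depth
9 | LFMamba: Light Field Image Super-Resolution with State Space Model | Proposes LFMamba to capture long-range dependencies in light field image super-resolution. | Mamba, SSM, state space model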

🔬 Pillar 7: Motion Retargeting (1 paper)

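# | Title | One-line summary | Tags | 🔗
10 | RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding | Proposes RS-GPT4V, a unified multimodal instruction-following dataset for remote sensing image understanding. | spatial relationship, large language model, foundation model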

🔬 Pillar 8: Physics-based Animation (1 paper)

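# | Title | One-line summary | Tags | 🔗
11 | VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing | VIA: a unified spatiotemporal video adaptation framework for global and local video editing. | spatiotemporal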

🔬 Pillar 1: Robot Control (1 paper)

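# | Title | One-line summary | Tags | 🔗
12 | Cycle-Correspondence Loss: Learning Dense View-Invariant Visual Features from Unlabeled and Unordered RGB Images | Proposes a cycle-correspondence loss for learning dense, view-invariant visual features from unlabeled, unordered RGB images. | manipulation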
