cs.CV(2024-06-14)

📊 37 papers in total | 🔗 10 with code

🎯 Interest area navigation

Pillar 9: Embodied Foundation Models (14 🔗4) · Pillar 3: Perception & Semantics (10 🔗2) · Pillar 2: RL & Architecture (7 🔗4) · Pillar 6: Video Extraction (3) · Pillar 4: Generative Motion (2) · Pillar 8: Physics-based Animation (1)

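🔬 Pillar 9: Embodied Foundation Models (14 papers)

# | Title | One-line summary | Tags | 🔗
1 | First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models | Proposes FlowCE, a multi-dimensional evaluation of flowchart comprehension for multimodal large language models | large language model, multimodal
2 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | Evaluates the visual perception of multimodal large language models in piglet activity understanding; GPT-4o stands out | large language model, multimodal
3 | Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings | Proposes the Industrial Language-Image Dataset (ILID) and explores transfer learning of vision foundation models in industrial settings | large language model, foundation model, multimodal
4 | What is the Visual Cognition Gap between Humans and Multimodal LLMs? | Proposes the MaRs-VQA dataset to evaluate multimodal large language models on visual cognition reasoning | large language model, multimodal
5 | BrainSegFounder: Towards 3D Foundation Models for Neuroimage Segmentation | BrainSegFounder: a 3D medical-imaging foundation model for neuroimage segmentation | foundation model
6 | Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding | Proposes the Pun Rebus Art Dataset to improve vision-language models' understanding of art in the Chinese cultural context | multimodal
7 | SmartRSD: An Intelligent Multimodal Approach to Real-Time Road Surface Detection for Safe Driving | SmartRSD: an intelligent multimodal approach to real-time road surface detection for safer driving | multimodal
8 | Localizing Events in Videos with Multimodal Queries | Proposes the ICQ benchmark and multimodal query adaptation methods for video event localization | multimodal
9 | ProtoS-ViT: Visual foundation models for sparse self-explainable classifications | Proposes ProtoS-ViT for sparse self-explainable classification | foundation model
10 | SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions | SemanticSpray++: a multimodal dataset for autonomous driving in wet surface conditions | multimodal
11 | Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation | Combines vision foundation models with unsupervised domain adaptation to improve semantic segmentation performance and efficiency | foundation model
12 | AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming | AnimalFormer: a multimodal vision framework for behavior analysis in precision livestock farming | multimodal
13 | MoME: Mixture of Multimodal Experts for Cancer Survival Prediction | Proposes MoME, which uses a mixture of multimodal experts to address heterogeneous data fusion in cancer survival prediction | multimodal
14 | Detecting and Evaluating Medical Hallucinations in Large Vision Language Models | Proposes the Med-HallMark benchmark for medical hallucination detection and the MediHall Score metric, and builds the MediHallDetector model | large language model, multimodal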

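🔬 Pillar 3: Perception & Semantics (10 papers)

# | Title | One-line summary | Tags | 🔗
15 | PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting | PUP 3D-GS: uncertainty-based pruning for 3D Gaussian Splatting that raises compression rates while preserving visual quality | 3D gaussian splatting, gaussian splatting, splatting
16 | Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections | Wild-GS: efficient real-time novel view synthesis from unconstrained photo collections | 3D gaussian splatting, 3DGS, gaussian splatting
17 | Open-Vocabulary Semantic Segmentation with Image Embedding Balancing | EBSeg: open-vocabulary semantic segmentation via image embedding balancing | open-vocabulary, open vocabulary
18 | Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion | Proposes an unsupervised monocular depth estimation method based on hierarchical feature-guided diffusion, improving robustness under blur and noise | depth estimation, monocular depth
19 | The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences | Releases the BabyView dataset: high-resolution egocentric videos of infants' everyday experiences, supporting human-like AI research | depth estimation, egocentric
20 | L4GM: Large 4D Gaussian Reconstruction Model | L4GM: the first large 4D Gaussian reconstruction model, producing animated objects from single-view video | 3D gaussian splatting, gaussian splatting, splatting
21 | D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video | Proposes dynamic neural point clouds (D-NPC) for novel view synthesis of non-rigid scenes from monocular video | depth estimation, monocular depth, spatiotemporal
22 | NeST: Neural Stress Tensor Tomography by leveraging 3D Photoelasticity | NeST: neural stress tensor tomography using 3D photoelasticity | implicit representation
23 | 1-Lipschitz Neural Distance Fields | Proposes a distance field method based on 1-Lipschitz neural networks that makes geometric queries more robust and suits low-quality geometric data | implicit representation
24 | RaNeuS: Ray-adaptive Neural Surface Reconstruction | RaNeuS: a ray-adaptive neural surface reconstruction method that improves NeRF's reconstruction of fine geometric detail | NeRF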

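🔬 Pillar 2: RL & Architecture (7 papers)

# | Title | One-line summary | Tags | 🔗
25 | GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion | GradeADreamer: enhances text-to-3D generation using Gaussian Splatting and multi-view diffusion | dreamer, gaussian splatting, splatting
26 | GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors | Proposes GaussianSR, which uses 2D diffusion priors to reconstruct high-resolution 3D Gaussian models from low-resolution images | distillation, 3D gaussian splatting, 3DGS
27 | Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection | Proposes shelf-supervised cross-modal pre-training based on pretrained image models, improving 3D object detection with limited data | contrastive learning, foundation model, multimodal
28 | A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Proposes a two-stage masked-autoencoder-based depth completion network that improves depth completion in complex indoor scenes | masked autoencoder, scene understanding
29 | InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning | InstructRL4Pix: a reinforcement-learning-based method for training diffusion models for image editing | reinforcement learning, PPO
30 | Fine-Grained Urban Flow Inference with Multi-scale Representation Learning | UrbanMSR: a fine-grained urban flow inference model based on multi-scale representation learning | representation learning, contrastive learning
31 | Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses | Proposes a neural pose representation learning method for generating and transferring non-rigid object poses | representation learning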

🔬 Pillar 6: Video Extraction (3 papers)

# | Title | One-line summary | Tags | 🔗
32 | EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Proposes the EFM3D benchmark for measuring progress in 3D egocentric foundation models | egocentric, foundation model
33 | PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos | PARSE-Ego4D: personal action recommendations for egocentric videos | egocentric, Ego4D, large language model
34 | MeshPose: Unifying DensePose and 3D Body Mesh reconstruction | MeshPose: unifies DensePose and 3D body mesh reconstruction for high-accuracy real-time human pose estimation | HMR

🔬 Pillar 4: Generative Motion (2 papers)

# | Title | One-line summary | Tags | 🔗
35 | Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild | Nymeria: a large-scale multimodal egocentric dataset of daily motion, supporting human motion understanding | motion synthesis, egocentric, human motion
36 | MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers | MeshAnything: generates artist-quality meshes with autoregressive Transformers | VQ-VAE

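🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags | 🔗
37 | Automated GIS-Based Framework for Detecting Crosswalk Changes from Bi-Temporal High-Resolution Aerial Images | Proposes an automated GIS-based framework that detects crosswalk changes from bi-temporal high-resolution aerial images | spatiotemporal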
