cs.CV (2024-06-14)

📊 37 papers in total | 🔗 10 with code

🎯 Interest area navigation

Pillar 9: Embodied Foundation Models (14 🔗4) · Pillar 3: Perception & Semantics (10 🔗2) · Pillar 2: RL & Architecture (7 🔗4) · Pillar 6: Video Extraction (3) · Pillar 4: Generative Motion (2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (14 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models	Proposes FlowCE for multi-dimensional evaluation of multimodal large language models on flowchart comprehension	large language model multimodal	✅
2	GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	Evaluates the visual perception of multimodal large language models in piglet activity understanding; GPT-4o stands out	large language model multimodal
3	Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings	Proposes the Industrial Language-Image Dataset (ILID) and explores transfer learning of vision foundation models in industrial settings	large language model foundation model multimodal
4	What is the Visual Cognition Gap between Humans and Multimodal LLMs?	Proposes the MaRs-VQA dataset to evaluate multimodal large language models on visual cognition reasoning	large language model multimodal
5	BrainSegFounder: Towards 3D Foundation Models for Neuroimage Segmentation	BrainSegFounder: a 3D medical imaging foundation model for neuroimage segmentation	foundation model	✅
6	Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding	Proposes the Pun Rebus Art Dataset to improve vision-language models' understanding of art in the Chinese cultural context	multimodal
7	SmartRSD: An Intelligent Multimodal Approach to Real-Time Road Surface Detection for Safe Driving	SmartRSD: an intelligent multimodal approach to real-time road surface detection for safer driving	multimodal
8	Localizing Events in Videos with Multimodal Queries	Proposes the ICQ benchmark and multimodal query adaptation methods for video event localization	multimodal
9	ProtoS-ViT: Visual foundation models for sparse self-explainable classifications	Proposes ProtoS-ViT for sparse self-explainable classification	foundation model	✅
10	SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions	SemanticSpray++: a multimodal dataset for autonomous driving on wet road surfaces	multimodal
11	Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation	Combines vision foundation models with unsupervised domain adaptation to improve semantic segmentation performance and efficiency	foundation model
12	AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming	AnimalFormer: a multimodal vision framework for behavior analysis in precision livestock farming	multimodal
13	MoME: Mixture of Multimodal Experts for Cancer Survival Prediction	Proposes MoME, a mixture of multimodal experts that addresses heterogeneous data fusion in cancer survival prediction	multimodal	✅
14	Detecting and Evaluating Medical Hallucinations in Large Vision Language Models	Proposes the Med-HallMark benchmark for medical hallucination detection and the MediHall Score metric, and builds the MediHallDetector model	large language model multimodal

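🔬 Pillar 3: Perception & Semantics (10 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
15	PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting	PUP 3D-GS: uncertainty-based pruning for 3D Gaussian Splatting, raising compression rates while preserving visual quality	3D gaussian splatting gaussian splatting splatting
16	Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections	Wild-GS: efficient real-time novel view synthesis from unconstrained photo collections	3D gaussian splatting 3DGS gaussian splatting
17	Open-Vocabulary Semantic Segmentation with Image Embedding Balancing	EBSeg: open-vocabulary semantic segmentation via image embedding balancing	open-vocabulary open vocabulary	✅
18	Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion	Proposes an unsupervised monocular depth estimation method based on hierarchical feature-guided diffusion, improving robustness under blur and noise	depth estimation monocular depth
19	The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences	Releases the BabyView dataset: high-resolution egocentric videos of infants' everyday experiences, supporting research on human-like AI	depth estimation egocentric
20	L4GM: Large 4D Gaussian Reconstruction Model	L4GM: the first large 4D Gaussian reconstruction model, generating animated objects from single-view video	3D gaussian splatting gaussian splatting splatting
21	D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video	Proposes Dynamic Neural Point Clouds (D-NPC) for novel view synthesis of non-rigid scenes from monocular video	depth estimation monocular depth spatiotemporal	✅
22	NeST: Neural Stress Tensor Tomography by leveraging 3D Photoelasticity	NeST: neural stress tensor tomography leveraging 3D photoelasticity	implicit representation
23	1-Lipschitz Neural Distance Fields	Proposes a distance field method based on 1-Lipschitz neural networks, improving the robustness of geometric queries and suited to low-quality geometric data	implicit representation
24	RaNeuS: Ray-adaptive Neural Surface Reconstruction	RaNeuS: a ray-adaptive neural surface reconstruction method that improves NeRF's reconstruction of fine geometric detail	NeRF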

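🔬 Pillar 2: RL & Architecture (7 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
25	GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion	GradeADreamer: enhances text-to-3D generation with Gaussian Splatting and multi-view diffusion	dreamer gaussian splatting splatting	✅
26	GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors	Proposes GaussianSR, which uses 2D diffusion priors for super-resolution reconstruction from low-resolution images to high-resolution 3D Gaussian models	distillation 3D gaussian splatting 3DGS	✅
27	Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection	Proposes a shelf-supervised cross-modal pre-training method built on pre-trained image models, improving 3D object detection with limited data	contrastive learning foundation model multimodal	✅
28	A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion	Proposes a depth completion network based on a two-stage masked autoencoder, improving depth completion in complex indoor scenes	masked autoencoder scene understanding	✅
29	InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning	InstructRL4Pix: a reinforcement-learning-based method for training diffusion models for image editing	reinforcement learning PPO
30	Fine-Grained Urban Flow Inference with Multi-scale Representation Learning	UrbanMSR: a fine-grained urban flow inference model based on multi-scale representation learning	representation learning contrastive learning
31	Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses	Proposes a neural pose representation learning method for generating and transferring non-rigid object poses	representation learning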

🔬 Pillar 6: Video Extraction (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
32	EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models	Proposes the EFM3D benchmark for measuring progress towards 3D egocentric foundation models	egocentric foundation model
33	PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos	PARSE-Ego4D: personalized action recommendations for egocentric videos	egocentric Ego4D large language model
34	MeshPose: Unifying DensePose and 3D Body Mesh reconstruction	MeshPose: unifies DensePose and 3D body mesh reconstruction for high-accuracy real-time human pose estimation	HMR

🔬 Pillar 4: Generative Motion (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
35	Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild	Nymeria: a large-scale multimodal egocentric dataset of daily motion, supporting human motion understanding	motion synthesis egocentric human motion
36	MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers	MeshAnything: generates artist-grade meshes with autoregressive Transformers	VQ-VAE

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
37	Automated GIS-Based Framework for Detecting Crosswalk Changes from Bi-Temporal High-Resolution Aerial Images	Proposes an automated GIS-based framework that detects crosswalk changes from bi-temporal high-resolution aerial images	spatiotemporal
