cs.CV (2024-05-03)

📊 14 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (6) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 2: RL Algorithms & Architecture (4 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

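| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2 | Proposes HoloGS to address instant 3D Gaussian Splatting reconstruction | 3D gaussian splatting, gaussian splatting, splatting | |
| 2 | Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models | UPPM: unified promptable panoptic mapping using dynamic labeling and foundation models | scene understanding, open-vocabulary, open vocabulary | |
| 3 | M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation | M${^2}$Depth: a self-supervised two-frame multi-camera metric depth estimation method for autonomous driving | depth estimation, metric depth | |
| 4 | Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids | Rip-NeRF: anti-aliased neural radiance field rendering using Ripmap-encoded Platonic solids | NeRF, neural radiance field | |
| 5 | WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights | Proposes WateRF, a robust watermarking method for NeRF aimed at copyright protection | NeRF, neural radiance field | |
| 6 | Rasterized Edge Gradients: Handling Discontinuities Differentiably | Proposes a rasterization-based gradient computation method for differentiable rendering that effectively handles discontinuities | scene reconstruction | |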

🔬 Pillar 9: Embodied Foundation Models (4 papers)

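| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 7 | LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model | Proposes the SSD-LLM framework, which uses large language models to discover and analyze subpopulation structures in datasets | large language model, instruction following | |
| 8 | Auto-Encoding Morph-Tokens for Multimodal LLM | Proposes Auto-Encoding Morph-Tokens to resolve the conflict between visual understanding and generation in multimodal LLMs | multimodal | |
| 9 | What matters when building vision-language models? | Analyzes the key factors in building vision-language models and introduces the efficient model Idefics2 | large language model, multimodal | |
| 10 | Improving Concept Alignment in Vision-Language Concept Bottleneck Models | Proposes a contrastive semi-supervised learning method to improve concept alignment in vision-language concept bottleneck models | large language model | |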

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

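| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 11 | FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space | Proposes the FER-YOLO-Mamba model for efficient facial expression detection and classification | Mamba, SSM, state space model | |
| 12 | SatSwinMAE: Efficient Autoencoding for Multiscale Time-series Satellite Imagery | SatSwinMAE: an efficient autoencoding model for multiscale time-series satellite imagery | masked autoencoder, MAE, foundation model | |
| 13 | Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning | Proposes a context-aware visual-text contrastive learning method to improve micro-gesture recognition for emotion understanding | contrastive learning | |
| 14 | DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos | DreamScene4D: a method for generating dynamic multi-object 3D scenes from monocular videos | predictive model, distillation | |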
