cs.CV(2024-05-03)
📊 14 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 3: Perception & Semantics (6)
Pillar 9: Embodied Foundation Models (4 🔗1)
Pillar 2: RL & Architecture (4 🔗1)
🔬 Pillar 3: Perception & Semantics (6 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2 | Proposes HoloGS for instant 3D Gaussian splatting reconstruction | 3D gaussian splatting gaussian splatting splatting | ||
| 2 | Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models | UPPM: unified promptable panoptic mapping using dynamic labeling and foundation models | scene understanding open-vocabulary open vocabulary | ||
| 3 | M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation | M${^2}$Depth: a self-supervised two-frame multi-camera metric depth estimation method for autonomous driving | depth estimation metric depth | ||
| 4 | Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids | Rip-NeRF: anti-aliased neural radiance field rendering with Ripmap-encoded Platonic solids | NeRF neural radiance field | ||
| 5 | WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights | Proposes WateRF, a robust watermarking method for NeRF that protects copyright | NeRF neural radiance field | ||
| 6 | Rasterized Edge Gradients: Handling Discontinuities Differentiably | Proposes a rasterization-based method for computing differentiable-rendering gradients that handles discontinuities effectively | scene reconstruction | ||
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
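| 7 | LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model | Proposes the SSD-LLM framework, which uses large language models to discover and analyze subpopulation structure in datasets | large language model instruction following | ||
| 8 | Auto-Encoding Morph-Tokens for Multimodal LLM | Proposes Auto-Encoding Morph-Tokens to resolve the conflict between visual comprehension and generation in multimodal LLMs | multimodal | ✅ | |
| 9 | What matters when building vision-language models? | Analyzes the key factors in building vision-language models and introduces the efficient model Idefics2 | large language model multimodal | ||
| 10 | Improving Concept Alignment in Vision-Language Concept Bottleneck Models | Proposes a contrastive semi-supervised learning method to improve concept alignment in vision-language concept bottleneck models | large language model | ||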
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space | Proposes the FER-YOLO-Mamba model for efficient facial expression detection and classification | Mamba SSM state space model | ✅ | |
| 12 | SatSwinMAE: Efficient Autoencoding for Multiscale Time-series Satellite Imagery | SatSwinMAE: an efficient autoencoding model for multiscale time-series satellite imagery | masked autoencoder MAE foundation model | ||
| 13 | Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning | Proposes context-aware visual-text contrastive learning to improve micro-gesture recognition for emotion understanding | contrastive learning | ||
| 14 | DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos | DreamScene4D: a method for generating dynamic multi-object 3D scenes from monocular videos | predictive model distillation | ||