cs.CV (2024-09-13)

📊 16 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8 🔗2) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 2: RL Algorithms & Architecture (3) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

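| # | Title | Key point | Tags | 🔗 |
| 1 | CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting | Proposes Crowd-Sourced Splatting (CSS) to address pose and scene challenges when reconstructing from crowd-sourced images | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | PrimeDepth: efficient monocular depth estimation using a Stable Diffusion preimage | depth estimation, monocular depth, Depth Anything | |
| 3 | AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius | AdR-Gaussian: speeds up Gaussian splatting rendering with an adaptive radius | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 4 | Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints | Dust-GS: proposes a new point cloud initialization method for scene reconstruction from sparse viewpoints | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 5 | Anytime Continual Learning for Open Vocabulary Classification | Proposes AnytimeCL, an anytime continual learning method for open-vocabulary image classification | open-vocabulary, open vocabulary | |
| 6 | Generalization Boosted Adapter for Open-Vocabulary Segmentation | Proposes GBA to improve the generalization of vision-language models on open-vocabulary segmentation | open-vocabulary, open vocabulary | |
| 7 | Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding | Proposes a precision tilapia feeding system that combines computer vision and IoT to improve farming efficiency | depth estimation | |
| 8 | Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry | Proposes VIFT, a causal-Transformer-based visual-inertial fusion method that improves pose estimation accuracy in monocular visual-inertial odometry | VIO | |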

🔬 Pillar 9: Embodied Foundation Models (4 papers)

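| # | Title | Key point | Tags | 🔗 |
| 9 | Uncertainty and Generalizability in Foundation Models for Earth Observation | Studies the uncertainty and generalizability of foundation models for Earth observation and proposes an evaluation method | foundation model | |
| 10 | ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning | Proposes ChangeChat, presented as the first interactive multimodal instruction-tuned model for remote sensing change analysis | multimodal | |
| 11 | Towards Unified Facial Action Unit Recognition Framework by Large Language Models | Proposes AU-LLaVA, a unified facial action unit recognition framework based on large language models | large language model | |
| 12 | VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation | Proposes VLTP, which uses vision-language guided token pruning to accelerate ViT-based task-oriented segmentation models | large language model | |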

🔬 Pillar 2: RL Algorithms & Architecture (3 papers)

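| # | Title | Key point | Tags | 🔗 |
| 13 | Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection | Mamba-YOLO-World: YOLO-World combined with Mamba for open-vocabulary object detection | Mamba, state space model, open-vocabulary | |
| 14 | Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing | Proposes an interactive masked image modeling method to improve multimodal object detection accuracy in remote sensing | MAE, multimodal | |
| 15 | Joint image reconstruction and segmentation of real-time cardiac MRI in free-breathing using a model based on disentangled representation learning | Proposes a joint reconstruction and segmentation model, based on disentangled representation learning, for real-time cardiac MRI under free breathing | representation learning, PULSE | |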

🔬 Pillar 8: Physics-based Animation (1 paper)

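| # | Title | Key point | Tags | 🔗 |
| 16 | Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori | Proposes a wavelet-prior-based 4D LUT method for low-light video enhancement that improves spatiotemporal color consistency | spatiotemporal, multimodal | |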
