cs.CV (2024-09-13)

📊 16 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8 🔗2) Pillar 9: Embodied Foundation Models (4 🔗1) Pillar 2: RL Algorithms & Architecture (3) Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

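#	Title	One-sentence summary	Tags	🔗	⭐
1	CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting	Proposes Crowd-Sourced Splatting (CSS) to address pose and scene challenges in reconstruction from crowd-sourced images	3D gaussian splatting 3DGS gaussian splatting
2	PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage	PrimeDepth: efficient monocular depth estimation using a Stable Diffusion preimage	depth estimation monocular depth Depth Anything
3	AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius	AdR-Gaussian: accelerates Gaussian Splatting rendering with an adaptive radius, improving rendering efficiency	3D gaussian splatting 3DGS gaussian splatting
4	Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints	Dust-GS: proposes a new point cloud initialization method for scene reconstruction from sparse viewpoints	3D gaussian splatting 3DGS gaussian splatting
5	Anytime Continual Learning for Open Vocabulary Classification	Proposes AnytimeCL, an anytime continual learning method for open-vocabulary image classification	open-vocabulary open vocabulary	✅
6	Generalization Boosted Adapter for Open-Vocabulary Segmentation	Proposes GBA, which improves the generalization of vision-language models on open-vocabulary segmentation	open-vocabulary open vocabulary
7	Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding	Proposes a precision tilapia feeding system that combines computer vision and IoT to improve farming efficiency	depth estimation
8	Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry	Proposes VIFT, a causal-Transformer-based visual-inertial fusion method that improves pose estimation accuracy in monocular visual-inertial odometry	VIO	✅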

🔬 Pillar 9: Embodied Foundation Models (4 papers)

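#	Title	One-sentence summary	Tags	🔗	⭐
9	Uncertainty and Generalizability in Foundation Models for Earth Observation	Studies the uncertainty and generalizability of foundation models for Earth observation and proposes an evaluation method	foundation model
10	ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning	Proposes ChangeChat, the first interactive multimodal instruction-tuned model for remote sensing change analysis	multimodal	✅
11	Towards Unified Facial Action Unit Recognition Framework by Large Language Models	Proposes AU-LLaVA, a unified facial action unit recognition framework based on large language models	large language model
12	VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation	Proposes VLTP, which uses vision-language guided token pruning to accelerate ViT-based task-oriented segmentation models	large language model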

🔬 Pillar 2: RL Algorithms & Architecture (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
13	Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection	Mamba-YOLO-World: YOLO-World combined with Mamba for open-vocabulary object detection	Mamba state space model open-vocabulary
14	Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing	Proposes an interactive masked image modeling method that improves multimodal object detection accuracy in remote sensing	MAE multimodal
15	Joint image reconstruction and segmentation of real-time cardiac MRI in free-breathing using a model based on disentangled representation learning	Proposes a joint reconstruction and segmentation model based on disentangled representation learning for real-time free-breathing cardiac MRI	representation learning PULSE

🔬 Pillar 8: Physics-based Animation (1 paper)

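#	Title	One-sentence summary	Tags	🔗	⭐
16	Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori	Proposes a wavelet-prior-based 4D LUT method for low-light video enhancement that improves spatiotemporal color consistency	spatiotemporal multimodal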
