cs.CV(2024-09-06)

📊 21 papers in total | 🔗 6 with code

🎯 Navigation by Interest Area

Pillar 3: Perception & Semantics (8 🔗4) · Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 2: RL & Architecture (4) · Pillar 6: Video Extraction (2 🔗1) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (8 papers)

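| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | GST: reconstructs a precise 3D human body model from a single image using Gaussian Splatting Transformers. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | Introducing a Class-Aware Metric for Monocular Depth Estimation: An Automotive Perspective | Proposes a class-aware evaluation metric for monocular depth estimation in automotive scenes, improving safety and reliability. | depth estimation, monocular depth | |
| 3 | SDformerFlow: Spatiotemporal swin spikeformer for event-based optical flow estimation | Proposes SDformerFlow, based on a spatiotemporal Swin Spikeformer, for event-camera optical flow estimation. | optical flow, spatiotemporal | |
| 4 | NeCA: 3D Coronary Artery Tree Reconstruction from Two 2D Projections via Neural Implicit Representation | Proposes NeCA, which reconstructs a 3D coronary artery tree from two 2D images via a neural implicit representation. | implicit representation | |
| 5 | Hybrid Cost Volume for Memory-Efficient Optical Flow | Proposes the hybrid cost volume HCVFlow to address excessive memory consumption when computing optical flow on high-resolution images. | optical flow | |
| 6 | 3D-LMVIC: Learning-based Multi-View Image Coding with 3D Gaussian Geometric Priors | Proposes 3D-LMVIC, which uses 3D Gaussian priors to improve multi-view image coding performance, applicable to VR and autonomous driving. | 3D gaussian splatting, gaussian splatting, splatting | |
| 7 | RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement | Proposes RCNet, a deep recurrent collaborative network for multi-view low-light image enhancement. | scene understanding | |
| 8 | Towards Energy-Efficiency by Navigating the Trilemma of Energy, Latency, and Accuracy | Targets XR devices, improving energy efficiency by jointly optimizing the trilemma of energy, latency, and accuracy. | scene reconstruction | |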

🔬 Pillar 9: Embodied Foundation Models (5 papers)

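| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | Influence of Early through Late Fusion on Pancreas Segmentation from Imperfectly Registered Multimodal MRI | Studies how early, intermediate, and late fusion affect pancreas segmentation from imperfectly registered multimodal MRI. | multimodal | |
| 10 | VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | VILA-U: an autoregressive foundation model that unifies visual understanding and generation. | foundation model | |
| 11 | Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques | Explores foundation-model-based synthetic medical imaging: a study of chest X-ray generation and fine-tuning techniques. | foundation model | |
| 12 | Generating Faithful and Salient Text from Multimodal Data | Proposes a framework based on a vision critic model to improve the faithfulness and salience of text generated from multimodal data. | multimodal | |
| 13 | UniDet3D: Multi-dataset Indoor 3D Object Detection | UniDet3D: proposes an indoor 3D object detection framework trained jointly on multiple datasets. | foundation model | |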

🔬 Pillar 2: RL & Architecture (4 papers)

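| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | SCARF: Scalable Continual Learning Framework for Memory-efficient Multiple Neural Radiance Fields | Proposes the SCARF framework, enabling incremental learning and high-quality rendering of multi-scene NeRFs at low storage cost. | distillation, NeRF, neural radiance field | |
| 15 | Serp-Mamba: Advancing High-Resolution Retinal Vessel Segmentation with Selective State-Space Model | Proposes the Serp-Mamba network to improve the accuracy of high-resolution retinal vessel segmentation. | Mamba, SSM, state space model | |
| 16 | Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment | Proposes a self-supervised contrastive method for video representation learning based on differentiable local alignment, improving action recognition performance. | representation learning, contrastive learning | |
| 17 | Dual-Level Cross-Modal Contrastive Clustering | Proposes DXMC, a dual-level cross-modal contrastive clustering framework that improves semantic understanding in image clustering. | representation learning, contrastive learning | |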

🔬 Pillar 6: Video Extraction (2 papers)

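| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics | HOGraspNet: a dense hand-object interaction dataset with a full grasp taxonomy and dynamics. | MANO, foundation model | |
| 19 | HiSC4D: Human-centered interaction and 4D Scene Capture in Large-scale Space Using Wearable IMUs and LiDAR | HiSC4D: human-centered interaction and 4D scene capture in large-scale scenes using wearable IMUs and LiDAR. | SMPL, egocentric | |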

🔬 Pillar 1: Robot Control (1 paper)

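| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 20 | Cycle Pixel Difference Network for Crisp Edge Detection | Proposes CPD-Net, which achieves crisp edge detection through cycle pixel difference convolution and multi-scale information enhancement. | biped | |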

🔬 Pillar 8: Physics-based Animation (1 paper)

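| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | MultiCounter: Multiple Action Agnostic Repetition Counting in Untrimmed Videos | Proposes MultiCounter for action-agnostic repetition counting of multiple instances in untrimmed videos. | spatiotemporal | |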
