cs.CV (2024-09-06)

📊 21 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8 🔗4) · Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 2: RL Algorithms & Architecture (4) · Pillar 6: Video Extraction & Matching (2 🔗1) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

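🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers	GST: precisely reconstructs a 3D human body model from a single image using Gaussian Splatting Transformers.	3D gaussian splatting 3DGS gaussian splatting	✅
2	Introducing a Class-Aware Metric for Monocular Depth Estimation: An Automotive Perspective	Proposes a class-aware evaluation metric for monocular depth estimation in automotive scenarios, improving safety and reliability.	depth estimation monocular depth	✅
3	SDformerFlow: Spatiotemporal swin spikeformer for event-based optical flow estimation	Proposes SDformerFlow, based on a spatiotemporal Swin Spikeformer, for event-camera optical flow estimation.	optical flow spatiotemporal
4	NeCA: 3D Coronary Artery Tree Reconstruction from Two 2D Projections via Neural Implicit Representation	Proposes NeCA, which reconstructs a 3D coronary artery tree from two 2D images via a neural implicit representation.	implicit representation
5	Hybrid Cost Volume for Memory-Efficient Optical Flow	Proposes a hybrid cost volume, HCVFlow, to address excessive memory consumption in optical flow computation on high-resolution images.	optical flow	✅
6	3D-LMVIC: Learning-based Multi-View Image Coding with 3D Gaussian Geometric Priors	Proposes 3D-LMVIC, which uses 3D Gaussian priors to improve multi-view image coding performance, applicable to VR and autonomous driving.	3D gaussian splatting gaussian splatting splatting
7	RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement	Proposes RCNet, a deep recurrent collaborative network for multi-view low-light image enhancement.	scene understanding	✅
8	Towards Energy-Efficiency by Navigating the Trilemma of Energy, Latency, and Accuracy	Targets XR devices, improving energy efficiency by jointly optimizing the trilemma of energy, latency, and accuracy.	scene reconstruction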

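🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
9	Influence of Early through Late Fusion on Pancreas Segmentation from Imperfectly Registered Multimodal MRI	Studies how early, intermediate, and late fusion affect pancreas segmentation from imperfectly registered multimodal MRI.	multimodal
10	VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation	VILA-U: an autoregressive foundation model that unifies visual understanding and generation.	foundation model
11	Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques	Explores foundation-model-based synthetic medical imaging: a study of chest X-ray generation and fine-tuning techniques.	foundation model
12	Generating Faithful and Salient Text from Multimodal Data	Proposes a framework based on a vision critic model to improve the faithfulness and salience of text generated from multimodal data.	multimodal
13	UniDet3D: Multi-dataset Indoor 3D Object Detection	UniDet3D: proposes an indoor 3D object detection framework trained jointly on multiple datasets.	foundation model	✅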

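🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
14	SCARF: Scalable Continual Learning Framework for Memory-efficient Multiple Neural Radiance Fields	Proposes the SCARF framework, which enables incremental learning and high-quality rendering of multi-scene NeRFs at low storage cost.	distillation NeRF neural radiance field
15	Serp-Mamba: Advancing High-Resolution Retinal Vessel Segmentation with Selective State-Space Model	Proposes the Serp-Mamba network to improve the accuracy of high-resolution retinal vessel segmentation.	Mamba SSM state space model
16	Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment	Proposes a self-supervised contrastive video representation learning method based on differentiable local alignment, improving action recognition performance.	representation learning contrastive learning
17	Dual-Level Cross-Modal Contrastive Clustering	Proposes DXMC, a dual-level cross-modal contrastive clustering framework, to improve semantic understanding in image clustering.	representation learning contrastive learning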

🔬 Pillar 6: Video Extraction & Matching (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
18	Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics	HOGraspNet: a dense hand-object interaction dataset with a full grasp taxonomy and dynamics.	MANO foundation model	✅
19	HiSC4D: Human-centered interaction and 4D Scene Capture in Large-scale Space Using Wearable IMUs and LiDAR	HiSC4D: human-centered interaction and 4D scene capture in large-scale scenes using wearable IMUs and LiDAR.	SMPL egocentric

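🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
20	Cycle Pixel Difference Network for Crisp Edge Detection	Proposes CPD-Net, which achieves crisp edge detection through cycle pixel difference convolution and multi-scale information enhancement.	biped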

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
21	MultiCounter: Multiple Action Agnostic Repetition Counting in Untrimmed Videos	Proposes MultiCounter for action-agnostic repetition counting of multiple instances in untrimmed videos.	spatiotemporal
