cs.CV (2024-07-12)

📊 22 papers in total | 🔗 9 with code

🎯 Research-area navigation

Pillar 3: Spatial Perception & Semantics (10 🔗4)
Pillar 2: RL Algorithms & Architecture (5 🔗2)
Pillar 9: Embodied Foundation Models (3 🔗2)
Pillar 6: Video Extraction & Matching (3)
Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (10 papers)

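| # | Title | Summary | Tags | 🔗 |
|---|-------|---------|------|----|
| 1 | Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection | Proposes the GLIS framework, which combines global-local collaborative inference with an LLM to improve LiDAR-based open-vocabulary detection. | open-vocabulary, open vocabulary, large language model | |
| 2 | StyleSplat: 3D Object Style Transfer with Gaussian Splatting | StyleSplat: a 3D object style transfer method based on Gaussian splatting. | 3D gaussian splatting, gaussian splatting, splatting | |
| 3 | DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training | DART: an automated end-to-end object detection pipeline that tackles the annotation bottleneck and improves detection accuracy. | open-vocabulary, open vocabulary, multimodal | |
| 4 | Open Vocabulary Multi-Label Video Classification | Proposes an open-vocabulary multi-label video classification method guided by LLM semantics to improve video understanding. | open-vocabulary, open vocabulary, large language model | |
| 5 | ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion | ProDepth: uses probabilistic fusion to improve self-supervised multi-frame monocular depth estimation. | depth estimation, monocular depth, feature matching | |
| 6 | KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting | KGpose: end-to-end multi-object 6D pose estimation based on keypoint graphs and point-wise pose voting. | 6D pose estimation | |
| 7 | Physics-Informed Learning of Characteristic Trajectories for Smoke Reconstruction | Proposes neural characteristic trajectory fields to model long-term physical constraints in smoke reconstruction. | NeRF, scene reconstruction | |
| 8 | Radiance Fields from Photons | Proposes Quanta NeRF, built on single-photon cameras (SPCs), for NeRF reconstruction under low light, high dynamic range, and fast motion. | NeRF, neural radiance field | |
| 9 | HPC: Hierarchical Progressive Coding Framework for Volumetric Video | Proposes the HPC framework, which uses a single model to compress NeRF-based volumetric video at flexible, variable bitrates. | NeRF, neural radiance field | |
| 10 | Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems | Proposes an implicit-representation approach to electromagnetic inverse scattering for non-invasive imaging of object interiors. | implicit representation | |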

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

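| # | Title | Summary | Tags | 🔗 |
|---|-------|---------|------|----|
| 11 | Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba | Proposes Hamba, which uses a graph-guided bidirectional-scanning Mamba for single-view 3D hand reconstruction. | Mamba, state space model, hand reconstruction | |
| 12 | Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT | Proposes TCS-MAE, a tissue-contrastive semi-masked autoencoder for segmentation pretraining on chest CT images. | masked autoencoder, MAE, contrastive learning | |
| 13 | SlideGCD: Slide-based Graph Collaborative Training with Knowledge Distillation for Whole Slide Image Classification | SlideGCD: a WSI classification method based on inter-slide graph collaborative training and knowledge distillation. | representation learning, distillation | |
| 14 | CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning | Proposes CLOVER for viewpoint- and environment-invariant long-term object re-identification on mobile robots. | representation learning | |
| 15 | Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization | Proposes the FuSTAL framework, which improves weakly-supervised temporal action localization by enhancing pseudo-label quality across multiple stages. | contrastive learning, distillation | |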

🔬 Pillar 9: Embodied Foundation Models (3 papers)

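| # | Title | Summary | Tags | 🔗 |
|---|-------|---------|------|----|
| 16 | Diagnosing and Re-learning for Balanced Multimodal Learning | Proposes Diagnosing & Re-learning to address modality imbalance in multimodal learning. | multimodal | |
| 17 | 3x2: 3D Object Part Segmentation by 2D Semantic Correspondences | Proposes 3-By-2, which performs 3D object part segmentation via 2D semantic correspondences. | foundation model | |
| 18 | WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation | WSESeg: a winter sports equipment segmentation dataset with an interactive segmentation baseline. | foundation model | |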

🔬 Pillar 6: Video Extraction & Matching (3 papers)

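| # | Title | Summary | Tags | 🔗 |
|---|-------|---------|------|----|
| 19 | HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation | HUP-3D: a 3D multi-view synthetic dataset for assisted egocentric hand-ultrasound pose estimation. | egocentric | |
| 20 | Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images | Proposes Divide and Fuse for 3D human mesh recovery from partially visible human images. | SMPL | |
| 21 | Predicting Winning Captions for Weekly New Yorker Comics | Proposes a Vision Transformer-based image captioning model that generates humorous captions for New Yorker cartoons. | HuMoR | |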

🔬 Pillar 4: Generative Motion (1 paper)

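| # | Title | Summary | Tags | 🔗 |
|---|-------|---------|------|----|
| 22 | PID: Physics-Informed Diffusion Model for Infrared Image Generation | Proposes PID, a physics-informed diffusion model for generating physically consistent infrared images. | physics-informed, diffusion | |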
