cs.CV (2024-07-12)

📊 22 papers in total | 🔗 9 with code

🎯 Topic Navigation

Pillar 3: Spatial Perception and Semantics (10 🔗4) · Pillar 2: RL Algorithms and Architecture (5 🔗2) · Pillar 9: Embodied Foundation Models (3 🔗2) · Pillar 6: Video Extraction and Matching (3) · Pillar 4: Generative Motion (1 🔗1)

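🔬 Pillar 3: Spatial Perception and Semantics (10 papers)

#	Title	Key Point	Tags	🔗	⭐
1	Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection	Proposes the GLIS framework, which uses global-local collaborative inference and an LLM to improve LiDAR-based open-vocabulary detection.	open-vocabulary, open vocabulary, large language model	✅
2	StyleSplat: 3D Object Style Transfer with Gaussian Splatting	StyleSplat: a 3D object style transfer method based on Gaussian Splatting.	3D gaussian splatting, gaussian splatting, splatting
3	DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training	DART: an automated end-to-end object detection pipeline that tackles the annotation bottleneck and improves detection accuracy.	open-vocabulary, open vocabulary, multimodal	✅
4	Open Vocabulary Multi-Label Video Classification	Proposes an open-vocabulary multi-label video classification method guided by LLM semantics, improving video understanding.	open-vocabulary, open vocabulary, large language model
5	ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion	ProDepth: uses probabilistic fusion to improve self-supervised multi-frame monocular depth estimation.	depth estimation, monocular depth, feature matching
6	KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting	KGpose: end-to-end multi-object 6D pose estimation based on keypoint graphs and point-wise pose voting.	6D pose estimation
7	Physics-Informed Learning of Characteristic Trajectories for Smoke Reconstruction	Proposes a neural characteristic trajectory field that models long-term physical constraints in smoke reconstruction.	NeRF, scene reconstruction	✅
8	Radiance Fields from Photons	Proposes Quanta NeRF, built on single-photon cameras (SPCs), for NeRF reconstruction under low light, high dynamic range, and fast motion.	NeRF, neural radiance field
9	HPC: Hierarchical Progressive Coding Framework for Volumetric Video	Proposes HPC, a framework that achieves flexible variable-bitrate compression of NeRF-based volumetric video with a single model.	NeRF, neural radiance field
10	Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems	Proposes an implicit-representation approach to electromagnetic inverse scattering for non-invasive interior imaging.	implicit representation	✅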

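🔬 Pillar 2: RL Algorithms and Architecture (5 papers)

#	Title	Key Point	Tags	🔗	⭐
11	Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba	Proposes Hamba, which uses a graph-guided bi-scanning Mamba for single-view 3D hand reconstruction.	Mamba, state space model, hand reconstruction	✅
12	Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT	Proposes TCS-MAE, a tissue-contrastive semi-masked autoencoder for segmentation pretraining on chest CT images.	masked autoencoder, MAE, contrastive learning
13	SlideGCD: Slide-based Graph Collaborative Training with Knowledge Distillation for Whole Slide Image Classification	SlideGCD: a WSI classification method based on inter-slide graph collaborative training and knowledge distillation.	representation learning, distillation	✅
14	CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning	Proposes CLOVER to address long-term object re-identification for mobile robots that is invariant to viewpoint and environment.	representation learning
15	Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization	Proposes the FuSTAL framework, which improves weakly supervised temporal action localization by enhancing pseudo-label quality across multiple stages.	contrastive learning, distillation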

🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	Key Point	Tags	🔗	⭐
16	Diagnosing and Re-learning for Balanced Multimodal Learning	Proposes Diagnosing & Re-learning, a method that addresses modality imbalance in multimodal learning.	multimodal	✅
17	3x2: 3D Object Part Segmentation by 2D Semantic Correspondences	Proposes 3-By-2, a method for 3D object part segmentation.	foundation model	✅
18	WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation	WSESeg: a winter sports equipment segmentation dataset with a baseline for interactive segmentation.	foundation model

🔬 Pillar 6: Video Extraction and Matching (3 papers)

#	Title	Key Point	Tags	🔗	⭐
19	HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation	HUP-3D: a synthetic 3D multi-view dataset for assisted egocentric hand-ultrasound pose estimation.	egocentric
20	Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images	Proposes Divide and Fuse to recover 3D human meshes from partially visible human images.	SMPL
21	Predicting Winning Captions for Weekly New Yorker Comics	Proposes a Vision Transformer-based image captioning model for generating humorous captions for The New Yorker comics.	HuMoR

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	Key Point	Tags	🔗	⭐
22	PID: Physics-Informed Diffusion Model for Infrared Image Generation	Proposes PID, a physics-informed diffusion model that generates infrared images consistent with physical laws.	physics-informed, diffusion	✅
