cs.CV (2024-07-03)

📊 24 papers in total | 🔗 6 with code

🎯 Navigation by interest area

Pillar 2: RL Algorithms and Architecture (9 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 3: Spatial Perception and Semantics (5 🔗2) · Pillar 6: Video Extraction and Matching (2 🔗1) · Pillar 1: Robot Control (1)

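🔬 Pillar 2: RL Algorithms and Architecture (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	A Unified Framework for 3D Scene Understanding	UniSeg3D: a unified 3D scene understanding framework that performs multi-task segmentation and surpasses SOTA methods.	contrastive learning, distillation, scene understanding	✅
2	ACTRESS: Active Retraining for Semi-supervised Visual Grounding	ACTRESS: an active retraining method for semi-supervised visual grounding.	teacher-student, visual grounding
3	FlowCon: Out-of-Distribution Detection using Flow-Based Contrastive Learning	FlowCon: an out-of-distribution detection method that combines flow-based models with contrastive learning.	representation learning, contrastive learning
4	BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement	Introduces the BVI-RLV dataset for training and benchmarking low-light video enhancement.	Mamba, state space model, spatiotemporal
5	Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion	LSMap: uses vision foundation models for label-free semantic scene completion, improving urban scene perception.	representation learning, foundation model
6	Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-View 3D Detection and Tracking	Proposes Cyclic Refiner for object-aware temporal representation learning in multi-view 3D detection and tracking.	representation learning
7	Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation	Proposes a prompt learning method based on unsupervised knowledge distillation that improves the zero-shot generalization of vision-language models.	distillation	✅
8	Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation	Proposes an edge-AI chicken health detection solution based on an enhanced FCOS-Lite and knowledge distillation.	distillation
9	Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization	Proposes a unified anomaly detection method for edge devices based on knowledge distillation and quantization.	distillation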

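🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
10	Domain-Aware Fine-Tuning of Foundation Models	Proposes Domino, a domain-adaptive normalization method that improves foundation model performance under domain shift.	foundation model
11	A Survey on Trustworthiness in Foundation Models for Medical Image Analysis	Proposes a trustworthiness framework to address trust issues in medical image analysis.	foundation model
12	Visual Grounding with Attention-Driven Constraint Balancing	Proposes Attention-Driven Constraint Balancing to optimize feature learning in visual grounding.	visual grounding
13	SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding	SegVG: improves visual grounding performance by converting object bounding boxes into segmentation signals.	visual grounding
14	3D Multimodal Image Registration for Plant Phenotyping	Proposes a depth-based 3D multimodal image registration method for plant phenotyping.	multimodal
15	MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis	MindBench: a comprehensive benchmark for mind map structure recognition and analysis.	large language model, multimodal	✅
16	Multi-Task Domain Adaptation for Language Grounding with 3D Objects	Proposes DA4LG, which achieves cross-domain generalization for language grounding with 3D objects through multi-task domain adaptation.	multimodal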

🔬 Pillar 3: Spatial Perception and Semantics (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
17	Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction	Proposes Free-SurGS, an SfM-free 3D Gaussian splatting method for surgical scene reconstruction.	3D gaussian splatting, 3DGS, gaussian splatting	✅
18	VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors	VEGS: view extrapolation of urban scenes with 3D Gaussian splatting using learned priors.	3D gaussian splatting, gaussian splatting, splatting	✅
19	EgoFlowNet: Non-Rigid Scene Flow from Point Clouds with Ego-Motion Support	EgoFlowNet: a network for non-rigid scene flow estimation from point clouds with ego-motion support.	scene flow
20	BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs	Proposes BACON, which improves the clarity of image captions via concept graphs and boosts downstream task performance.	open-vocabulary, open vocabulary
21	Stereo Risk: A Continuous Modeling Approach to Stereo Matching	Proposes Stereo Risk, which improves stereo matching accuracy through continuous risk modeling.	scene flow

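🔬 Pillar 6: Video Extraction and Matching (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
22	DyFADet: Dynamic Feature Aggregation for Temporal Action Detection	Proposes DyFADet, which uses dynamic feature aggregation to address the difficulty of detecting both long and short action instances in temporal action detection.	Ego4D, TAMP	✅
23	Expressive Gaussian Human Avatars from Monocular RGB Video	Proposes EVA, which learns Gaussian human avatars with fine-grained hand and facial expressions from monocular RGB video.	SMPL, SMPL-X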

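🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
24	Recompression Based JPEG Tamper Detection and Localization Using Deep Neural Network Eliminating Compression Factor Dependency	Proposes a recompression-based method for JPEG tamper detection and localization that eliminates dependence on the compression factor.	manipulation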
