cs.CV(2024-07-03)

📊 24 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (9 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 3: Perception & Semantics (5 🔗2) · Pillar 6: Video Extraction (2 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (9 papers)

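| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 1 | A Unified Framework for 3D Scene Understanding | UniSeg3D: a unified 3D scene understanding framework that performs multi-task segmentation and surpasses SOTA methods. | contrastive learning, distillation, scene understanding | |
| 2 | ACTRESS: Active Retraining for Semi-supervised Visual Grounding | ACTRESS: an active retraining method for semi-supervised visual grounding. | teacher-student, visual grounding | |
| 3 | FlowCon: Out-of-Distribution Detection using Flow-Based Contrastive Learning | FlowCon: an out-of-distribution detection method that combines flow-based models with contrastive learning. | representation learning, contrastive learning | |
| 4 | BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement | Introduces the BVI-RLV dataset for training and benchmarking low-light video enhancement. | Mamba, state space model, spatiotemporal | |
| 5 | Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion | LSMap: uses vision foundation models for label-free semantic scene completion, improving urban scene perception. | representation learning, foundation model | |
| 6 | Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-View 3D Detection and Tracking | Proposes Cyclic Refiner for object-aware temporal representation learning in multi-view 3D detection and tracking. | representation learning | |
| 7 | Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | Proposes a prompt learning method based on unsupervised knowledge distillation to improve the zero-shot generalization of vision-language models. | distillation | |
| 8 | Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation | Proposes an edge AI solution for chicken health detection based on enhanced FCOS-Lite and knowledge distillation. | distillation | |
| 9 | Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization | Proposes a unified anomaly detection method for edge devices based on knowledge distillation and quantization. | distillation | |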

🔬 Pillar 9: Embodied Foundation Models (7 papers)

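| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 10 | Domain-Aware Fine-Tuning of Foundation Models | Proposes Domino, a domain-adaptive normalization method that improves foundation model performance under domain shift. | foundation model | |
| 11 | A Survey on Trustworthiness in Foundation Models for Medical Image Analysis | Proposes a trustworthiness framework to address trust issues in medical image analysis. | foundation model | |
| 12 | Visual Grounding with Attention-Driven Constraint Balancing | Proposes Attention-Driven Constraint Balancing to optimize feature learning in visual grounding. | visual grounding | |
| 13 | SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding | SegVG: improves visual grounding performance by converting object bounding boxes into segmentation signals. | visual grounding | |
| 14 | 3D Multimodal Image Registration for Plant Phenotyping | Proposes a depth-based 3D multimodal image registration method for plant phenotyping. | multimodal | |
| 15 | MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis | MindBench: a comprehensive benchmark for mind map structure recognition and analysis. | large language model, multimodal | |
| 16 | Multi-Task Domain Adaptation for Language Grounding with 3D Objects | Proposes DA4LG, which achieves cross-domain generalization for language grounding with 3D objects via multi-task domain adaptation. | multimodal | |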

🔬 Pillar 3: Perception & Semantics (5 papers)

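| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 17 | Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction | Proposes Free-SurGS, an SfM-free 3D Gaussian Splatting method for surgical scene reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 18 | VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors | VEGS: view extrapolation of urban scenes with 3D Gaussian Splatting using learned priors. | 3D gaussian splatting, gaussian splatting, splatting | |
| 19 | EgoFlowNet: Non-Rigid Scene Flow from Point Clouds with Ego-Motion Support | EgoFlowNet: a network for non-rigid scene flow estimation from point clouds with ego-motion support. | scene flow | |
| 20 | BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs | Proposes BACON, which uses concept graphs to improve the clarity of image captions and boost downstream task performance. | open-vocabulary, open vocabulary | |
| 21 | Stereo Risk: A Continuous Modeling Approach to Stereo Matching | Proposes Stereo Risk, which improves stereo matching accuracy through continuous risk modeling. | scene flow | |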

🔬 Pillar 6: Video Extraction (2 papers)

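| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 22 | DyFADet: Dynamic Feature Aggregation for Temporal Action Detection | Proposes DyFADet, which uses dynamic feature aggregation to address the difficulty of detecting both long and short action instances in temporal action detection. | Ego4D, TAMP | |
| 23 | Expressive Gaussian Human Avatars from Monocular RGB Video | Proposes EVA, which learns Gaussian human avatars with fine-grained hand and facial expressions from monocular RGB video. | SMPL, SMPL-X | |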

🔬 Pillar 1: Robot Control (1 paper)

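| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 24 | Recompression Based JPEG Tamper Detection and Localization Using Deep Neural Network Eliminating Compression Factor Dependency | Proposes a recompression-based method for JPEG tamper detection and localization that eliminates compression factor dependency. | manipulation | |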
