cs.CV (2024-05-14)

📊 20 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (7 🔗1) · Pillar 3: Perception & Semantics (4 🔗3) · Pillar 9: Embodied Foundation Models (4) · Pillar 1: Robot Control (3) · Pillar 8: Physics-based Animation (2)

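🔬 Pillar 2: RL & Architecture (7 papers)

# | Title | One-sentence summary | Tags | 🔗
1 | Open-Vocabulary Object Detection via Neighboring Region Attention Alignment | Proposes NRAA, which improves open-vocabulary object detection through neighboring region attention alignment. | distillation, open-vocabulary, open vocabulary
2 | Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring | Proposes a PI-RADS scoring method based on a multimodal large language model that incorporates clinical guidelines to improve prostate cancer diagnostic accuracy. | distillation, large language model
3 | WaterMamba: Visual State Space Model for Underwater Image Enhancement | Proposes WaterMamba, a visual state space model for underwater image enhancement that balances performance and efficiency. | Mamba, SSM, state space model
4 | Vector-Symbolic Architecture for Event-Based Optical Flow | Proposes a high-dimensional feature descriptor based on a vector-symbolic architecture for optical flow estimation with event cameras. | flow matching, optical flow, feature matching
5 | EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training | EfficientTrain++: trains visual backbones efficiently through generalized curriculum learning. | MAE, curriculum learning
6 | CLIP with Quality Captions: A Strong Pretraining for Vision Tasks | Improves CLIP visual representations with high-quality captions, significantly improving downstream vision tasks. | masked autoencoder, MAE, depth estimation
7 | Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph | Proposes a multi-level feature distillation framework based on graph knowledge that improves small-model performance. | distillation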

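🔬 Pillar 3: Perception & Semantics (4 papers)

# | Title | One-sentence summary | Tags | 🔗
8 | EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera | EndoDAC: efficient self-supervised endoscopic depth estimation that adapts to any camera. | depth estimation, foundation model
9 | Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events | Proposes MuDet, a multimodal collaboration network for detecting dense, occluded vehicles in large-scale disaster events. | height map, multimodal
10 | Dynamic NeRF: A Review | Reviews dynamic NeRF: key techniques and developments from 2021 to 2023 in 3D reconstruction and representation of dynamic scenes. | NeRF, neural radiance field
11 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Proposes RDPN6D, a residual-based dense point-wise network for 6DoF object pose estimation from RGB-D images. | 6D pose estimation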

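🔬 Pillar 9: Embodied Foundation Models (4 papers)

# | Title | One-sentence summary | Tags | 🔗
12 | SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | SciFIBench: a benchmark for evaluating large multimodal models on scientific figure interpretation. | multimodal
13 | Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Hunyuan-DiT: a powerful multi-resolution diffusion Transformer with fine-grained Chinese understanding. | large language model, multimodal
14 | VS-Assistant: Versatile Surgery Assistant on the Demand of Surgeons | Proposes VS-Assistant, which uses a multimodal large language model to provide versatile surgical assistance. | large language model, multimodal
15 | Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video | Proposes a language-guided self-supervised video summarization method that uses text semantic matching and accounts for video diversity. | large language model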

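🔬 Pillar 1: Robot Control (3 papers)

# | Title | One-sentence summary | Tags | 🔗
16 | MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models | MetaFruit: builds agricultural foundation models to improve the generalization and accuracy of robotic fruit harvesting. | manipulation, foundation model
17 | Learning Correspondence for Deformable Objects | Proposes a learning-based pixel-level correspondence method for deformable objects to improve robot manipulation performance. | manipulation, feature matching
18 | Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method | Proposes a semantically contextualized definition of face forgery and a detection method, and builds a large-scale dataset. | manipulation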

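🔬 Pillar 8: Physics-based Animation (2 papers)

# | Title | One-sentence summary | Tags | 🔗
19 | Filtering After Shading With Stochastic Texture Filtering | Proposes filtering after shading based on stochastic texture filtering to improve rendered image quality. | spatiotemporal
20 | Enhancing Blind Video Quality Assessment with Rich Quality-aware Features | Proposes RQ-VQA, which uses multi-source quality-aware features to enhance blind video quality assessment and improve generalization. | spatiotemporal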
