cs.CV (2024-05-14)

📊 20 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (7 🔗1) · Pillar 3: Perception & Semantics (4 🔗3) · Pillar 9: Embodied Foundation Models (4) · Pillar 1: Robot Control (3) · Pillar 8: Physics-based Animation (2)

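🔬 Pillar 2: RL & Architecture (7 papers)

#	Title	One-line Summary	Tags	🔗	⭐
1	Open-Vocabulary Object Detection via Neighboring Region Attention Alignment	Proposes NRAA, which improves open-vocabulary object detection through neighboring region attention alignment.	distillation, open-vocabulary, open vocabulary
2	Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring	Proposes a PI-RADS scoring method based on a multimodal large language model that incorporates clinical guidelines to improve prostate cancer diagnostic accuracy.	distillation, large language model	✅
3	WaterMamba: Visual State Space Model for Underwater Image Enhancement	Proposes WaterMamba, a visual state space model for underwater image enhancement that balances performance and efficiency.	Mamba, SSM, state space model
4	Vector-Symbolic Architecture for Event-Based Optical Flow	Proposes high-dimensional feature descriptors based on a vector-symbolic architecture for optical flow estimation with event cameras.	flow matching, optical flow, feature matching
5	EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training	EfficientTrain++: efficient training of visual backbones via generalized curriculum learning.	MAE, curriculum learning
6	CLIP with Quality Captions: A Strong Pretraining for Vision Tasks	Improves CLIP visual representations with high-quality captions, significantly benefiting downstream vision tasks.	masked autoencoder, MAE, depth estimation
7	Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph	Proposes a multi-level feature distillation framework based on graph knowledge to improve the performance of small models.	distillation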

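🔬 Pillar 3: Perception & Semantics (4 papers)

#	Title	One-line Summary	Tags	🔗	⭐
8	EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera	EndoDAC: efficient self-supervised endoscopic depth estimation that adapts to any camera.	depth estimation, foundation model	✅
9	Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events	Proposes MuDet, a multimodal collaboration network for detecting densely packed, occluded vehicles in large-scale disaster events.	height map, multimodal	✅
10	Dynamic NeRF: A Review	A survey of dynamic NeRF, reviewing key techniques and developments in 3D reconstruction and representation of dynamic scenes from 2021 to 2023.	NeRF, neural radiance field
11	RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images	Proposes RDPN6D, a residual-based dense point-wise network for 6DoF object pose estimation from RGB-D images.	6D pose estimation	✅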

🔬 Pillar 9: Embodied Foundation Models (4 papers)

#	Title	One-line Summary	Tags	🔗	⭐
12	SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation	SciFIBench: a benchmark for evaluating large multimodal models on scientific figure interpretation.	multimodal
13	Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding	Hunyuan-DiT: a powerful multi-resolution diffusion Transformer with fine-grained Chinese understanding.	large language model, multimodal
14	VS-Assistant: Versatile Surgery Assistant on the Demand of Surgeons	Proposes VS-Assistant, which uses a multimodal large language model to provide versatile surgical assistance.	large language model, multimodal
15	Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video	Proposes a language-guided self-supervised video summarization method that uses text semantic matching while accounting for video diversity.	large language model

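🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-line Summary	Tags	🔗	⭐
16	MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models	MetaFruit: builds agricultural foundation models to improve the generalization and accuracy of robotic fruit picking.	manipulation, foundation model
17	Learning Correspondence for Deformable Objects	Proposes a learning-based method for pixel-level correspondence on deformable objects, improving robotic manipulation performance.	manipulation, feature matching
18	Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method	Proposes a semantic-contextual definition of face forgery and a detection method, and builds a large-scale dataset.	manipulation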

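🔬 Pillar 8: Physics-based Animation (2 papers)

#	Title	One-line Summary	Tags	🔗	⭐
19	Filtering After Shading With Stochastic Texture Filtering	Proposes filtering after shading using stochastic texture filtering to improve rendered image quality.	spatiotemporal
20	Enhancing Blind Video Quality Assessment with Rich Quality-aware Features	Proposes RQ-VQA, which enhances blind video quality assessment with rich quality-aware features from multiple sources, improving generalization.	spatiotemporal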