cs.CV (2024-05-14)
📊 20 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (7 🔗1)
Pillar 3: Spatial Perception & Semantics (4 🔗3)
Pillar 9: Embodied Foundation Models (4)
Pillar 1: Robot Control (3)
Pillar 8: Physics-based Animation (2)
🔬 Pillar 2: RL Algorithms & Architecture (7 papers)
🔬 Pillar 3: Spatial Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera | EndoDAC: efficient self-supervised endoscopic depth estimation that adapts to any camera | depth estimation, foundation model | ✅ | |
| 9 | Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events | Proposes MuDet, a multimodal collaboration network for detecting dense, occluded vehicles in large-scale disaster events. | height map, multimodal | ✅ | |
| 10 | Dynamic NeRF: A Review | A survey of dynamic NeRF that reviews key techniques and developments in 3D reconstruction and representation of dynamic scenes from 2021 to 2023. | NeRF, neural radiance field | | |
| 11 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Proposes RDPN6D, a residual-based dense point-wise network for 6DoF object pose estimation from RGB-D images. | 6D pose estimation | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | SciFIBench: a benchmark for evaluating large multimodal models on scientific figure interpretation | multimodal | | |
| 13 | Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Hunyuan-DiT: a multi-resolution diffusion Transformer with fine-grained Chinese language understanding | large language model, multimodal | | |
| 14 | VS-Assistant: Versatile Surgery Assistant on the Demand of Surgeons | Proposes VS-Assistant, which uses a multimodal large language model to provide versatile surgical assistance on demand. | large language model, multimodal | | |
| 15 | Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video | Proposes a language-guided self-supervised video summarization method that uses text semantic matching while accounting for video diversity. | large language model | | |
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
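| 16 | MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models | MetaFruit: builds agricultural foundation models that improve the generalization and accuracy of robotic fruit picking | manipulation, foundation model | | |
| 17 | Learning Correspondence for Deformable Objects | Proposes a learning-based pixel-level correspondence method for deformable objects to improve robotic manipulation. | manipulation, feature matching | | |
| 18 | Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method | Proposes a semantic-context definition of face forgery and a matching detection method, and builds a large-scale dataset. | manipulation | | |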
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
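| 19 | Filtering After Shading With Stochastic Texture Filtering | Proposes applying texture filtering after shading via stochastic texture filtering to improve rendered image quality. | spatiotemporal | | |
| 20 | Enhancing Blind Video Quality Assessment with Rich Quality-aware Features | Proposes RQ-VQA, which strengthens blind video quality assessment with rich quality-aware features from multiple sources to improve generalization. | spatiotemporal | | |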