cs.CV (2024-06-27)

📊 17 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗1) · Pillar 2: RL & Architecture (4) · Pillar 3: Perception & Semantics (3) · Pillar 5: Interaction & Reaction (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

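| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | HuatuoGPT-Vision: improves the medical capabilities of multimodal LLMs by injecting medical visual knowledge at scale. | large language model, multimodal | |
| 2 | DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming | DocKylin: a large multimodal model for document understanding with efficient visual slimming. | large language model, multimodal | |
| 3 | ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos | ReXTime: a benchmark suite for reasoning across time in videos. | large language model, multimodal | |
| 4 | RAVEN: Multitask Retrieval Augmented Vision-Language Learning | RAVEN: a multitask retrieval-augmented vision-language learning framework that improves VLM performance. | large language model, multimodal | |
| 5 | OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | OMG-LLaVA: a multimodal model that unifies image-, object-, and pixel-level reasoning and understanding. | multimodal | |
| 6 | CELLO: Causal Evaluation of Large Vision-Language Models | Proposes CELLO to address causal reasoning in large vision-language models. | chain-of-thought | |
| 7 | Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift | Proposes MABA, a domain-generalizing multimodal backdoor attack on large vision-language models that raises the attack success rate. | multimodal | |
| 8 | ViT LoS V2X: Vision Transformers for Environment-aware LoS Blockage Prediction for 6G Vehicular Networks | Proposes a vision-Transformer-based, environment-aware LoS blockage prediction method for V2X in 6G vehicular networks. | multimodal | |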

🔬 Pillar 2: RL & Architecture (4 papers)

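| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 9 | Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation | Proposes modality-aware feature distillation to improve continual learning in visual question answering. | distillation, multimodal | |
| 10 | Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model | Proposes RWKV-SAM: an efficient, high-quality Segment Anything model. | Mamba, linear attention | |
| 11 | Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads | Fibottention: Inception-style visual representation learning with diverse attention heads, improving Transformer performance with limited data. | representation learning | |
| 12 | Snakes and Ladders: Two Steps Up for VideoMamba | VideoMambaPro: improves video understanding by refining the Mamba architecture. | Mamba | |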

🔬 Pillar 3: Perception & Semantics (3 papers)

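| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 13 | Dense Monocular Motion Segmentation Using Optical Flow and Pseudo Depth Map: A Zero-Shot Approach | Proposes a zero-shot monocular motion segmentation method based on optical flow and pseudo depth maps. | depth estimation, monocular depth, optical flow | |
| 14 | A Universal Railway Obstacle Detection System based on Semi-supervised Segmentation And Optical Flow | Proposes a universal railway obstacle detection system based on semi-supervised segmentation and optical flow, addressing the category generalization problem. | optical flow | |
| 15 | 360 in the Wild: Dataset for Depth Prediction and View Synthesis | Presents a large-scale real-world 360° video dataset for depth prediction and view synthesis research. | depth estimation | |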

🔬 Pillar 5: Interaction & Reaction (1 paper)

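| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 16 | CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement | CORE4D: a 4D human-object-human interaction dataset for collaborative object rearrangement. | human-object interaction, human motion | |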

🔬 Pillar 8: Physics-based Animation (1 paper)

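| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 17 | PNeRV: A Polynomial Neural Representation for Videos | Proposes PNeRV, a parameter-efficient polynomial neural representation for videos that preserves spatiotemporal continuity. | spatiotemporal | |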
