cs.CV (2024-09-30)

📊 29 papers in total | 🔗 10 with code

🎯 Interest-area navigation

Pillar 9: Embodied Foundation Models (13 🔗5)
Pillar 3: Perception & Semantics (8 🔗3)
Pillar 2: RL & Architecture (3 🔗1)
Pillar 6: Video Extraction (2)
Pillar 7: Motion Retargeting (2 🔗1)
Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (13 papers)

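| # | Title | One-line summary | Tags |
|---|---|---|---|
| 1 | VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Proposes VMAD, a visually enhanced multimodal large language model for zero-shot anomaly detection. | large language model, multimodal |
| 2 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | MM1.5: improves image understanding and multi-image reasoning through data-driven multimodal LLM fine-tuning. | large language model, multimodal |
| 3 | Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval | Proposes LECCR, which uses a multimodal LLM to improve the alignment of visual and non-English representations in cross-lingual cross-modal retrieval. | large language model, multimodal |
| 4 | Multimodal Alignment of Histopathological Images Using Cell Segmentation and Point Set Matching for Integrative Cancer Analysis | Proposes a multimodal registration method for histopathological images based on cell segmentation and point-set matching, for integrative cancer analysis. | multimodal |
| 5 | AI Foundation Model for Heliophysics: Applications, Design, and Implementation | Designs an AI foundation model for heliophysics and explores its applications using the SDO dataset. | foundation model |
| 6 | OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | OpenKD: broadens prompt diversity to enable zero- and few-shot keypoint detection. | large language model, foundation model, multimodal |
| 7 | Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration | UniKE: unified multimodal editing through enhanced knowledge collaboration. | multimodal |
| 8 | World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering | Proposes World to Code, which generates high-quality multimodal data through self-instructed compositional captioning and filtering, improving vision-language model performance. | multimodal, visual grounding |
| 9 | Visual Context Window Extension: A New Perspective for Long Video Understanding | Proposes a visual context window extension method that addresses the difficulty large models have with long video understanding. | large language model, multimodal |
| 10 | Exploring Social Media Image Categorization Using Large Models with Different Adaptation Methods: A Case Study on Cultural Nature's Contributions to People | Introduces the FLIPS dataset and explores large models for social media image classification, focusing on cultural nature's contributions to people. | large language model |
| 11 | MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans | Introduces MM-Conv, a multimodal conversational dataset for improving co-speech gesture generation in virtual humans. | multimodal |
| 12 | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos | VidAssist: uses LLMs for goal-oriented planning in instructional videos. | large language model |
| 13 | ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer | Proposes ACE, a general-purpose image generation and editing model built on a diffusion Transformer. | large language model |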

🔬 Pillar 3: Perception & Semantics (8 papers)

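| # | Title | One-line summary | Tags |
|---|---|---|---|
| 14 | Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels | Proposes PixelCLIP, which achieves open-vocabulary semantic segmentation using images without semantic labels. | open-vocabulary, foundation model |
| 15 | Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation | Proposes the PeskaVLP framework, which uses hierarchical knowledge augmentation to address the knowledge gap and spatiotemporal alignment problems in surgical video-language pretraining. | scene understanding, large language model, zero-shot transfer |
| 16 | SuperPose: Improved 6D Pose Estimation with Robust Tracking and Mask-Free Initialization | SuperPose: improves 6D pose estimation through robust tracking and mask-free initialization. | 6D pose estimation, feature matching |
| 17 | OPONeRF: One-Point-One NeRF for Robust Neural Rendering | Proposes OPONeRF, which uses personalized parameters and uncertainty modeling to make NeRF rendering more robust in dynamic scenes. | NeRF |
| 18 | CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability | CCDepth: a lightweight self-supervised depth estimation network with enhanced interpretability. | depth estimation |
| 19 | Active Neural Mapping at Scale | Proposes a NeRF-based active neural mapping system for efficiently exploring large-scale indoor environments. | NeRF |
| 20 | Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation | Proposes CAVT, a class-agnostic visio-temporal network for semantic segmentation of scene sketches, and builds FrISS, a large-scale dataset. | scene understanding |
| 21 | DressRecon: Freeform 4D Human Reconstruction from Monocular Video | DressRecon: freeform 4D human reconstruction from monocular video, suited to loose clothing and human-object interaction. | optical flow |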

🔬 Pillar 2: RL & Architecture (3 papers)

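| # | Title | One-line summary | Tags |
|---|---|---|---|
| 22 | Survival Prediction in Lung Cancer through Multi-Modal Representation Learning | Proposes a multimodal representation learning method for lung cancer survival prediction that fuses CT, PET, and genomic data. | predictive model, representation learning |
| 23 | Domain Consistency Representation Learning for Lifelong Person Re-Identification | Proposes the Domain Consistency Representation Learning (DCR) model, which resolves the conflict between intra-domain discriminability and inter-domain consistency in lifelong ReID. | representation learning, distillation |
| 24 | Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies | Proposes ClassroomKD, an adaptive multi-mentor knowledge distillation framework that improves student model performance. | distillation |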

🔬 Pillar 6: Video Extraction (2 papers)

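| # | Title | One-line summary | Tags |
|---|---|---|---|
| 25 | REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke | REST-HANDS: uses smart glasses and egocentric vision for post-stroke hand rehabilitation. | egocentric, egocentric vision |
| 26 | HEADS-UP: Head-Mounted Egocentric Dataset for Trajectory Prediction in Blind Assistance Systems | Introduces the HEADS-UP dataset for trajectory prediction from head-mounted cameras in blind assistance systems. | egocentric |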

🔬 Pillar 7: Motion Retargeting (2 papers)

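| # | Title | One-line summary | Tags |
|---|---|---|---|
| 27 | TSdetector: Temporal-Spatial Self-correction Collaborative Learning for Colonoscopy Video Detection | Proposes TSdetector to address the detection of multiple polyps in colonoscopy videos. | spatial relationship |
| 28 | SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers | Proposes SATA, which uses spatial autocorrelation to improve the robustness of Vision Transformers. | spatial relationship |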

🔬 Pillar 8: Physics-based Animation (1 paper)

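| # | Title | One-line summary | Tags |
|---|---|---|---|
| 29 | Masked Autoregressive Model for Weather Forecasting | Proposes MAM4WF, which combines masked modeling with autoregressive prediction to improve long-term weather forecasting accuracy. | spatiotemporal |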
