cs.CV(2025-01-28)

📊 16 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (5) · Pillar 2: RL Algorithms & Architecture (5 🔗1) · Pillar 9: Embodied Foundation Models (4) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

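| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models | Proposes Beyond-Labels, which uses vision-language models to improve open-vocabulary semantic segmentation. | open-vocabulary, open vocabulary, foundation model | |
| 2 | Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection | Proposes VMCNet to address insufficient representations in open-vocabulary object detection. | open-vocabulary, open vocabulary | |
| 3 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Proposes CoSPaL to address complex query understanding and temporal consistency in weakly supervised spatio-temporal video grounding. | scene understanding, foundation model, multimodal | |
| 4 | Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds | Evaluates CrowdSplat, measuring the perceived level of detail of Gaussian crowds in order to optimize real-time rendering. | 3D gaussian splatting, gaussian splatting, splatting | |
| 5 | Image Velocimetry using Direct Displacement Field estimation with Neural Networks for Fluids | Proposes an image velocimetry method that estimates the displacement field directly with neural networks, improving the spatial resolution of fluid velocity fields. | optical flow | |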

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

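| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 6 | VidSole: A Multimodal Dataset for Joint Kinetics Quantification and Disease Detection with Deep Learning | VidSole: builds a multimodal dataset and applies deep learning to gait disease detection and kinetics quantification. | MAE, multimodal | |
| 7 | FedEFM: Federated Endovascular Foundation Model with Unseen Data | FedEFM: a federated-learning foundation model for endovascular surgery that addresses the unseen-data problem. | distillation, foundation model | |
| 8 | A Contrastive Teacher-Student Framework for Novelty Detection under Style Shifts | Proposes a contrastive teacher-student framework for novelty detection under style shifts. | teacher-student, distillation | |
| 9 | Adversarial Masked Autoencoder Purifier with Defense Transferability | Proposes MAEP, an adversarial-example purifier based on a masked autoencoder, improving defense transferability. | masked autoencoder, MAE | |
| 10 | CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors | Proposes Category Semantic Prior Contrastive Learning (CSPCL) to improve Deformable DETR performance in X-ray prohibited item detection. | contrastive learning | |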

🔬 Pillar 9: Embodied Foundation Models (4 papers)

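| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 11 | Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding | Proposes the Stackable Temporal Encoder to address temporal modeling in video understanding. | large language model, multimodal | |
| 12 | Molecular-driven Foundation Model for Oncologic Pathology | Threads: a molecular-driven foundation model for oncologic pathology that provides general-purpose whole-slide image representations. | foundation model, multimodal | |
| 13 | Ultra-high resolution multimodal MRI densely labelled holistic structural brain atlas | Builds a holistic structural atlas of the human brain from ultra-high-resolution multimodal MRI, aimed at improving early detection of neurological diseases. | multimodal | |
| 14 | One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning | Proposes Block-LoRA, a block-matrix-based low-rank adaptation method that improves the efficiency of CLIP in few-shot learning. | foundation model | |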

🔬 Pillar 4: Generative Motion (1 paper)

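| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 15 | FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation | Proposes FlexMotion to address efficiency and controllability in human motion generation. | motion synthesis, motion generation, physically plausible | |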

🔬 Pillar 5: Interaction & Reaction (1 paper)

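| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 16 | B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing | B-RIGHT: a benchmark re-evaluation for integrity in generalized human-object interaction testing. | human-object interaction, HOI | |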
