cs.CV(2025-01-28)
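📊 16 papers in total | 🔗 1 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (5)
Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5 🔗1)
Pillar 9: Embodied Large Models (Embodied Foundation Models) (4)
Pillar 4: Generative Motion (1)
Pillar 5: Interaction & Reaction (1)
🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |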
|---|---|---|---|---|---|
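| 1 | Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models | Proposes Beyond-Labels, which uses vision-language models to improve open-vocabulary semantic segmentation performance | open-vocabulary open vocabulary foundation model | | |
| 2 | Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection | Proposes VMCNet to address insufficient representations in open-vocabulary object detection | open-vocabulary open vocabulary | | |
| 3 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Proposes CoSPaL to address complex query understanding and temporal consistency in weakly supervised spatio-temporal video grounding | scene understanding foundation model multimodal | | |
| 4 | Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds | Evaluates CrowdSplat, studying the perceived level of detail of Gaussian crowds to optimize real-time rendering | 3D gaussian splatting gaussian splatting splatting | | |
| 5 | Image Velocimetry using Direct Displacement Field estimation with Neural Networks for Fluids | Proposes an image velocimetry method that directly estimates the displacement field with neural networks, improving the spatial resolution of fluid velocity fields | optical flow | | |
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |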
|---|---|---|---|---|---|
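| 6 | VidSole: A Multimodal Dataset for Joint Kinetics Quantification and Disease Detection with Deep Learning | VidSole: builds a multimodal dataset and combines it with deep learning for gait disease detection and kinetics quantification | MAE multimodal | | |
| 7 | FedEFM: Federated Endovascular Foundation Model with Unseen Data | FedEFM: a federated-learning foundation model for endovascular surgery that addresses the unseen-data problem | distillation foundation model | | |
| 8 | A Contrastive Teacher-Student Framework for Novelty Detection under Style Shifts | Proposes a contrastive teacher-student framework for novelty detection under style shifts | teacher-student distillation | | |
| 9 | Adversarial Masked Autoencoder Purifier with Defense Transferability | Proposes MAEP, a masked-autoencoder-based purifier for adversarial examples that improves defense transferability | masked autoencoder MAE | | |
| 10 | CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors | Proposes Category Semantic Prior Contrastive Learning (CSPCL) to improve Deformable DETR performance in X-ray prohibited item detection | contrastive learning | ✅ | |
🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |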
|---|---|---|---|---|---|
| 11 | Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding | Proposes a Stackable Temporal Encoder to address temporal modeling in video understanding | large language model multimodal | | |
| 12 | Molecular-driven Foundation Model for Oncologic Pathology | Threads: a molecular-driven foundation model for oncologic pathology that yields general-purpose whole-slide image representations | foundation model multimodal | | |
| 13 | Ultra-high resolution multimodal MRI densely labelled holistic structural brain atlas | Builds a holistic structural atlas of the human brain from ultra-high-resolution multimodal MRI, intended to improve early detection of neurological diseases | multimodal | | |
| 14 | One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning | Proposes Block-LoRA, a block-matrix-based low-rank adaptation method that improves the efficiency of CLIP models in few-shot learning | foundation model | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation | Proposes FlexMotion to address efficiency and controllability in human motion generation | motion synthesis motion generation physically plausible | | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing | B-RIGHT: a benchmark re-evaluation for integrity in generalized human-object interaction testing | human-object interaction HOI | | |