cs.CV (2025-05-10)
📊 11 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (4)
Pillar 9: Embodied Foundation Models (3 🔗2)
Pillar 6: Video Extraction & Matching (1)
Pillar 2: RL Algorithms & Architecture (1)
Pillar 4: Generative Motion (1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 3: Spatial Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
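| 1 | METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection | Proposes METOR, a framework for mutual enhancement of objects and relationships in open-vocabulary video visual relationship detection. | open-vocabulary, open vocabulary | | |
| 2 | Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation | Proposes CPC-SAM, a SAM variant guided by causal prompt calibration, to address SAM's generalization problem in open-vocabulary multi-entity segmentation. | open-vocabulary, open vocabulary | | |
| 3 | Edge-Enabled VIO with Long-Tracked Features for High-Accuracy Low-Altitude IoT Navigation | Proposes an edge-enabled VIO based on long-tracked features, improving the accuracy and real-time performance of low-altitude IoT navigation. | VIO | | |
| 4 | ElectricSight: 3D Hazard Monitoring for Power Lines Using Low-Cost Sensors | ElectricSight: 3D hazard monitoring for power transmission lines using low-cost sensors. | depth estimation, monocular depth | | |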
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
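| 5 | Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | Proposes a batch augmentation method with unimodal fine-tuning for multimodal learning, improving fetal organ detection in ultrasound images. | large language model, multimodal | | |
| 6 | TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition | Proposes TACFN, which uses Transformer-based adaptive cross-modal fusion for multimodal emotion recognition. | multimodal | ✅ | |
| 7 | Improving Generalization of Medical Image Registration Foundation Model | Integrates SAM to improve the generalization and robustness of medical image registration foundation models. | foundation model | ✅ | |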
🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | GRACE: Estimating Geometry-level 3D Human-Scene Contact from 2D Images | Proposes GRACE, which uses geometric reasoning to estimate 3D contact regions of human-scene interaction from 2D images. | SMPL, embodied AI | | |
🔬 Pillar 2: RL Algorithms & Architecture (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | Dataset Distillation with Probabilistic Latent Features | Proposes a dataset distillation method based on probabilistic latent features, improving cross-architecture generalization. | distillation | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | HDGlyph: A Hierarchical Disentangled Glyph-Based Framework for Long-Tail Text Rendering in Diffusion Models | HDGlyph: a hierarchical disentangled glyph-based framework for long-tail text rendering in diffusion models. | classifier-free guidance | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images | ProFashion: a prototype-guided fashion video generation framework using multiple reference images. | spatiotemporal | | |