cs.CV (2025-01-15)
📊 21 papers in total | 🔗 9 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 3: Perception & Semantics (6 🔗3)
Pillar 2: RL & Architecture (4 🔗3)
Pillar 7: Motion Retargeting (2)
Pillar 4: Generative Motion (1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation | Proposes FATE-SAM, a training-free few-shot adaptation method for 3D medical image segmentation | foundation model | | |
| 2 | Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Proposes ArtCoT to improve the zero-shot aesthetic reasoning of multimodal LLMs | multimodal | ✅ | |
| 3 | Spatio-Temporal Foundation Models: Vision, Challenges, and Opportunities | Discusses spatio-temporal foundation models, analyzing their vision, challenges, and opportunities with the aim of broader adoption | foundation model | | |
| 4 | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | DETRIS: a densely connected parameter-efficient tuning framework for referring image segmentation | foundation model multimodal | ✅ | |
| 5 | Unified Few-shot Crack Segmentation and its Precise 3D Automatic Measurement in Concrete Structures | Proposes a unified method for few-shot concrete crack segmentation and precise automatic 3D measurement | foundation model | | |
| 6 | IDEA: Image Description Enhanced CLIP-Adapter | Proposes IDEA, an image-description-enhanced CLIP-Adapter that improves few-shot image classification | multimodal | ✅ | |
| 7 | RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency | RealVVT: photorealistic video virtual try-on via spatio-temporal consistency | foundation model | | |
🔬 Pillar 3: Perception & Semantics (6 papers)
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Vision Foundation Models for Computed Tomography | Proposes CT-FM, a vision foundation model for medical image segmentation and understanding, built on large-scale CT scans | contrastive learning foundation model | | |
| 15 | FlexiClip: Locality-Preserving Free-Form Character Animation | FlexiClip: a locality-preserving free-form cartoon character animation method that improves animation quality | flow matching character animation | ✅ | |
| 16 | Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation | Proposes a perspective-aware teaching framework for knowledge distillation across heterogeneous architectures | distillation | ✅ | |
| 17 | MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation | Proposes MANTA, an efficient Diffusion Mamba-based method for long-term dense action anticipation | Mamba | ✅ | |
🔬 Pillar 7: Motion Retargeting (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | RepVideo: Rethinking Cross-Layer Representation for Video Generation | RepVideo: improves temporal consistency and spatial accuracy in video generation by restructuring cross-layer representations | spatial relationship | | |
| 19 | Computerized Assessment of Motor Imitation for Distinguishing Autism in Video (CAMI-2DNet) | Proposes CAMI-2DNet, a deep-learning-based motor imitation assessment method for distinguishing individuals with autism | motion retargeting | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection | Proposes a patch-aware vector-quantized codebook learning method for unsupervised visual defect detection | VQ-VAE | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Joint Learning of Depth and Appearance for Portrait Image Animation | Proposes a diffusion-based framework that jointly learns depth and appearance for high-quality portrait image animation | manipulation | | |