cs.CV (2026-02-22)
📊 24 papers in total | 🔗 9 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (10 🔗3)
Pillar 2: RL & Architecture (5 🔗3)
Pillar 3: Perception & Semantics (3 🔗1)
Pillar 6: Video Extraction (2 🔗1)
Pillar 4: Generative Motion (1)
Pillar 1: Robot Control (1)
Pillar 8: Physics-based Animation (1)
Pillar 7: Motion Retargeting (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (10 papers)
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Universal 3D Shape Matching via Coarse-to-Fine Language Guidance | Proposes UniMatch, which achieves universal 3D shape matching through coarse-to-fine language guidance. | contrastive learning, large language model, multimodal | | |
| 12 | GUIDE-US: Grade-Informed Unpaired Distillation of Encoder Knowledge from Histopathology to Micro-UltraSound | Proposes GUIDE-US, which distills unpaired histopathology knowledge to improve prostate cancer grading on micro-ultrasound. | distillation, foundation model | | |
| 13 | MRI Contrast Enhancement Kinetics World Model | Proposes the MRI CEKWorld model, which improves simulation of MRI contrast enhancement kinetics through spatiotemporal consistency learning. | world model, spatiotemporal | ✅ | |
| 14 | GS-CLIP: Zero-shot 3D Anomaly Detection by Geometry-Aware Prompt and Synergistic View Representation Learning | GS-CLIP: zero-shot 3D anomaly detection based on geometry-aware prompts and synergistic view representation learning. | representation learning, distillation | ✅ | |
| 15 | JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation | JavisDiT++: proposes a unified modeling and optimization framework for high-quality joint audio-video generation. | DPO, direct preference optimization, multimodal | ✅ | |
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | DefenseSplat: Enhancing the Robustness of 3D Gaussian Splatting via Frequency-Aware Filtering | DefenseSplat: enhances the robustness of 3D Gaussian Splatting through frequency-aware filtering. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 17 | OpenVO: Open-World Visual Odometry with Temporal Dynamics Awareness | OpenVO: proposes an open-world visual odometry framework with temporal dynamics awareness. | visual odometry, foundation model | | |
| 18 | TeFlow: Enabling Multi-frame Supervision for Self-Supervised Feed-forward Scene Flow Estimation | TeFlow: improves self-supervised feed-forward scene flow estimation through temporal consistency supervision. | scene flow | ✅ | |
🔬 Pillar 6: Video Extraction (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction | Proposes a cycle-consistent mask prediction method to address cross-view object correspondence. | egocentric | ✅ | |
| 20 | Keep it SymPL: Symbolic Projective Layout for Allocentric Spatial Reasoning in Vision-Language Models | Proposes the SymPL framework to address allocentric spatial reasoning in vision-language models. | egocentric, spatial relationship | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery | Proposes a VLM-guided group preference alignment framework that improves the realism and consistency of diffusion-based human mesh recovery. | physically plausible, human mesh recovery, HMR | | |
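🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | Controlled Face Manipulation and Synthesis for Data Augmentation | Proposes a controllable face manipulation method based on diffusion autoencoders, used as data augmentation to improve expression recognition. | manipulation | | |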
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | FUSAR-GPT : A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery | FUSAR-GPT: a spatiotemporal feature-embedded, two-stage decoupled visual language model for SAR imagery. | spatiotemporal | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling | Ani3DHuman: photorealistic 3D human animation generation that combines kinematics with diffusion priors. | motion representation | ✅ | |