cs.CV (2025-10-11)
📊 12 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (5 🔗2)
Pillar 3: Perception & Semantics (4)
Pillar 2: RL & Architecture (2)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Vision4PPG: Emergent PPG Analysis Capability of Vision Foundation Models for Vital Signs like Blood Pressure | Vision4PPG: uses vision foundation models for PPG analysis to predict vital signs such as blood pressure | foundation model |  |  |
| 2 | ESCA: Contextualizing Embodied Agents via Scene-Graph Generation | Proposes the ESCA framework, which improves the contextual awareness of embodied agents through scene-graph generation | large language model, foundation model | ✅ |  |
| 3 | From Generic to Specialized: A Subspecialty Diagnostic System Powered by Self-Supervised Learning for Cervical Histopathology | CerS-Path: a subspecialty diagnostic system for cervical histopathology based on self-supervised learning | foundation model, multimodal |  |  |
| 4 | EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection | Proposes EditCast3D to address consistency and efficiency problems in 3D editing | foundation model | ✅ |  |
| 5 | From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries | FactoredScenes: generates factored real-world scenes via learned program libraries to address data scarcity | large language model |  |  |
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
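| 6 | Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting | Proposes an opacity-gradient-based density control method that improves the efficiency and compactness of few-shot 3D Gaussian Splatting | 3D gaussian splatting, 3DGS, gaussian splatting |  |  |
| 7 | Ordinal Scale Traffic Congestion Classification with Multi-Modal Vision-Language and Motion Analysis | Proposes a multimodal framework for classifying traffic congestion levels that fuses vision-language and motion analysis | open-vocabulary, open vocabulary, multimodal |  |  |
| 8 | Ortho-Fuse: Orthomosaic Generation for Sparse High-Resolution Crop Health Maps Through Intermediate Optical Flow Estimation | Ortho-Fuse: generates orthomosaics for sparse high-resolution crop health maps through optical flow estimation | optical flow |  |  |
| 9 | B2N3D: Progressive Learning from Binary to N-ary Relationships for 3D Object Grounding | Proposes the B2N3D framework, which achieves more precise 3D object grounding through progressive learning from binary to n-ary relationships | scene understanding, spatial relationship |  |  |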
🔬 Pillar 2: RL & Architecture (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking | Proposes DualViewDistill, which uses foundation-model-guided BEV maps to improve 3D object detection and tracking | distillation, foundation model |  |  |
| 11 | SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation | Proposes the SaFiRe framework, which uses Mamba to handle complex expressions in referring image segmentation | Mamba |  |  |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging? | Video models show zero-shot learning ability in medical imaging, laying groundwork for medical foundation models | motion prediction, foundation model |  |  |