cs.CV (2025-10-11)

📊 12 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (2) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

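| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Vision4PPG: Emergent PPG Analysis Capability of Vision Foundation Models for Vital Signs like Blood Pressure | Vision4PPG: applies vision foundation models to PPG analysis to predict vital signs such as blood pressure. | foundation model | |
| 2 | ESCA: Contextualizing Embodied Agents via Scene-Graph Generation | Proposes the ESCA framework, which strengthens the contextual awareness of embodied agents through scene-graph generation. | large language model, foundation model | |
| 3 | From Generic to Specialized: A Subspecialty Diagnostic System Powered by Self-Supervised Learning for Cervical Histopathology | CerS-Path: a subspecialty diagnostic system for cervical histopathology based on self-supervised learning. | foundation model, multimodal | |
| 4 | EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection | Proposes EditCast3D to address consistency and efficiency problems in 3D editing. | foundation model | |
| 5 | From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries | FactoredScenes: generates factored real-world scenes from learned program libraries, addressing data scarcity. | large language model | |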

🔬 Pillar 3: Perception & Semantics (4 papers)

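| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 6 | Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting | Proposes an opacity-gradient-based density control method that improves the efficiency and compactness of few-shot 3D Gaussian Splatting. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 7 | Ordinal Scale Traffic Congestion Classification with Multi-Modal Vision-Language and Motion Analysis | Proposes a multimodal framework for classifying traffic congestion levels that fuses vision-language and motion analysis. | open-vocabulary, open vocabulary, multimodal | |
| 8 | Ortho-Fuse: Orthomosaic Generation for Sparse High-Resolution Crop Health Maps Through Intermediate Optical Flow Estimation | Ortho-Fuse: generates orthomosaics for sparse high-resolution crop health maps through optical flow estimation. | optical flow | |
| 9 | B2N3D: Progressive Learning from Binary to N-ary Relationships for 3D Object Grounding | Proposes the B2N3D framework, which achieves more accurate 3D object grounding through progressive learning from binary to N-ary relationships. | scene understanding, spatial relationship | |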

🔬 Pillar 2: RL & Architecture (2 papers)

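| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 10 | Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking | Proposes DualViewDistill, which uses foundation-model-guided BEV maps to improve 3D object detection and tracking. | distillation, foundation model | |
| 11 | SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation | Proposes the SaFiRe framework, which uses Mamba to handle complex expressions in referring image segmentation. | Mamba | |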

🔬 Pillar 7: Motion Retargeting (1 paper)

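| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 12 | Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging? | Video models show zero-shot learning ability in medical imaging, laying groundwork for medical foundation models. | motion prediction, foundation model | |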
