cs.CV (2025-10-11)

📊 12 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (2) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	Vision4PPG: Emergent PPG Analysis Capability of Vision Foundation Models for Vital Signs like Blood Pressure	Vision4PPG: applies vision foundation models to PPG analysis to predict vital signs such as blood pressure	foundation model
2	ESCA: Contextualizing Embodied Agents via Scene-Graph Generation	Proposes the ESCA framework, which strengthens the contextual awareness of embodied agents through scene-graph generation	large language model, foundation model	✅
3	From Generic to Specialized: A Subspecialty Diagnostic System Powered by Self-Supervised Learning for Cervical Histopathology	CerS-Path: a subspecialty diagnostic system for cervical histopathology based on self-supervised learning	foundation model, multimodal
4	EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection	Proposes EditCast3D to address consistency and efficiency problems in 3D editing	foundation model	✅
5	From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries	FactoredScenes: generates factored real-world scenes via learned program libraries, addressing data scarcity	large language model

🔬 Pillar 3: Perception & Semantics (4 papers)

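#	Title	One-sentence summary	Tags	🔗	⭐
6	Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting	Proposes an opacity-gradient-based density control method that improves the efficiency and compactness of few-shot 3D Gaussian Splatting	3D gaussian splatting, 3DGS, gaussian splatting
7	Ordinal Scale Traffic Congestion Classification with Multi-Modal Vision-Language and Motion Analysis	Proposes a multi-modal framework for classifying traffic congestion levels that fuses vision-language and motion analysis	open-vocabulary, open vocabulary, multimodal
8	Ortho-Fuse: Orthomosaic Generation for Sparse High-Resolution Crop Health Maps Through Intermediate Optical Flow Estimation	Ortho-Fuse: generates orthomosaics for sparse high-resolution crop health maps via optical flow estimation	optical flow
9	B2N3D: Progressive Learning from Binary to N-ary Relationships for 3D Object Grounding	Proposes the B2N3D framework, which achieves more precise 3D object grounding through progressive learning from binary to N-ary relationships	scene understanding, spatial relationship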

🔬 Pillar 2: RL & Architecture (2 papers)

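#	Title	One-sentence summary	Tags	🔗	⭐
10	Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking	Proposes DualViewDistill, which uses foundation-model-guided BEV maps to improve 3D object detection and tracking	distillation, foundation model
11	SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation	Proposes the SaFiRe framework, which uses Mamba to handle complex expressions in referring image segmentation	Mamba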

🔬 Pillar 7: Motion Retargeting (1 paper)

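#	Title	One-sentence summary	Tags	🔗	⭐
12	Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging?	Video models show zero-shot learning ability in medical imaging, laying the groundwork for medical foundation models	motion prediction, foundation model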
