cs.CV (2025-05-02)

📊 15 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 2: RL & Architecture (3) · Pillar 3: Perception & Semantics (2 🔗2) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1) · Pillar 1: Robot Control (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

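#	Title	One-Sentence Summary	Tags	🔗	⭐
1	PainFormer: a Vision Foundation Model for Automatic Pain Assessment	Proposes PainFormer for automatic pain assessment	foundation model multimodal	✅
2	Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs	Proposes the NeaR method for vocabulary-free fine-grained visual recognition	large language model multimodal
3	Multimodal and Multiview Deep Fusion for Autonomous Marine Navigation	Proposes a cross-attention transformer approach for multimodal sensor fusion in autonomous marine navigation	multimodal
4	Grounding Task Assistance with Multimodal Cues from a Single Demonstration	Proposes the MICA framework to address missing multimodal information in task assistance	multimodal
5	Multimodal Doctor-in-the-Loop: A Clinically-Guided Explainable Framework for Predicting Pathological Response in Non-Small Cell Lung Cancer	Proposes a multimodal doctor-in-the-loop framework for predicting pathological response in non-small cell lung cancer	multimodal
6	Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging	Foundation-model-based lung tumor segmentation methods significantly improve accuracy and efficiency	foundation model
7	Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation	Proposes a framework for multimodal X-ray imaging and report generation to address medical data generation	multimodal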

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
8	A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning	Proposes a sensor-agnostic domain generalization framework to improve remote sensing semantic segmentation	MAE foundation model
9	FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing	Proposes FlowDubber to address audio quality and lip synchronization in movie dubbing	flow matching large language model
10	CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment	Proposes CAV-MAE Sync to address audio-visual modality alignment	MAE

🔬 Pillar 3: Perception & Semantics (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
11	Compensating Spatiotemporally Inconsistent Observations for Online Dynamic 3D Gaussian Splatting	Proposes a method to address spatiotemporal inconsistency in online dynamic 3D reconstruction	3D gaussian splatting gaussian splatting splatting	✅
12	Learning Flow-Guided Registration for RGB-Event Semantic Segmentation	Proposes BRENet to address registration in RGB-Event semantic segmentation	optical flow spatiotemporal	✅

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
13	TSTMotion: Training-free Scene-aware Text-to-motion Generation	Proposes TSTMotion for scene-aware text-to-motion generation	text-to-motion text-driven motion motion generation

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
14	FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors	Proposes FreeInsert for object insertion in 3D scenes without spatial priors	spatial relationship foundation model

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
15	VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models	Proposes VidStamp to address watermarking in video generation models	manipulation	✅
