cs.CV(2025-05-02)

📊 15 papers in total | 🔗 4 with code

🎯 Navigation by Interest Area

Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 2: RL & Architecture (3) · Pillar 3: Perception & Semantics (2 🔗2) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1) · Pillar 1: Robot Control (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

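| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | PainFormer: a Vision Foundation Model for Automatic Pain Assessment | PainFormer: a vision foundation model for automatic pain assessment. | foundation model, multimodal | |
| 2 | Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs | Proposes NeaR, which fine-tunes a CLIP model on MLLM-generated labels to address vocabulary-free fine-grained visual recognition. | large language model, multimodal | |
| 3 | Multimodal and Multiview Deep Fusion for Autonomous Marine Navigation | Proposes a multimodal fusion method based on a cross-attention Transformer to improve the safety of autonomous marine navigation. | multimodal | |
| 4 | Grounding Task Assistance with Multimodal Cues from a Single Demonstration | MICA: a conversational agent that uses multimodal cues from a single demonstration to improve task assistance. | multimodal | |
| 5 | Multimodal Doctor-in-the-Loop: A Clinically-Guided Explainable Framework for Predicting Pathological Response in Non-Small Cell Lung Cancer | Proposes a multimodal doctor-in-the-loop framework for predicting pathological response in non-small cell lung cancer. | multimodal | |
| 6 | Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging | Can foundation models segment tumors effectively? A benchmark of segmentation on lung CT imaging. | foundation model | |
| 7 | Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation | Proposes a framework for multimodal X-ray image and report generation to address medical data generation. | multimodal | |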

🔬 Pillar 2: RL & Architecture (3 papers)

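| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 8 | A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning | Proposes a sensor-agnostic domain generalization framework that uses geospatial foundation models to enhance semantic segmentation. | MAE, foundation model | |
| 9 | FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing | FlowDubber: a movie dubbing method that combines LLM-based semantic-aware learning with flow-matching-based voice enhancement. | flow matching, large language model | |
| 10 | CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | CAV-MAE Sync: improves contrastive audio-visual masked autoencoders through fine-grained alignment. | MAE | |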

🔬 Pillar 3: Perception & Semantics (2 papers)

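| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 11 | Compensating Spatiotemporally Inconsistent Observations for Online Dynamic 3D Gaussian Splatting | Proposes a spatiotemporal consistency compensation method that addresses artifacts in online dynamic 3D Gaussian Splatting reconstruction. | 3D gaussian splatting, gaussian splatting, splatting | |
| 12 | Learning Flow-Guided Registration for RGB-Event Semantic Segmentation | Proposes BRENet, which uses optical-flow-guided registration to resolve modality misalignment in RGB-Event semantic segmentation. | optical flow, spatiotemporal | |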

🔬 Pillar 4: Generative Motion (1 paper)

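| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 13 | TSTMotion: Training-free Scene-aware Text-to-motion Generation | Proposes TSTMotion, a training-free framework for scene-aware text-to-motion generation. | text-to-motion, text-driven motion, motion generation | |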

🔬 Pillar 7: Motion Retargeting (1 paper)

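| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 14 | FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors | FreeInsert: a text-guided object insertion method for 3D Gaussian scenes that requires no spatial priors. | spatial relationship, foundation model | |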

🔬 Pillar 1: Robot Control (1 paper)

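| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 15 | VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models | VidStamp: a temporally-aware watermarking scheme for video diffusion models, used to verify ownership and integrity. | manipulation | |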
