cs.CV(2025-05-02)
📊 15 papers in total | 🔗 4 with code
🎯 Topic navigation
Pillar 9: Embodied Foundation Models (7 🔗1)
Pillar 2: RL Algorithms & Architecture (3)
Pillar 3: Spatial Perception & Semantics (2 🔗2)
Pillar 4: Generative Motion (1)
Pillar 7: Motion Retargeting (1)
Pillar 1: Robot Control (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | PainFormer: a Vision Foundation Model for Automatic Pain Assessment | Proposes PainFormer for automatic pain assessment | foundation model, multimodal | ✅ | |
| 2 | Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs | Proposes NeaR for vocabulary-free fine-grained visual recognition | large language model, multimodal | | |
| 3 | Multimodal and Multiview Deep Fusion for Autonomous Marine Navigation | Proposes a cross-attention transformer for multimodal sensor fusion in autonomous marine navigation | multimodal | | |
| 4 | Grounding Task Assistance with Multimodal Cues from a Single Demonstration | Proposes the MICA framework to supply the multimodal cues missing from task assistance | multimodal | | |
| 5 | Multimodal Doctor-in-the-Loop: A Clinically-Guided Explainable Framework for Predicting Pathological Response in Non-Small Cell Lung Cancer | Proposes a multimodal doctor-in-the-loop framework to predict pathological response in non-small cell lung cancer | multimodal | | |
| 6 | Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging | Reports that foundation-model-based lung tumor segmentation substantially improves accuracy and efficiency | foundation model | | |
| 7 | Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation | Proposes a multimodal framework that generates X-ray images and radiology reports for medical data generation | multimodal | | |
🔬 Pillar 2: RL Algorithms & Architecture (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning | Proposes a sensor-agnostic domain generalization framework to improve remote-sensing semantic segmentation | MAE, foundation model | | |
| 9 | FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing | Proposes FlowDubber to improve audio quality and lip synchronization in movie dubbing | flow matching, large language model | | |
| 10 | CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | Proposes CAV-MAE Sync to improve audio-visual modality alignment | MAE | | |
🔬 Pillar 3: Spatial Perception & Semantics (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Compensating Spatiotemporally Inconsistent Observations for Online Dynamic 3D Gaussian Splatting | Proposes a method to compensate for spatiotemporally inconsistent observations in online dynamic 3D reconstruction | 3D gaussian splatting, gaussian splatting, splatting | ✅ | |
| 12 | Learning Flow-Guided Registration for RGB-Event Semantic Segmentation | Proposes BRENet to address the registration problem in RGB-Event semantic segmentation | optical flow, spatiotemporal | ✅ | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | TSTMotion: Training-free Scene-aware Text-to-motion Generation | Proposes TSTMotion for scene-aware text-to-motion generation | text-to-motion, text-driven motion, motion generation | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors | Proposes FreeInsert for object insertion into 3D scenes without spatial priors | spatial relationship, foundation model | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models | Proposes VidStamp for watermarking video generation models | manipulation | ✅ | |