cs.CV(2025-04-27)
📊 12 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (4)
Pillar 9: Embodied Foundation Models (3 🔗2)
Pillar 2: RL Algorithms & Architecture (3)
Pillar 1: Robot Control (1)
Pillar 4: Generative Motion (1 🔗1)
🔬 Pillar 3: Spatial Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | OpenFusion++: An Open-vocabulary Real-time Scene Understanding System | Proposes OpenFusion++, an open-vocabulary real-time scene understanding system that improves the accuracy and responsiveness of 3D perception. | scene understanding, open-vocabulary, open vocabulary | | |
| 2 | Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting | Proposes a renderability field-guided Gaussian splatting method that improves rendering stability for scene view synthesis. | gaussian splatting, splatting | | |
| 3 | IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos | Proposes IM-Portrait, a 3D-aware video diffusion method that generates photorealistic talking-head videos from monocular videos. | NeRF, geometric consistency | | |
| 4 | Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection | Proposes a gaze target detection method that fuses multi-modal saliency with monocular depth information. | depth estimation, monocular depth | | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer's Disease | HoloDx: a multimodal Alzheimer's disease diagnosis framework that combines knowledge and data. | large language model, multimodal | | |
| 6 | MERA: Multimodal and Multiscale Self-Explanatory Model with Considerably Reduced Annotation for Lung Nodule Diagnosis | MERA: a multimodal, multiscale self-explanatory model for lung nodule diagnosis with low annotation requirements. | multimodal | ✅ | |
| 7 | DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning | Proposes DeepSPG to address missing semantic information in low-light image enhancement. | multimodal | ✅ | |
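🔬 Pillar 2: RL Algorithms & Architecture (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal Understanding | DeepInsert: improves the efficiency and performance of multimodal understanding through early-layer bypass. | representation learning, multimodal | | |
| 9 | CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis | Proposes CARL, a camera-agnostic representation learning method for spectral images that improves cross-camera generalization. | representation learning, scene understanding, foundation model | | |
| 10 | Learning to Drive from a World Model | Proposes an end-to-end autonomous driving learning framework based on a world model, with no hand-crafted rules. | world model | | |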
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes | CapsFake: a multimodal capsule network for detecting instruction-guided deepfake images. | manipulation, multimodal | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions | Generative AI for character animation: a comprehensive survey of techniques, applications, and future directions. | motion synthesis, character animation | ✅ | |