cs.CV (2025-12-20)
📊 12 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (6 🔗1)
Pillar 2: RL & Architecture (3 🔗2)
Pillar 3: Perception & Semantics (2)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
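| 1 | Pyramidal Adaptive Cross-Gating for Multimodal Detection | PACGNet: a pyramidal adaptive cross-gating network for multimodal object detection in UAV imagery | multimodal | | |
| 2 | UniMPR: A Unified Framework for Multimodal Place Recognition with Heterogeneous Sensor Configurations | UniMPR: a unified framework for multimodal place recognition under heterogeneous sensor configurations | multimodal | ✅ | |
| 3 | Atlas is Your Perfect Context: One-Shot Customization for Generalizable Foundational Medical Image Segmentation | AtlasSegFM: customizes general-purpose medical image segmentation foundation models from a single sample | foundation model, multimodal | | |
| 4 | Adaptive-VoCo: Complexity-Aware Visual Token Compression for Vision-Language Models | Proposes Adaptive-VoCo, which improves vision-language model efficiency through adaptive visual token compression | multimodal | | |
| 5 | Through the PRISm: Importance-Aware Scene Graphs for Image Retrieval | Proposes the PRISm framework, which uses importance-predicting scene graphs for more precise image retrieval | multimodal | | |
| 6 | Automated Mosaic Tesserae Segmentation via Deep Learning Techniques | Uses deep learning to automatically segment mosaic tesserae, supporting the digitization of cultural heritage | foundation model | | |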
🔬 Pillar 2: RL & Architecture (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | EndoStreamDepth: Temporally Consistent Monocular Depth Estimation for Endoscopic Video Streams | Proposes EndoStreamDepth to address monocular depth estimation in endoscopic video streams | Mamba, depth estimation, monocular depth | ✅ | |
| 8 | MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation | MACE-Dance: a music-driven dance video generation framework based on cascaded mixture-of-experts models | Mamba, motion generation, human motion | | |
| 9 | Is There a Better Source Distribution than Gaussian? Exploring Source Distributions for Image Flow Matching | Explores better source distributions for image flow matching and proposes direction-pruned sampling to improve generation quality | flow matching | ✅ | |
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
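| 10 | MatSpray: Fusing 2D Material World Knowledge on 3D Geometry | MatSpray: fuses 2D material knowledge onto 3D geometry to improve the realism of reconstructed scenes | gaussian splatting, splatting | | |
| 11 | Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction | Proposes a joint learning framework that addresses the coupling of depth, pose, and radiance field in large-scale monocular 3D reconstruction | NeRF | | |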
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | RecurGS: Interactive Scene Modeling via Discrete-State Recurrent Gaussian Fusion | RecurGS: interactive scene modeling via discrete-state recurrent Gaussian fusion | manipulation | | |