cs.CV (2025-12-20)

📊 12 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 2: RL & Architecture (3 🔗2) · Pillar 3: Perception & Semantics (2) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

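| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Pyramidal Adaptive Cross-Gating for Multimodal Detection | PACGNet: a pyramidal adaptive cross-gating network for multimodal object detection in UAV imagery | multimodal | |
| 2 | UniMPR: A Unified Framework for Multimodal Place Recognition with Heterogeneous Sensor Configurations | UniMPR: a unified framework for multimodal place recognition under heterogeneous sensor configurations | multimodal | |
| 3 | Atlas is Your Perfect Context: One-Shot Customization for Generalizable Foundational Medical Image Segmentation | AtlasSegFM: customizes general-purpose medical image segmentation foundation models from a single sample | foundation model, multimodal | |
| 4 | Adaptive-VoCo: Complexity-Aware Visual Token Compression for Vision-Language Models | Proposes Adaptive-VoCo, which improves vision-language model efficiency through adaptive visual token compression | multimodal | |
| 5 | Through the PRISm: Importance-Aware Scene Graphs for Image Retrieval | Proposes the PRISm framework, which uses importance-predicting scene graphs for more accurate image retrieval | multimodal | |
| 6 | Automated Mosaic Tesserae Segmentation via Deep Learning Techniques | Uses deep learning to segment mosaic tesserae automatically, supporting the digitization of cultural heritage | foundation model | |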

🔬 Pillar 2: RL & Architecture (3 papers)

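| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 7 | EndoStreamDepth: Temporally Consistent Monocular Depth Estimation for Endoscopic Video Streams | Proposes EndoStreamDepth to address monocular depth estimation in endoscopic video streams | Mamba, depth estimation, monocular depth | |
| 8 | MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation | MACE-Dance: a music-driven dance video generation framework based on a cascaded mixture of experts | Mamba, motion generation, human motion | |
| 9 | Is There a Better Source Distribution than Gaussian? Exploring Source Distributions for Image Flow Matching | Explores better source distributions for image flow matching and proposes direction-pruned sampling to improve generation quality | flow matching | |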

🔬 Pillar 3: Perception & Semantics (2 papers)

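| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 10 | MatSpray: Fusing 2D Material World Knowledge on 3D Geometry | MatSpray: fuses 2D material knowledge onto 3D geometry to make reconstructed scenes more realistic | gaussian splatting, splatting | |
| 11 | Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction | Proposes a joint learning framework that addresses the coupling of depth, pose, and radiance field in large-scale monocular 3D reconstruction | NeRF | |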

🔬 Pillar 1: Robot Control (1 paper)

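| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 12 | RecurGS: Interactive Scene Modeling via Discrete-State Recurrent Gaussian Fusion | RecurGS: interactive scene modeling via discrete-state recurrent Gaussian fusion | manipulation | |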
