cs.CV (2025-09-07)
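📊 10 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (4)
Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 🔗1)
Pillar 9: Embodied Large Models (Embodied Foundation Models) (2 🔗1)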
🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning | MEGS$^{2}$: memory-efficient Gaussian splatting via spherical Gaussians and unified pruning. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 2 | Light-Weight Cross-Modal Enhancement Method with Benchmark Construction for UAV-based Open-Vocabulary Object Detection | Proposes a lightweight cross-modal enhancement method and a benchmark dataset for UAV-based open-vocabulary object detection. | open-vocabulary, open vocabulary | | |
| 3 | Motion Aware ViT-based Framework for Monocular 6-DoF Spacecraft Pose Estimation | Proposes a motion-aware ViT-based framework for monocular 6-DoF spacecraft pose estimation. | optical flow | | |
| 4 | S-LAM3D: Segmentation-Guided Monocular 3D Object Detection via Feature Space Fusion | S-LAM3D: segmentation-guided monocular 3D object detection via feature-space fusion. | depth estimation | | |
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation | MedSeqFT: proposes a sequential fine-tuning framework that improves the performance of medical image segmentation foundation models on incremental tasks. | distillation, foundation model | | |
| 6 | A Fine-Grained Attention and Geometric Correspondence Model for Musculoskeletal Risk Classification in Athletes Using Multimodal Visual and Skeletal Features | ViSK-GAT: fuses visual and skeletal features for accurate musculoskeletal risk classification in athletes. | MAE, multimodal | | |
| 7 | Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching | Proposes Coefficients-Preserving Sampling (CPS) to address noise artifacts in RL optimization of flow matching models. | reinforcement learning, flow matching | ✅ | |
| 8 | UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning | UNO: proposes a unified one-stage video scene graph generation framework that handles both box-level and pixel-level tasks through object-centric visual representation learning. | representation learning | | |
🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
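| 9 | Compression Beyond Pixels: Semantic Compression with Multimodal Foundation Models | Proposes a semantic compression method based on multimodal foundation models that goes beyond pixel-level reconstruction. | foundation model, multimodal | | |
| 10 | BTCChat: Advancing Remote Sensing Bi-temporal Change Captioning with Multimodal Large Language Model | BTCChat: uses a multimodal large language model to improve bi-temporal change captioning for remote sensing imagery. | large language model, multimodal | ✅ | |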