cs.CV (2025-04-13)
📊 12 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (4 🔗2)
Pillar 3: Perception & Semantics (3 🔗2)
Pillar 2: RL & Architecture (3)
Pillar 7: Motion Retargeting (1 🔗1)
Pillar 8: Physics-based Animation (1)
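🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model | Proposes SegEarth-R1, which performs geospatial pixel reasoning with a large language model to handle complex queries over remote sensing imagery. | large language model | ✅ | |
| 2 | Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding | Ges3ViG: incorporates pointing gestures into language-based 3D visual grounding to improve embodied reference understanding. | visual grounding | ✅ | |
| 3 | Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention | Proposes an automatic method for detecting intros and credits in video, based on CLIP and multi-head attention, to make content understanding more efficient. | multimodal | | |
| 4 | Low-Light Image Enhancement using Event-Based Illumination Estimation | RetinEV: enhances low-light images using temporal-mapping events from an event camera. | TAMP | | |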
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting | TextSplat: text-guided semantic fusion that improves generalizable Gaussian splatting reconstruction. | gaussian splatting, splatting, geometric consistency | | |
| 6 | Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Systematically evaluates the performance and limitations of vision-language models on object detection and segmentation tasks. | open-vocabulary, open vocabulary, foundation model | ✅ | |
| 7 | DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering | DropoutGS: randomly drops out Gaussians to improve 3DGS rendering under sparse views. | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
🔬 Pillar 2: RL & Architecture (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Sparse Deformable Mamba for Hyperspectral Image Classification | Proposes a sparse deformable Mamba model that improves the accuracy and efficiency of hyperspectral image classification. | Mamba, HSI | | |
| 9 | ERL-MPP: Evolutionary Reinforcement Learning with Multi-head Puzzle Perception for Solving Large-scale Jigsaw Puzzles of Eroded Gaps | Proposes the ERL-MPP framework for solving large-scale jigsaw puzzles with eroded gaps. | reinforcement learning | | |
| 10 | FSSUAVL: A Discriminative Framework using Vision Models for Federated Self-Supervised Audio and Image Understanding | Proposes FSSUAVL, which uses federated self-supervised learning to understand unpaired audio and image data. | contrastive learning, multimodal | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler | Proposes EmbodiedOcc++ to address insufficient geometric features in indoor 3D occupancy prediction. | geometric consistency | ✅ | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution | Proposes FVOS for video object segmentation in complex scenes. | spatiotemporal | | |