cs.CV(2026-04-17)
📊 37 papers total | 🔗 13 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (11 🔗2)
Pillar 3: Perception & Semantics (7 🔗2)
Pillar 2: RL & Architecture (7 🔗3)
Pillar 1: Robot Control (5 🔗2)
Pillar 6: Video Extraction (2 🔗1)
Pillar 8: Physics-based Animation (2 🔗2)
Pillar 7: Motion Retargeting (2 🔗1)
Pillar 4: Generative Motion (1)
🔬 Pillar 9: Embodied Foundation Models (11 papers)
🔬 Pillar 3: Perception & Semantics (7 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Splats in Splats++: Robust and Generalizable 3D Gaussian Splatting Steganography | Proposes Splats in Splats++ to address steganography for 3D Gaussian Splatting | 3D gaussian splatting 3DGS 3D reconstruction | | |
| 13 | AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution | AdaVFM: edge intelligence via LLM-guided adaptive vision foundation models | open-vocabulary open vocabulary large language model | | |
| 14 | Neural Gabor Splatting: Enhanced Gaussian Splatting with Neural Gabor for High-frequency Surface Reconstruction | Proposes Neural Gabor Splatting, enhancing 3D Gaussian Splatting for high-frequency surface reconstruction | 3D gaussian splatting 3DGS 3D reconstruction | | |
| 15 | SENSE: Stereo OpEN Vocabulary SEmantic Segmentation | SENSE: uses stereo vision to enhance open-vocabulary semantic segmentation and improve spatial accuracy | scene understanding open-vocabulary open vocabulary | | |
| 16 | CLOTH-HUGS: Cloth Aware Human Gaussian Splatting | Cloth-HUGS: cloth-aware human reconstruction based on Gaussian Splatting that decouples body and clothing | gaussian splatting splatting SMPL | | |
| 17 | Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline | Proposes OVRSISBenchV2 to address open-vocabulary remote sensing image segmentation | open-vocabulary open vocabulary | ✅ | |
| 18 | PLAF: Pixel-wise Language-Aligned Feature Extraction for Efficient 3D Scene Understanding | Proposes PLAF, which performs pixel-wise language-aligned feature extraction for more efficient 3D scene understanding | scene understanding open-vocabulary open vocabulary | ✅ | |
🔬 Pillar 2: RL & Architecture (7 papers)
🔬 Pillar 1: Robot Control (5 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs | Proposes the Mind's Eye benchmark to evaluate visual abstraction, transformation, and composition in multimodal LLMs | manipulation large language model multimodal | | |
| 27 | DINOv3 Beats Specialized Detectors: A Simple Foundation Model Baseline for Image Forensics | Proposes a DINOv3-based baseline for image forensics that outperforms specialized detectors | manipulation foundation model | ✅ | |
| 28 | Continual Hand-Eye Calibration for Open-world Robotic Manipulation | Proposes a continual hand-eye calibration framework that addresses catastrophic forgetting in open-world robotic manipulation | manipulation distillation | | |
| 29 | From Competition to Coopetition: Coopetitive Training-Free Image Editing Based on Text Guidance | Proposes CoEdit, a coopetitive (competition plus cooperation) approach to text-guided, training-free image editing | manipulation | ✅ | |
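| 30 | AHS: Adaptive Head Synthesis via Synthetic Data Augmentations | Proposes AHS, which achieves adaptive head synthesis through synthetic data augmentation, addressing the limitations of existing head-swapping methods in real-world scenarios | manipulation | | |
🔬 Pillar 6: Video Extraction (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |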
|---|---|---|---|---|---|
| 31 | FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation | Proposes FineCog-Nav to address zero-shot challenges in UAV vision-language navigation | egocentric VLN foundation model | ✅ | |
| 32 | Watching Movies Like a Human: Egocentric Emotion Understanding for Embodied Companions | Proposes the EgoScreen-Emotion dataset for emotion understanding by embodied agents watching movies on a screen from an egocentric view | egocentric multimodal | | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 33 | NeuroLip: An Event-driven Spatiotemporal Learning Framework for Cross-Scene Lip-Motion-based Visual Speaker Recognition | NeuroLip: an event-driven spatiotemporal learning framework for cross-scene, lip-motion-based visual speaker recognition | spatiotemporal | ✅ | |
| 34 | LP$^{2}$DH: A Locality-Preserving Pixel-Difference Hashing Framework for Dynamic Texture Recognition | Proposes the LP²DH framework to address the problem of high-dimensional features in dynamic texture recognition | spatiotemporal | ✅ | |
🔬 Pillar 7: Motion Retargeting (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
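| 35 | AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection | Proposes AIFIND, which achieves incremental face forgery detection through forgery-artifact alignment | geometric consistency | | |
| 36 | APC: Transferable and Efficient Adversarial Point Counterattack for Robust 3D Point Cloud Recognition | Proposes Adversarial Point Counterattack (APC), improving the robustness and transferability of 3D point cloud recognition models against adversarial attacks | geometric consistency | ✅ | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |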
|---|---|---|---|---|---|
| 37 | Motion-Adapter: A Diffusion Model Adapter for Text-to-Motion Generation of Compound Actions | Proposes Motion-Adapter, addressing action overwriting and attention collapse in text-to-motion generation of compound actions | motion diffusion model motion diffusion text-to-motion | | |