cs.CV (2026-01-19)
📊 29 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (9 🔗2)
Pillar 3: Perception & Semantics (8 🔗1)
Pillar 2: RL & Architecture (5 🔗2)
Pillar 1: Robot Control (4 🔗1)
Pillar 4: Generative Motion (1)
Pillar 5: Interaction & Reaction (1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 9: Embodied Foundation Models (9 papers)
🔬 Pillar 3: Perception & Semantics (8 papers)
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | A Generalist Foundation Model for Total-body PET/CT Enables Diagnostic Reporting and System-wide Metabolic Profiling | SDF-HOLO: a generalist foundation model for total-body PET/CT that enables diagnostic reporting and system-wide metabolic profiling | representation learning foundation model multimodal |  |  |
| 19 | CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning | Proposes the CausalSpatial benchmark to evaluate multimodal large language models on causal spatial reasoning | world model large language model multimodal | ✅ |  |
| 20 | Towards Unbiased Source-Free Object Detection via Vision Foundation Models | Proposes the DSOD framework, which uses vision foundation models to address source-domain bias in source-free object detection | distillation foundation model |  |  |
| 21 | ConvMambaNet: A Hybrid CNN-Mamba State Space Architecture for Accurate and Real-Time EEG Seizure Detection | ConvMambaNet: a hybrid CNN-Mamba state space architecture for accurate, real-time EEG seizure detection | Mamba SSM state space model |  |  |
| 22 | Think3D: Thinking with Space for Spatial Reasoning | Think3D: uses spatial reasoning to strengthen the spatial understanding of large vision models | reinforcement learning multimodal chain-of-thought | ✅ |  |
🔬 Pillar 1: Robot Control (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting | CSGaussian: a unified framework for progressive rate-distortion compression and segmentation of 3D Gaussian Splatting | manipulation 3D gaussian splatting 3DGS |  |  |
| 24 | Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration | Spatial-VLN: zero-shot vision-and-language navigation through explicit spatial perception and exploration | sim2real VLN large language model | ✅ |  |
| 25 | TwoHead-SwinFPN: A Unified DL Architecture for Synthetic Manipulation, Detection and Localization in Identity Documents | TwoHead-SwinFPN: a unified deep learning architecture for detecting and localizing synthetic manipulation in identity documents | manipulation |  |  |
| 26 | Exploring Talking Head Models With Adjacent Frame Prior for Speech-Preserving Facial Expression Manipulation | Proposes the THFEM framework, which combines audio-driven talking head generation models with facial expression manipulation to improve lip-sync accuracy | manipulation |  |  |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | A Semantic Decoupling-Based Two-Stage Rainy-Day Attack for Revealing Weather Robustness Deficiencies in Vision-Language Models | Proposes a two-stage rainy-day attack framework based on semantic decoupling that reveals weather robustness deficiencies in vision-language models | physically plausible multimodal |  |  |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | Dual-Stream Collaborative Transformer for Image Captioning | Proposes the Dual-Stream Collaborative Transformer (DSCT) to address insufficient contextual information in image captioning | mutual attention |  |  |
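🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | Deep Learning for Semantic Segmentation of 3D Ultrasound Data | Proposes a 3D U-Net-based semantic segmentation framework for 3D ultrasound data, aimed at autonomous driving in harsh environments | PULSE |  |  |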