cs.CV (2026-01-27)
📊 21 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (6 🔗3)
Pillar 9: Embodied Foundation Models (6 🔗1)
Pillar 3: Spatial Perception & Semantics (5 🔗1)
Pillar 1: Robot Control (2)
Pillar 7: Motion Retargeting (1 🔗1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 2: RL Algorithms & Architecture (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning | EgoHandICL: egocentric 3D hand reconstruction using in-context learning | masked autoencoder MAE egocentric | ✅ | |
| 2 | Innovator-VL: A Multimodal Large Language Model for Scientific Discovery | Proposes Innovator-VL, a multimodal large language model for scientific discovery | reinforcement learning large language model multimodal | | |
| 3 | Video-KTR: Reinforcing Video Reasoning via Key Token Attribution | Proposes Video-KTR to address reward sparsity in video reasoning | reinforcement learning large language model multimodal | ✅ | |
| 4 | Towards Pixel-Level VLM Perception via Simple Points Prediction | SimpleSeg: pixel-level vision-language model perception via simple point prediction | reinforcement learning large language model multimodal | | |
| 5 | m2sv: A Scalable Benchmark for Map-to-Street-View Spatial Reasoning | Proposes the m2sv benchmark for evaluating vision-language models on map-to-street-view spatial reasoning | reinforcement learning egocentric multimodal | | |
| 6 | DSVM-UNet : Enhancing VM-UNet with Dual Self-distillation for Medical Image Segmentation | DSVM-UNet: enhances VM-UNet with dual self-distillation for medical image segmentation | Mamba distillation | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (6 papers)
🔬 Pillar 3: Spatial Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Fast Converging 3D Gaussian Splatting for 1-Minute Reconstruction | Proposes a fast-converging 3D Gaussian Splatting method that completes reconstruction within 1 minute | monocular depth 3D gaussian splatting 3DGS | | |
| 14 | WaterClear-GS: Optical-Aware Gaussian Splatting for Underwater Reconstruction and Restoration | WaterClear-GS: underwater Gaussian Splatting reconstruction and restoration based on light attenuation and scattering | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 15 | VGGT-SLAM 2.0: Real time Dense Feed-forward Scene Reconstruction | VGGT-SLAM 2.0: real-time dense feed-forward scene reconstruction with improved accuracy and efficiency | scene reconstruction VGGT | | |
| 16 | Towards Gold-Standard Depth Estimation for Tree Branches in UAV Forestry: Benchmarking Deep Stereo Matching Methods | Proposes a benchmark of deep stereo matching methods for tree-branch depth estimation in UAV forestry | depth estimation scene flow foundation model | | |
| 17 | TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment | TIGaussian: disentangles Gaussians for spatially aware text-image-3D alignment | 3D gaussian splatting 3DGS gaussian splatting | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Dynamic Worlds, Dynamic Humans: Generating Virtual Human-Scene Interaction Motion in Dynamic Scenes | Proposes Dyn-HSI to generate virtual human-scene interaction motion in dynamic scenes | humanoid world model human-scene interaction | | |
| 19 | Instance-Guided Radar Depth Estimation for 3D Object Detection | Proposes InstaRadar, which improves monocular 3D object detection through radar depth estimation guided by instance segmentation | motion planning depth estimation | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture | QuaMo: uses quaternion kinematics to capture 3D human motion from vision, addressing Euler-angle discontinuities | human motion | ✅ | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Magnetic Resonance Simulation of Effective Transverse Relaxation (T2*) | Proposes a new method for efficiently simulating the effective transverse relaxation time T2* | PULSE | | |