cs.CV (2026-01-27)

📊 21 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (6 🔗3) · Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 3: Spatial Perception & Semantics (5 🔗1) · Pillar 1: Robot Control (2) · Pillar 7: Motion Retargeting (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning | EgoHandICL: egocentric 3D hand reconstruction using in-context learning | masked autoencoder, MAE, egocentric | |
| 2 | Innovator-VL: A Multimodal Large Language Model for Scientific Discovery | Proposes Innovator-VL, a multimodal large language model for scientific discovery | reinforcement learning, large language model, multimodal | |
| 3 | Video-KTR: Reinforcing Video Reasoning via Key Token Attribution | Proposes Video-KTR to address reward sparsity in video reasoning | reinforcement learning, large language model, multimodal | |
| 4 | Towards Pixel-Level VLM Perception via Simple Points Prediction | SimpleSeg: pixel-level vision-language model perception via simple point prediction | reinforcement learning, large language model, multimodal | |
| 5 | m2sv: A Scalable Benchmark for Map-to-Street-View Spatial Reasoning | Proposes the m2sv benchmark for evaluating vision-language models on map-to-street-view spatial reasoning | reinforcement learning, egocentric, multimodal | |
| 6 | DSVM-UNet : Enhancing VM-UNet with Dual Self-distillation for Medical Image Segmentation | DSVM-UNet: enhances VM-UNet with dual self-distillation for medical image segmentation | Mamba, distillation | |

🔬 Pillar 9: Embodied Foundation Models (6 papers)

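| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 7 | Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues | Proposes the MMPCBench benchmark to evaluate multimodal large language models on missing-modality completion in e-commerce product catalogues | large language model, multimodal | |
| 8 | DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding | Proposes DuwatBench, an Arabic calligraphy benchmark dataset for multimodal understanding | multimodal | |
| 9 | Towards Governance-Oriented Low-Altitude Intelligence: A Management-Centric Multi-Modal Benchmark With Implicitly Coordinated Vision-Language Reasoning Framework | Proposes GovLA-10K and GovLA-Reasoner, a low-altitude intelligence multimodal benchmark and reasoning framework for urban governance | large language model, visual grounding | |
| 10 | UniPCB: A Unified Vision-Language Benchmark for Open-Ended PCB Quality Inspection | UniPCB: a unified vision-language benchmark for open-ended PCB quality inspection | large language model, multimodal | |
| 11 | Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision | Youtu-VL: unleashing visual potential via unified vision-language supervision | multimodal | |
| 12 | Reg-TTR, Test-Time Refinement for Fast, Robust and Accurate Image Registration | Proposes Reg-TTR, which uses test-time refinement to improve the speed, robustness, and accuracy of image registration models | foundation model | |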

🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

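| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 13 | Fast Converging 3D Gaussian Splatting for 1-Minute Reconstruction | Proposes a fast-converging 3D Gaussian Splatting reconstruction method that completes reconstruction within 1 minute | monocular depth, 3D gaussian splatting, 3DGS | |
| 14 | WaterClear-GS: Optical-Aware Gaussian Splatting for Underwater Reconstruction and Restoration | WaterClear-GS: underwater Gaussian Splatting reconstruction and restoration based on light attenuation and scattering | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 15 | VGGT-SLAM 2.0: Real time Dense Feed-forward Scene Reconstruction | VGGT-SLAM 2.0: real-time dense feed-forward scene reconstruction with improved accuracy and efficiency | scene reconstruction, VGGT | |
| 16 | Towards Gold-Standard Depth Estimation for Tree Branches in UAV Forestry: Benchmarking Deep Stereo Matching Methods | Proposes a benchmarking scheme based on deep stereo matching for depth estimation of tree branches in UAV forestry | depth estimation, scene flow, foundation model | |
| 17 | TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment | TIGaussian: disentangles Gaussians for spatially aware text-image-3D alignment | 3D gaussian splatting, 3DGS, gaussian splatting | |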

🔬 Pillar 1: Robot Control (2 papers)

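| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 18 | Dynamic Worlds, Dynamic Humans: Generating Virtual Human-Scene Interaction Motion in Dynamic Scenes | Proposes Dyn-HSI to address the generation of virtual human-scene interaction motion in dynamic scenes | humanoid, world model, human-scene interaction | |
| 19 | Instance-Guided Radar Depth Estimation for 3D Object Detection | Proposes InstaRadar, which improves monocular 3D object detection via radar depth estimation guided by instance segmentation | motion planning, depth estimation | |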

🔬 Pillar 7: Motion Retargeting (1 paper)

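| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 20 | QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture | QuaMo: captures vision-based 3D human motion using quaternion kinematics, addressing the discontinuity of Euler angles | human motion | |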

🔬 Pillar 8: Physics-based Animation (1 paper)

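| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Magnetic Resonance Simulation of Effective Transverse Relaxation (T2*) | Proposes an efficient new method for simulating the transverse relaxation time T2* | PULSE | |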
