cs.CV (2025-11-03)

📊 17 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 1: Robot Control (8 🔗1) · Pillar 3: Spatial Perception (Perception & SLAM) (5) · Pillar 2: RL Algorithms & Architecture (3 🔗1) · Pillar 4: Generative Motion (1 🔗1)

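🔬 Pillar 1: Robot Control (8 papers)

| # | Title | Summary | Tags |
|---|-------|---------|------|
| 1 | SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation | Proposes SE(3)-PoseFlow for estimating 6D pose distributions, enabling uncertainty-aware robotic manipulation. | manipulation, grasp, flow matching |
| 2 | OmniVLA: Physically-Grounded Multimodal VLA with Unified Multi-Sensor Perception for Robotic Manipulation | OmniVLA: a physically grounded multimodal VLA model for robotic manipulation with unified multi-sensor perception. | manipulation |
| 3 | TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning | Proposes TIR-Bench to evaluate how well models use tools for image processing in agentic image reasoning. | manipulation, localization |
| 4 | PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model | PixelVLA: improves vision-language-action model performance through pixel-level understanding and multimodal prompting. | manipulation, scene understanding |
| 5 | Web-Scale Collection of Video Data for 4D Animal Reconstruction | Proposes the AiM dataset and a baseline method for 4D animal reconstruction in the wild. | quadruped, pose estimation |
| 6 | EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning | Proposes EVLP, which learns a unified embodied vision-language planner via reinforced supervised fine-tuning to address multimodal planning in long-horizon manipulation tasks. | manipulation |
| 7 | A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model | Proposes a generative adversarial attack method guided by a contrastive language-image pre-trained model, improving attack effectiveness and visual fidelity. | manipulation |
| 8 | Source-Only Cross-Weather LiDAR via Geometry-Aware Point Drop | Proposes a geometry-aware point-drop adapter that improves LiDAR semantic segmentation in adverse weather. | domain randomization |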

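🔬 Pillar 3: Spatial Perception (Perception & SLAM) (5 papers)

| # | Title | Summary | Tags |
|---|-------|---------|------|
| 9 | Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning | Proposes the DiMoDE framework, which improves joint depth and ego-motion learning by treating motion components discriminately. | optical flow, ego-motion |
| 10 | HGFreNet: Hop-hybrid GraphFomer for 3D Human Pose Estimation with Trajectory Consistency in Frequency Domain | Proposes HGFreNet, which uses a Hop-hybrid GraphFomer to address trajectory inconsistency in 3D human pose estimation from monocular video. | pose estimation |
| 11 | Opto-Electronic Convolutional Neural Network Design Via Direct Kernel Optimization | Proposes a two-stage design for opto-electronic convolutional neural networks that improves monocular depth estimation accuracy through direct kernel optimization. | depth estimation, monocular depth |
| 12 | Semantic BIM enrichment for firefighting assets: Fire-ART dataset and panoramic image-based 3D reconstruction | Proposes the Fire-ART dataset and a panoramic image-based 3D reconstruction method for semantic BIM enrichment of firefighting assets. | localization |
| 13 | Eyes on Target: Gaze-Aware Object Detection in Egocentric Video | Eyes on Target: proposes a depth-aware, gaze-guided object detection framework for egocentric video analysis. | ego-motion |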

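🔬 Pillar 2: RL Algorithms & Architecture (3 papers)

| # | Title | Summary | Tags |
|---|-------|---------|------|
| 14 | MVSMamba: Multi-View Stereo with State Space Model | MVSMamba: uses a state space model for efficient multi-view stereo reconstruction. | Mamba, state space model, feature matching |
| 15 | Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models | Actial: activates the spatial reasoning ability of multimodal large language models through viewpoint learning. | reinforcement learning, scene understanding |
| 16 | DINO-MX: A Modular & Flexible Framework for Self-Supervised Learning | DINO-MX: a modular self-supervised learning framework that reduces computational cost and improves flexibility. | representation learning, localization |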

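🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | Summary | Tags |
|---|-------|---------|------|
| 17 | MoSa: Motion Generation with Scalable Autoregressive Modeling | MoSa: a motion generation framework based on scalable autoregressive modeling that improves the quality and efficiency of text-driven 3D human motion generation. | motion generation, MoMask |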
