cs.CV(2025-10-06)
📊 28 papers in total | 🔗 7 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (11 🔗3)
Pillar 2: RL Algorithms & Architecture (8 🔗3)
Pillar 3: Spatial Perception & Semantics (4 🔗1)
Pillar 1: Robot Control (2)
Pillar 6: Video Extraction & Matching (2)
Pillar 5: Interaction & Reaction (1)
🔬 Pillar 9: Embodied Foundation Models (11 papers)
🔬 Pillar 2: RL Algorithms & Architecture (8 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Benchmark on Monocular Metric Depth Estimation in Wildlife Setting | Builds a benchmark for monocular metric depth estimation in wildlife scenes and evaluates existing methods. | MAE depth estimation monocular depth | ||
| 13 | Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models | First survey of Video-LMM post-training: an in-depth look at video reasoning with large multimodal models. | reinforcement learning reward design spatiotemporal | ✅ | |
| 14 | Object-Centric Representation Learning for Enhanced 3D Scene Graph Prediction | Proposes an object-centric representation learning method that improves 3D scene graph prediction accuracy. | representation learning open-vocabulary open vocabulary | ✅ | |
| 15 | Conditional Representation Learning for Customized Tasks | Proposes Conditional Representation Learning (CRL) to extract image representations with task-specific semantics for customized tasks. | representation learning large language model | ✅ | |
| 16 | A Comparative Study of Vision Transformers and CNNs for Few-Shot Rigid Transformation and Fundamental Matrix Estimation | Compares ViTs and CNNs on few-shot rigid transformation and fundamental matrix estimation. | contrastive learning scene reconstruction foundation model | ||
| 17 | ERDE: Entropy-Regularized Distillation for Early-exit | Proposes ERDE, an entropy-regularized knowledge distillation method for early exit that improves image classification efficiency on edge devices. | distillation | ||
| 18 | Beyond Random: Automatic Inner-loop Optimization in Dataset Distillation | Proposes AT-BPTT, which improves dataset distillation performance through automatic inner-loop optimization. | distillation | ||
| 19 | EduPersona: Benchmarking Subjective Ability Boundaries of Virtual Student Agents | EduPersona: a benchmark dataset and evaluation framework for assessing the subjective abilities of virtual student agents. | teacher-student large language model | | |
🔬 Pillar 3: Spatial Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction | Proposes the PG-Occ framework, which uses a progressive Gaussian Transformer for open-vocabulary 3D occupancy prediction. | scene understanding open-vocabulary open vocabulary | ✅ | |
| 21 | Beyond the Seen: Bounded Distribution Estimation for Open-Vocabulary Learning | Proposes an open-vocabulary learning method based on bounded distribution estimation that improves generalization by generating unseen-class data. | open-vocabulary open vocabulary | ||
| 22 | See the past: Time-Reversed Scene Reconstruction from Thermal Traces Using Visual Language Models | Proposes a time-reversed scene reconstruction method based on vision-language models that infers past scene states from thermal imaging traces. | scene reconstruction | ||
| 23 | AvatarVTON: 4D Virtual Try-On for Animatable Avatars | AvatarVTON: proposes the first 4D virtual try-on framework for animatable avatars. | optical flow | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | General and Efficient Visual Goal-Conditioned Reinforcement Learning using Object-Agnostic Masks | Proposes a visual goal-conditioned reinforcement learning method based on object-agnostic masks, improving generality and efficiency. | sim-to-real reinforcement learning open-vocabulary | ||
| 25 | Hands-Free Heritage: Automated 3D Scanning for Cultural Heritage Digitization | Proposes an automated dual-robot scanning system for high-precision 3D digitization of cultural heritage. | manipulation motion planning | | |
🔬 Pillar 6: Video Extraction & Matching (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | Did you just see that? Arbitrary view synthesis for egocentric replay of operating room workflows from ambient sensors | EgoSurg: uses ambient sensors to reconstruct arbitrary-view egocentric replays of operating room workflows. | egocentric | ||
| 27 | SegMASt3R: Geometry Grounded Segment Matching | SegMASt3R: uses a 3D foundation model for geometry-aware image segment matching. | feature matching foundation model | | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | Read the Room: Inferring Social Context Through Dyadic Interaction Recognition in Cyber-physical-social Infrastructure Systems | Infers social context through dyadic interaction recognition in cyber-physical-social infrastructure systems. | dyadic interaction | | |