cs.CV (2025-10-05)

📊 8 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (2) · Pillar 2: RL & Architecture (2 🔗1) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (2 papers)

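| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding | GUI-Spotlight: adaptive iterative focus refinement to enhance GUI visual grounding. | large language model, multimodal, visual grounding | |
| 2 | Automating construction safety inspections using a multi-modal vision-language RAG framework | Proposes SiteShield, which uses a multimodal RAG framework to automate the generation of construction safety inspection reports. | large language model | |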

🔬 Pillar 2: RL & Architecture (2 papers)

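| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 3 | Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs | Proposes a concept-aligned autonomous distillation method to address knowledge distillation from multiple drifting MLLMs. | distillation, large language model, multimodal | |
| 4 | MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator | MorphoSim: an interactive, controllable, and editable language-guided 4D world simulator. | world model, spatiotemporal | |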

🔬 Pillar 3: Perception & Semantics (2 papers)

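| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 5 | Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation | Proposes a category-level 6D pose estimation method based on joint learning of pose regression and denoising diffusion. | 6D pose estimation | |
| 6 | Learning Efficient Meshflow and Optical Flow from Event Cameras | Proposes the EEMFlow network for efficient meshflow and optical flow estimation from event cameras, and builds the high-resolution dataset HREM. | optical flow | |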

🔬 Pillar 1: Robot Control (1 paper)

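| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | RAP: 3D Rasterization Augmented End-to-End Planning | Proposes RAP, rasterization-based end-to-end planning that improves the closed-loop robustness and long-tail generalization of driving policies. | sim-to-real, imitation learning | |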

🔬 Pillar 7: Motion Retargeting (1 paper)

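| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation | MetaFind: proposes a scene-aware 3D asset retrieval framework for generating coherent metaverse scenes. | spatial relationship | |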
