cs.CV (2025-10-05)

📊 8 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (2) · Pillar 2: RL & Architecture (2 🔗1) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding	GUI-Spotlight: adaptive iterative focus refinement for enhanced GUI visual grounding	large language model multimodal visual grounding
2	Automating construction safety inspections using a multi-modal vision-language RAG framework	Proposes SiteShield, a multimodal RAG framework that automates the generation of construction safety inspection reports.	large language model

🔬 Pillar 2: RL & Architecture (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
3	Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs	Proposes a concept-aligned autonomous distillation method for knowledge distillation from multiple drifting MLLMs	distillation large language model multimodal
4	MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator	MorphoSim: an interactive, controllable, and editable language-guided 4D world simulator	world model spatiotemporal	✅

🔬 Pillar 3: Perception & Semantics (2 papers)

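#	Title	One-Sentence Summary	Tags	🔗	⭐
5	Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation	Proposes a category-level 6D pose estimation method based on joint learning of pose regression and denoising diffusion	6D pose estimation
6	Learning Efficient Meshflow and Optical Flow from Event Cameras	Proposes the EEMFlow network for efficient meshflow and optical flow estimation from event cameras, and builds the high-resolution dataset HREM.	optical flow	✅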

🔬 Pillar 1: Robot Control (1 paper)

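#	Title	One-Sentence Summary	Tags	🔗	⭐
7	RAP: 3D Rasterization Augmented End-to-End Planning	Proposes RAP, a rasterization-based end-to-end planning approach that improves the closed-loop robustness and long-tail generalization of driving policies.	sim-to-real imitation learning	✅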

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
8	MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation	MetaFind: a scene-aware 3D asset retrieval framework for generating coherent metaverse scenes	spatial relationship
