cs.CV (2025-04-16)

📊 7 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (3) · Pillar 1: Robot Control (2 🔗1) · Pillar 3: Perception & Semantics (1) · Pillar 2: RL & Architecture (1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

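| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | WorldMem: Long-term Consistent World Simulation with Memory | WorldMem: long-term consistent world simulation using a memory mechanism | TAMP | |
| 2 | The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation | Proposes the RAPO framework, which improves text-to-video generation quality through retrieval-augmented prompt optimization | large language model | |
| 3 | Interpreting the linear structure of vision-language model embedding spaces | Uses sparse autoencoders to analyze the linear structure of vision-language model embedding spaces | multimodal | |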

🔬 Pillar 1: Robot Control (2 papers)

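| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 4 | Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space | Proposes the Event Quality Score (EQS) for assessing how realistic simulated event camera data is compared with real data | sim-to-real | |
| 5 | DG-MVP: 3D Domain Generalization via Multiple Views of Point Clouds for Classification | DG-MVP: 3D domain generalization for classification via multi-view projections of point clouds | sim-to-real | |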

🔬 Pillar 3: Perception & Semantics (1 paper)

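| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 6 | How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions | Proposes a 3D hand motion synthesis method based on interaction trajectory prediction for everyday interaction scenarios | affordance, VQ-VAE | |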

🔬 Pillar 2: RL & Architecture (1 paper)

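| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | AdaVid: Adaptive Video-Language Pretraining | AdaVid: adaptive video-language pretraining that improves video encoding efficiency on edge devices | representation learning, Ego4D | |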
