Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer

📄 arXiv: 2510.03342v3

Authors: Gemini Robotics Team, Abbas Abdolmaleki, Saminda Abeyruwan, Joshua Ainslie, Jean-Baptiste Alayrac, Montserrat Gonzalez Arenas, Ashwin Balakrishna, Nathan Batchelor, Alex Bewley, Jeff Bingham, Michael Bloesch, Konstantinos Bousmalis, Philemon Brakel, Anthony Brohan, Thomas Buschmann, Arunkumar Byravan, Serkan Cabi, Ken Caluwaerts, Federico Casarini, Christine Chan, Oscar Chang, London Chappellet-Volpini, Jose Enrique Chen, Xi Chen, Hao-Tien Lewis Chiang, Krzysztof Choromanski, Adrian Collister, David B. D'Ambrosio, Sudeep Dasari, Todor Davchev, Meet Kirankumar Dave, Coline Devin, Norman Di Palo, Tianli Ding, Carl Doersch, Adil Dostmohamed, Yilun Du, Debidatta Dwibedi, Sathish Thoppay Egambaram, Michael Elabd, Tom Erez, Xiaolin Fang, Claudio Fantacci, Cody Fong, Erik Frey, Chuyuan Fu, Ruiqi Gao, Marissa Giustina, Keerthana Gopalakrishnan, Laura Graesser, Oliver Groth, Agrim Gupta, Roland Hafner, Steven Hansen, Leonard Hasenclever, Sam Haves, Nicolas Heess, Brandon Hernaez, Alex Hofer, Jasmine Hsu, Lu Huang, Sandy H. Huang, Atil Iscen, Mithun George Jacob, Deepali Jain, Sally Jesmonth, Abhishek Jindal, Ryan Julian, Dmitry Kalashnikov, M. Emre Karagozler, Stefani Karp, Matija Kecman, J. Chase Kew, Donnie Kim, Frank Kim, Junkyung Kim, Thomas Kipf, Sean Kirmani, Ksenia Konyushkova, Li Yang Ku, Yuheng Kuang, Thomas Lampe, Antoine Laurens, Tuan Anh Le, Isabel Leal, Alex X. Lee, Tsang-Wei Edward Lee, Guy Lever, Jacky Liang, Li-Heng Lin, Fangchen Liu, Shangbang Long, Caden Lu, Sharath Maddineni, Anirudha Majumdar, Kevis-Kokitsi Maninis, Andrew Marmon, Sergio Martinez, Assaf Hurwitz Michaely, Niko Milonopoulos, Joss Moore, Robert Moreno, Michael Neunert, Francesco Nori, Joy Ortiz, Kenneth Oslund, Carolina Parada, Emilio Parisotto, Amaris Paryag, Acorn Pooley, Thomas Power, Alessio Quaglino, Haroon Qureshi, Rajkumar Vasudeva Raju, Helen Ran, Dushyant Rao, Kanishka Rao, Isaac Reid, David Rendleman, Krista Reymann, Miguel Rivas, Francesco Romano, Yulia Rubanova, Peter Pastor Sampedro, Pannag R Sanketi, Dhruv Shah, Mohit Sharma, Kathryn Shea, Mohit Shridhar, Charles Shu, Vikas Sindhwani, Sumeet Singh, Radu Soricut, Rachel Sterneck, Ian Storz, Razvan Surdulescu, Jie Tan, Jonathan Tompson, Saran Tunyasuvunakool, Jake Varley, Grace Vesom, Giulia Vezzani, Maria Bauza Villalonga, Oriol Vinyals, René Wagner, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Chengda Wu, Markus Wulfmeier, Fei Xia, Ted Xiao, Annie Xie, Jinyu Xie, Peng Xu, Sichun Xu, Ying Xu, Zhuo Xu, Jimmy Yan, Sherry Yang, Skye Yang, Yuxiang Yang, Hiu Hong Yu, Wenhao Yu, Wentao Yuan, Yuan Yuan, Jingwei Zhang, Tingnan Zhang, Zhiyuan Zhang, Allan Zhou, Guangyao Zhou, Yuxiang Zhou

Categories: cs.RO

Published: 2025-10-02 (updated: 2025-11-28)


💡 One-Sentence Summary

Gemini Robotics 1.5: advancing the frontier of generalist robots through embodied reasoning, thinking, and motion transfer

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: generalist robots, embodied reasoning, motion transfer, vision-language-action models, multi-embodiment learning

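📋 Key Points

  1. Existing general-purpose robots lack a deep understanding of the physical world, advanced reasoning, and general, dexterous control.
  2. Gemini Robotics 1.5 uses a novel architecture and a Motion Transfer (MT) mechanism to learn from heterogeneous, multi-embodiment robot data, making the VLA model more general.
  3. Gemini Robotics-ER 1.5 sets a new state of the art in embodied reasoning, covering visual and spatial understanding, task planning, and progress estimation.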

🔬 Method Details

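Problem definition: Existing general-purpose robots struggle with complex, multi-step tasks. The main pain points are the lack of a deep understanding of the physical world, of advanced reasoning, and of general, dexterous control. Existing methods also make poor use of heterogeneous robot data and reason too weakly, so task decomposition and execution suffer.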

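Core idea: Gemini Robotics 1.5 is built as a multi-embodiment Vision-Language-Action (VLA) model with a Motion Transfer mechanism that lets it learn from heterogeneous robot data. Interleaving actions with a multi-level internal reasoning process lets the robot "think before acting", which helps it decompose and execute complex tasks. Gemini Robotics-ER 1.5 focuses on embodied reasoning, including visual and spatial understanding, task planning, and progress estimation.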

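Technical framework: Gemini Robotics 1.5 uses a novel architecture whose details are not given in the abstract. At a high level, the flow is: first, perceive the environment from visual and language input; next, perform multi-level internal reasoning to produce a plan of action; finally, execute the corresponding actions. The Motion Transfer mechanism is meant to carry what is learned on different robots over to new ones. The architecture of Gemini Robotics-ER 1.5 is likewise unspecified; its goal is to provide stronger embodied reasoning.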
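The perceive, reason, then act flow can be illustrated with a toy control loop. This is only a minimal sketch under stated assumptions, not the Gemini Robotics 1.5 architecture, which this summary does not describe. The `Step` record, the `run_episode` function, and the `perceive`, `think`, `act`, and `is_done` callables are all hypothetical names invented for illustration, and the "robot" is a plain Python list standing in for a table of objects.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    thought: str  # natural-language reasoning emitted before acting
    action: str   # stand-in for a low-level robot command


def run_episode(
    instruction: str,
    perceive: Callable[[], str],
    think: Callable[[str, str, List[Step]], str],
    act: Callable[[str], str],
    is_done: Callable[[str], bool],
    max_steps: int = 10,
) -> List[Step]:
    """Interleave reasoning and actions until the task is judged complete."""
    trace: List[Step] = []
    for _ in range(max_steps):
        scene = perceive()                 # 1. observe the environment
        if is_done(scene):                 # crude progress check
            break
        thought = think(instruction, scene, trace)  # 2. reason in language
        trace.append(Step(thought, act(thought)))   # 3. act, keep the trace
    return trace


# Toy environment (hypothetical): objects on a table that must go into a bin.
table = ["apple", "cup"]


def perceive() -> str:
    return ",".join(table)


def think(instruction: str, scene: str, trace: List[Step]) -> str:
    target = scene.split(",")[0]
    return f"To '{instruction}', next move the {target}."


def act(thought: str) -> str:
    moved = table.pop(0)  # the toy "motor command" removes one object
    return f"pick_and_place({moved}, bin)"


def is_done(scene: str) -> bool:
    return scene == ""


trace = run_episode("clear the table", perceive, think, act, is_done)
for step in trace:
    print(step.thought, "->", step.action)
```

Keeping each thought next to the action it produced mirrors the interpretability point in the abstract: a user can read the trace to see why each action was taken. In a real system the callables would be model calls and robot drivers rather than list operations.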

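Key innovations: 1) a Motion Transfer (MT) mechanism that lets the model learn from heterogeneous, multi-embodiment robot data, making the VLA model more general; 2) interleaving actions with a multi-level internal reasoning process so that the robot can "think before acting" and better decompose and execute complex tasks; 3) a new state of the art in embodied reasoning, set by Gemini Robotics-ER 1.5.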

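Key design: The abstract says little about architecture and implementation. The concrete form of the novel architecture, how the Motion Transfer mechanism is implemented, the details of the multi-level internal reasoning process, and the design of the loss function are all unknown from this source.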

📊 Experimental Highlights

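The abstract states that Gemini Robotics-ER 1.5 sets a new state of the art in embodied reasoning, but it gives no performance figures, baselines, or improvement margins. It also states that multi-level internal reasoning improves how Gemini Robotics 1.5 decomposes and executes complex tasks, and that Motion Transfer makes the VLA more general, again without experimental results or metrics.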

🎯 Application Scenarios

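The work could apply to a range of robot settings, such as home service robots, industrial automation, and medical assistance. Better perception, reasoning, and action would let robots complete more complex tasks with higher efficiency and service quality. Longer term, the approach may help move robotics toward more intelligent, more general-purpose robots.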

📄 Abstract (Original)

General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control. This report introduces the latest generation of the Gemini Robotics model family: Gemini Robotics 1.5, a multi-embodiment Vision-Language-Action (VLA) model, and Gemini Robotics-ER 1.5, a state-of-the-art Embodied Reasoning (ER) model. We are bringing together three major innovations. First, Gemini Robotics 1.5 features a novel architecture and a Motion Transfer (MT) mechanism, which enables it to learn from heterogeneous, multi-embodiment robot data and makes the VLA more general. Second, Gemini Robotics 1.5 interleaves actions with a multi-level internal reasoning process in natural language. This enables the robot to "think before acting" and notably improves its ability to decompose and execute complex, multi-step tasks, and also makes the robot's behavior more interpretable to the user. Third, Gemini Robotics-ER 1.5 establishes a new state-of-the-art for embodied reasoning, i.e., for reasoning capabilities that are critical for robots, such as visual and spatial understanding, task planning, and progress estimation. Together, this family of models takes us a step towards an era of physical agents, enabling robots to perceive, think and then act so they can solve complex multi-step tasks.