Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles

Authors: Qiujing Lu, Xuanhan Wang, Yiwei Jiang, Guangming Zhao, Mingyue Ma, Shuo Feng

Categories: cs.RO, cs.AI, cs.ET

Published: 2024-09-10

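💡 One-Sentence Summary

OmniTester: a scenario testing framework for autonomous driving based on multimodal large language models

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: autonomous driving, scenario generation, large language models, multimodal learning, traffic simulation, retrieval-augmented generation, prompt engineering

📋 Key Points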

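Existing autonomous driving testing methods struggle to accommodate diverse testing requirements and generalize poorly, which reduces the efficiency and usability of scenario generation.
OmniTester uses a multimodal large language model, combined with prompt engineering, traffic simulation tools, and retrieval-augmented generation, to generate realistic and diverse scenarios.
Experiments show that OmniTester is controllable and effective both in generating complex scenarios and in reconstructing crash scenarios, demonstrating its generalization ability.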

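📝 Abstract (Chinese version, translated)

Generating corner cases is crucial for efficiently testing autonomous vehicles before road deployment. However, existing methods struggle to meet diverse testing requirements and often fail to generalize to unseen situations, which reduces the convenience and usability of the generated scenarios. To address this, we propose OmniTester, a multimodal large language model (LLM) based framework that fully leverages the extensive world knowledge and reasoning capabilities of LLMs. OmniTester is designed to generate realistic and diverse scenarios in a simulation environment, providing a robust solution for testing and evaluating autonomous vehicles. Beyond prompt engineering, we use urban traffic simulation tools to reduce the complexity of the code generated by the LLM. We further incorporate retrieval-augmented generation and a self-improvement mechanism to strengthen the LLM's understanding of scenarios and thereby improve its ability to generate more realistic scenes. Experiments demonstrate the controllability and realism of our method in generating three types of challenging and complex scenarios. We also show its effectiveness in reconstructing new scenarios from crash reports, enabled by the generalization capability of LLMs.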

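🔬 Method Details

Problem definition: Existing methods for generating autonomous driving test scenarios struggle to meet diverse testing requirements. In particular, when generating extreme and complex scenarios they generalize poorly and cannot handle unseen corner cases. Existing methods are also limited in the controllability and ease of use of scenario generation, which makes it hard to tailor test scenarios to specific requirements.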

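Core idea: Use the extensive world knowledge and reasoning capabilities of large language models (LLMs), together with multimodal inputs, to generate more realistic, diverse, and controllable test scenarios for autonomous driving. Prompt engineering, external tools, and feedback mechanisms improve the LLM's understanding of scenarios and its ability to generate them.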

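Technical framework: OmniTester consists of the following main modules: 1) Scenario description input module: receives the user's scenario description, which may be text, images, or other modalities. 2) LLM generation module: uses the LLM to generate scenario code from the description, for example SUMO (Simulation of Urban Mobility) configuration files. 3) Scenario simulation module: runs the generated scenario code in a traffic simulator such as SUMO to produce the simulated environment. 4) Retrieval-augmented generation module: retrieves relevant information from an external knowledge base to improve the LLM's understanding of the scenario. 5) Self-improvement module: fine-tunes the LLM based on simulation results and user feedback to improve the quality of the generated scenarios.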
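The modules above can be read as a retrieve, generate, simulate, and revise loop. The Python sketch below shows one possible orchestration under stated assumptions: the retriever, LLM, and simulator are passed in as plain callables (hypothetical placeholders, not OmniTester or SUMO APIs), only text input is modeled, and the self-improvement step is approximated by regenerating with the simulator's error messages appended to the prompt rather than by the fine-tuning the summary describes.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical module interfaces; these names are illustrative, not from the paper.
Retriever = Callable[[str], List[str]]   # description -> relevant knowledge snippets
Generator = Callable[[str], str]         # prompt -> scenario code
Simulator = Callable[[str], List[str]]   # scenario code -> error messages (empty = success)


@dataclass
class GenerationResult:
    scenario_code: str
    attempts: int
    feedback_history: List[List[str]] = field(default_factory=list)


def generate_scenario(description: str,
                      retrieve: Retriever,
                      generate: Generator,
                      simulate: Simulator,
                      max_attempts: int = 3) -> GenerationResult:
    """Retrieve context, generate scenario code, simulate it, and retry with feedback."""
    context = "\n".join(retrieve(description))
    feedback: List[str] = []
    history: List[List[str]] = []
    code = ""
    for attempt in range(1, max_attempts + 1):
        prompt = f"Context:\n{context}\n\nScenario:\n{description}\n"
        if feedback:
            # Feedback-driven revision: show the model what went wrong last time.
            prompt += "Fix these problems from the last run:\n" + "\n".join(feedback) + "\n"
        code = generate(prompt)
        feedback = simulate(code)
        history.append(feedback)
        if not feedback:  # the simulator reported no errors
            return GenerationResult(code, attempt, history)
    return GenerationResult(code, max_attempts, history)


# Toy stubs so the control flow can be exercised without an LLM or SUMO.
def toy_retrieve(desc: str) -> List[str]:
    return ["Pedestrians in a marked crosswalk have the right of way."]


def toy_generate(prompt: str) -> str:
    return "ok" if "Fix these problems" in prompt else "broken"


def toy_simulate(code: str) -> List[str]:
    return [] if code == "ok" else ["route references unknown edge"]


result = generate_scenario("A pedestrian crosses in front of the ego vehicle",
                           toy_retrieve, toy_generate, toy_simulate)
print(result.attempts, result.scenario_code)
```

In a real setup, `simulate` might wrap a SUMO run and return its warnings; the stubs here only exist so the loop can be checked end to end.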

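Key innovations: 1) The first application of multimodal LLMs to autonomous driving test scenario generation, making full use of the LLM's knowledge and reasoning capabilities. 2) Combining retrieval-augmented generation with a self-improvement mechanism improves the LLM's scenario understanding and generation quality. 3) Using traffic simulation tools such as SUMO reduces the complexity of the code the LLM must generate and improves the efficiency of scenario generation.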
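Innovation 3 suggests the LLM need not write full simulator code. One hypothetical way to realize this, not taken from the paper: the LLM emits a small JSON-like specification, and a fixed helper expands it into a SUMO route file (a `<routes>` element containing `vType`, `route`, and `vehicle` definitions). The function name, spec keys, and edge IDs `E0`, `E1`, `E2` are illustrative assumptions; those edges would have to exist in the accompanying SUMO network file (`.net.xml`).

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def spec_to_sumo_routes(spec: Dict[str, Any]) -> str:
    """Expand a compact (hypothetical) scenario spec into SUMO route-file XML."""
    root = ET.Element("routes")
    # Vehicle types come first so that vehicles can reference them.
    for vtype in spec.get("vehicle_types", []):
        ET.SubElement(root, "vType", {k: str(v) for k, v in vtype.items()})
    # Each named route is a space-separated list of network edge IDs.
    for route_id, edges in spec["routes"].items():
        ET.SubElement(root, "route", id=route_id, edges=" ".join(edges))
    # SUMO expects vehicles in a route file to be sorted by departure time.
    for vehicle in sorted(spec["vehicles"], key=lambda v: float(v["depart"])):
        ET.SubElement(root, "vehicle", {k: str(v) for k, v in vehicle.items()})
    return ET.tostring(root, encoding="unicode")


# A cut-in scenario as an LLM might describe it (keys and edge IDs are illustrative).
spec = {
    "vehicle_types": [
        # accel/decel in m/s^2, maxSpeed in m/s, sigma = driver imperfection (0..1)
        {"id": "aggressive", "accel": 3.0, "decel": 6.0, "sigma": 0.9, "maxSpeed": 20}
    ],
    "routes": {"r_main": ["E0", "E1"], "r_merge": ["E2", "E1"]},
    "vehicles": [
        {"id": "cutin", "type": "aggressive", "route": "r_merge", "depart": 2.0},
        # No "type" given, so SUMO would fall back to its default vehicle type.
        {"id": "ego", "route": "r_main", "depart": 0.0},
    ],
}

xml_text = spec_to_sumo_routes(spec)
print(xml_text)
```

Keeping the XML boilerplate in deterministic code means the model only has to get a handful of fields right, which is also easier to validate than free-form simulator code.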

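Key design: 1) Prompt engineering: design suitable prompts that guide the LLM to generate scenario code that meets the requirements. 2) Retrieval augmentation: build a knowledge base containing traffic rules, vehicle behaviors, and similar information for the LLM to retrieve from. 3) Self-improvement: fine-tune the LLM with reinforcement learning or supervised learning, based on simulation results and user feedback.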
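The retrieval-augmentation design point can be illustrated with a toy retriever. The sketch below ranks knowledge-base snippets by bag-of-words cosine similarity and splices the top matches into a prompt. This is a deliberately simple stand-in (a real system would more likely use embedding search); the knowledge-base text, prompt wording, and function names are hypothetical, not OmniTester's.

```python
import math
import re
from collections import Counter
from typing import List


def _bag_of_words(text: str) -> Counter:
    """Lowercase word counts; a toy substitute for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, knowledge_base: List[str], k: int = 2) -> List[str]:
    """Return the k snippets most similar to the query."""
    q = _bag_of_words(query)
    ranked = sorted(knowledge_base, key=lambda doc: _cosine(q, _bag_of_words(doc)), reverse=True)
    return ranked[:k]


def build_prompt(description: str, knowledge_base: List[str], k: int = 2) -> str:
    """Prepend retrieved traffic knowledge to the user's scenario request."""
    rules = "\n".join(f"- {s}" for s in retrieve(description, knowledge_base, k))
    return (
        "You generate SUMO scenario specifications for autonomous-vehicle testing.\n"
        f"Relevant traffic knowledge:\n{rules}\n"
        f"Scenario request: {description}\n"
    )


# Hypothetical knowledge base of traffic rules.
KNOWLEDGE_BASE = [
    "Left turn at an intersection: yield to oncoming traffic.",
    "Crosswalk: pedestrians in a marked crosswalk have the right of way.",
    "Highway merge: a merging vehicle must yield to vehicles already in the lane.",
]

request = "merging vehicle cuts into the highway lane ahead of the ego vehicle"
print(build_prompt(request, KNOWLEDGE_BASE, k=1))
```

Swapping `retrieve` for an embedding-based search would leave `build_prompt` unchanged, which keeps the prompt template and the retrieval method independently tunable.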

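📊 Experimental Highlights

Experimental results show that OmniTester can generate realistic and diverse autonomous driving test scenarios, including complex traffic flows, adverse weather, and sudden events. Compared with existing methods, OmniTester shows significant improvements in scenario generation quality, controllability, and generalization. In addition, OmniTester can reconstruct crash scenarios from accident reports, supporting accident analysis and liability determination.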

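🎯 Application Scenarios

OmniTester can be applied to testing and validating autonomous vehicles, helping developers uncover potential safety hazards and improve the reliability and safety of autonomous driving systems. The framework can also generate a wide range of extreme and complex scenarios for evaluating autonomous driving systems across different environments. OmniTester could further be used to train and optimize autonomous driving algorithms, improving their generalization and robustness. In the future, the technique may find applications in intelligent traffic management, urban planning, and related fields.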

📄 Abstract (Original)

The generation of corner cases has become increasingly crucial for efficiently testing autonomous vehicles prior to road deployment. However, existing methods struggle to accommodate diverse testing requirements and often lack the ability to generalize to unseen situations, thereby reducing the convenience and usability of the generated scenarios. A method that facilitates easily controllable scenario generation for efficient autonomous vehicle (AV) testing with realistic and challenging situations is greatly needed. To address this, we propose OmniTester: a multimodal Large Language Model (LLM) based framework that fully leverages the extensive world knowledge and reasoning capabilities of LLMs. OmniTester is designed to generate realistic and diverse scenarios within a simulation environment, offering a robust solution for testing and evaluating AVs. In addition to prompt engineering, we employ tools from Simulation of Urban Mobility to simplify the complexity of code generated by LLMs. Furthermore, we incorporate Retrieval-Augmented Generation and a self-improvement mechanism to enhance the LLM's understanding of scenarios, thereby increasing its ability to produce more realistic scenes. In the experiments, we demonstrate the controllability and realism of our approach in generating three types of challenging and complex scenarios. Additionally, we showcase its effectiveness in reconstructing new scenarios described in crash reports, driven by the generalization capability of LLMs.
