Emergent Crowds Dynamics from Language-Driven Multi-Agent Interactions

📄 arXiv: 2508.15047v1

Authors: Yibo Liu, Liam Shatzel, Brandon Haworth, Teseo Schneider

Categories: cs.AI, cs.GR

Published: 2025-08-20


💡 One-Sentence Takeaway

Proposes a language-driven multi-agent interaction method to make crowd dynamics simulation more realistic

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: crowd simulation, multi-agent systems, language models, social interaction, dynamic navigation, virtual reality, affective computing

📋 Key Points

  1. Existing methods do not adequately account for the social and environmental interactions of agents in a crowd, resulting in simulations that lack realism.
  2. This paper proposes using large language models to control agent motion, realizing richer interactions through a dialogue system and language-driven navigation.
  3. Experiments show that in complex scenarios the method automatically produces grouping and ungrouping of agents, improving the realism of the simulation.

🔬 Method Details

Problem definition: This paper targets the insufficient treatment of social and environmental interactions in existing agent-based crowd simulation methods, which makes the resulting simulations less realistic.

Core idea: Introduce large language models (LLMs) that, conditioned on each agent's personality, emotional state, and environmental relationships, generate dynamic dialogue which in turn influences the agent's motion decisions.
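
To make this conditioning concrete, below is a minimal Python sketch of an agent-centric prompt builder. The `AgentState` fields and the prompt template are illustrative assumptions based on the abstract's description (personality, role, desires, relationships), not the paper's actual implementation.

```python
# Minimal sketch (not the paper's code): conditioning an LLM dialogue query
# on an agent's personality, emotional state, and social relationships.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    name: str
    personality: str                # e.g. "outgoing, impatient"
    role: str                       # e.g. "tourist"
    emotion: str                    # e.g. "curious"
    desires: list[str] = field(default_factory=list)
    relationships: dict[str, str] = field(default_factory=dict)  # neighbour -> relation

def build_dialogue_prompt(agent: AgentState, neighbours: list[str],
                          recent_dialogue: list[str]) -> str:
    """Format an agent-centric prompt so the LLM replies in character."""
    rels = ", ".join(f"{n} ({agent.relationships.get(n, 'stranger')})" for n in neighbours)
    history = "\n".join(recent_dialogue) or "(no prior dialogue)"
    return (
        f"You are {agent.name}, a {agent.role}. Personality: {agent.personality}. "
        f"Current emotion: {agent.emotion}. Desires: {', '.join(agent.desires)}.\n"
        f"Nearby agents: {rels}.\n"
        f"Conversation so far:\n{history}\n"
        f"Reply with one short in-character line of dialogue."
    )

alice = AgentState("Alice", "outgoing, impatient", "tourist", "curious",
                   desires=["find the museum"], relationships={"Bob": "friend"})
print(build_dialogue_prompt(alice, ["Bob"], []))
```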

Technical framework: The architecture consists of two main modules: a dialogue system and language-driven navigation. The dialogue system generates conversations based on the agents' personalities and environmental relationships, while the navigation module uses the conversation content and each agent's state to control motion.
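
A rough sketch of how these two modules could interact in a simulation loop is shown below. The dialogue period, proximity trigger, stub LLM, and toy steering rule are all assumptions for illustration; the paper's actual scheduling and navigation model may differ.

```python
# Illustrative two-module loop: a periodically queried dialogue system and a
# per-step navigation module conditioned on perception and conversation.
import math
import random

class StubLLM:
    """Stand-in for a real LLM; returns a canned line of dialogue."""
    def generate(self, prompt: str) -> str:
        return random.choice(["Let's head to the exit.", "Wait for me!", "This way."])

class Agent:
    def __init__(self, name: str, x: float, y: float):
        self.name, self.x, self.y = name, x, y
        self.dialogue: list[str] = []

    def distance_to(self, other: "Agent") -> float:
        return math.hypot(self.x - other.x, self.y - other.y)

DIALOGUE_PERIOD = 30  # hypothetical: re-query the LLM every 30 steps

def simulation_step(t: int, agents: list[Agent], llm: StubLLM) -> None:
    for agent in agents:
        neighbours = [a for a in agents if a is not agent and agent.distance_to(a) < 3.0]
        # Module 1: dialogue system, queried periodically when spatial/social
        # relationships with neighbours warrant a conversation.
        if t % DIALOGUE_PERIOD == 0 and neighbours:
            prompt = f"{agent.name} is near {[n.name for n in neighbours]}"
            agent.dialogue.append(llm.generate(prompt))
        # Module 2: language-driven navigation; here a toy rule that drifts
        # toward conversation partners, mimicking emergent grouping.
        if neighbours and agent.dialogue:
            cx = sum(n.x for n in neighbours) / len(neighbours)
            cy = sum(n.y for n in neighbours) / len(neighbours)
            agent.x += 0.1 * (cx - agent.x)
            agent.y += 0.1 * (cy - agent.y)

agents = [Agent(f"A{i}", random.uniform(0, 10), random.uniform(0, 10)) for i in range(5)]
llm = StubLLM()
for t in range(90):
    simulation_step(t, agents, llm)
```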

Key innovation: The central contribution is coupling the language model with the agents' motion control, so that agents make more natural movement decisions during complex social interactions; this fundamentally distinguishes the approach from traditional steering-only navigation.
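
The summary does not specify how language output is translated into motion commands. One plausible realization, sketched below, is to request a structured action alongside each dialogue line and map it to a navigation goal; the JSON schema and action names here are hypothetical.

```python
# Hypothetical language-to-motion interface: parse a structured LLM reply
# into a navigation decision the steering module can act on.
import json

def parse_motion_decision(llm_reply: str) -> dict:
    """Expect a reply like:
        {"say": "Let's go together.", "action": "join", "target": "Bob"}
    and fall back to a no-op decision if the reply is not valid JSON."""
    try:
        decision = json.loads(llm_reply)
    except json.JSONDecodeError:
        return {"say": llm_reply, "action": "continue", "target": None}
    decision.setdefault("action", "continue")
    decision.setdefault("target", None)
    return decision

reply = '{"say": "Wait up!", "action": "join", "target": "Bob"}'
print(parse_motion_decision(reply))  # the navigation module would then steer toward Bob
```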

Key design: Each agent's personality, emotional state, and visual information are integrated into both dialogue generation and navigation decisions, ensuring that motion choices depend not only on environmental input but also on the content of conversations with other agents. Specific parameter settings and any loss-function design are not detailed in the source and remain unknown.
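
As an illustration of fusing perceptual input with a conversation-derived goal, here is a simple social-force-style steering rule. The weights, radius, and force model are assumptions, not values reported by the paper.

```python
# Toy steering rule: attraction toward a goal agreed in dialogue, plus
# repulsion from perceived neighbours. Weights are illustrative only.
import math

def steer(pos, goal, neighbours, w_goal=1.0, w_avoid=2.0, radius=1.5):
    """Return a 2D velocity: attraction toward the dialogue-derived goal
    plus repulsion from neighbours closer than `radius`."""
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    norm = math.hypot(gx, gy) or 1.0
    vx, vy = w_goal * gx / norm, w_goal * gy / norm
    for nx, ny in neighbours:
        dx, dy = pos[0] - nx, pos[1] - ny
        d = math.hypot(dx, dy)
        if 0 < d < radius:  # push away, harder when closer
            vx += w_avoid * (radius - d) * dx / d
            vy += w_avoid * (radius - d) * dy / d
    return vx, vy

# Head toward a meeting point agreed in dialogue while avoiding a close neighbour.
print(steer(pos=(0.0, 0.0), goal=(5.0, 0.0), neighbours=[(0.5, 0.2)]))
```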

📊 Experimental Highlights

Experimental results show that in complex scenarios the method automatically produces grouping and ungrouping of agents and serves as an information-passing mechanism within the crowd, noticeably improving the realism of the simulation. No quantitative improvement percentage over traditional methods is reported.
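
The summary gives no metric for grouping, but one hypothetical way to quantify grouping and ungrouping in a simulation log is to count proximity-based connected components per frame, as sketched below; the distance threshold is an assumption.

```python
# Hypothetical grouping metric: connected components of the proximity graph.
import math
from collections import deque

def count_groups(positions, threshold=2.0):
    """Count connected components where agents within `threshold` of each
    other belong to the same group (BFS over the proximity graph)."""
    n, seen, groups = len(positions), set(), 0
    for i in range(n):
        if i in seen:
            continue
        groups += 1
        queue = deque([i])
        seen.add(i)
        while queue:
            j = queue.popleft()
            for k in range(n):
                if k not in seen and math.dist(positions[j], positions[k]) < threshold:
                    seen.add(k)
                    queue.append(k)
    return groups

# Two tight pairs far apart -> 2 groups; track this per frame to observe
# grouping and ungrouping over time.
print(count_groups([(0, 0), (1, 0), (10, 10), (11, 10)]))
```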

🎯 Application Scenarios

Potential application areas include virtual reality, game development, and human-computer interaction, where the method can give crowd simulations more realistic dynamics and improve the user experience. In the future, it may also play a role in social-behaviour research and urban planning.

📄 Abstract

Animating and simulating crowds using an agent-based approach is a well-established area where every agent in the crowd is individually controlled such that global human-like behaviour emerges. We observe that human navigation and movement in crowds are often influenced by complex social and environmental interactions, driven mainly by language and dialogue. However, most existing work does not consider these dimensions and leads to animations where agent-agent and agent-environment interactions are largely limited to steering and fixed higher-level goal extrapolation. We propose a novel method that exploits large language models (LLMs) to control agents' movement. Our method has two main components: a dialogue system and language-driven navigation. We periodically query agent-centric LLMs conditioned on character personalities, roles, desires, and relationships to control the generation of inter-agent dialogue when necessitated by the spatial and social relationships with neighbouring agents. We then use the conversation and each agent's personality, emotional state, vision, and physical state to control the navigation and steering of each agent. Our model thus enables agents to make motion decisions based on both their perceptual inputs and the ongoing dialogue. We validate our method in two complex scenarios that exemplify the interplay between social interactions, steering, and crowding. In these scenarios, we observe that grouping and ungrouping of agents automatically occur. Additionally, our experiments show that our method serves as an information-passing mechanism within the crowd. As a result, our framework produces more realistic crowd simulations, with emergent group behaviours arising naturally from any environmental setting.