LLMTM: Benchmarking and Optimizing LLMs for Temporal Motif Analysis in Dynamic Graphs

Authors: Bing Hao, Minglai Shao, Zengyi Wo, Yunlong Chu, Yuhang Liu, Ruijie Wang

Categories: cs.LG, cs.AI

Published: 2025-12-24

💡 One-Sentence Summary

LLMTM: benchmarking and optimizing LLMs for temporal motif analysis in dynamic graphs

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: large language models, dynamic graphs, temporal motif analysis, benchmarking, structure-aware dispatching

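📋 Key Points

Existing methods struggle to use LLMs effectively for temporal motif analysis on dynamic graphs, and systematic benchmarking and optimization are lacking.
The paper proposes the LLMTM benchmark and designs a tool-augmented LLM agent that solves temporal motif analysis tasks through precisely engineered prompts.
It introduces a structure-aware dispatcher that routes each query according to the graph's structure and the LLM's cognitive load, balancing accuracy against cost.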

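📝 Abstract (Translated)

The widespread application of large language models (LLMs) has sparked growing interest in their ability to process dynamic graphs. Temporal motifs are an elementary unit and an important local property of dynamic graphs. They can directly reflect anomalies and unique phenomena, and they are essential for understanding a graph's evolutionary dynamics and structural features. However, using LLMs for temporal motif analysis on dynamic graphs remains relatively unexplored. This paper systematically studies LLM performance on temporal motif-related tasks. Specifically, we propose a comprehensive benchmark, LLMTM (Large Language Models in Temporal Motifs), which includes six tailored tasks across nine temporal motif types. We then conduct extensive experiments to analyze how different prompting techniques and LLMs affect performance, covering nine models including openPangu-7B, the DeepSeek-R1-Distill-Qwen series, Qwen2.5-32B-Instruct, GPT-4o-mini, DeepSeek-R1, and o3. Informed by the benchmark results, we develop a tool-augmented LLM agent that uses precisely engineered prompts to solve these tasks with high accuracy. The agent's high accuracy, however, comes at a substantial cost. To address this trade-off, we propose a simple yet effective structure-aware dispatcher that considers both the structural properties of the dynamic graph and the cognitive load on the LLM, and dispatches each query to either standard LLM prompting or the more powerful agent. Our experiments show that the structure-aware dispatcher maintains high accuracy while reducing cost.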

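🔬 Method Details

Problem definition: The paper asks how large language models (LLMs) can be used effectively for temporal motif analysis in dynamic graphs. Existing work lacks a systematic benchmark for temporal motif analysis, and applying LLMs directly to such analysis raises both efficiency and accuracy challenges. In addition, high-accuracy methods tend to come with high computational cost.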
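To make the notion of a temporal motif concrete, the sketch below looks for one simple motif, a triangle whose three edges all occur within a time window `delta`, in a dynamic graph stored as `(u, v, t)` edge tuples. This is only an illustration under stated assumptions: the summary does not list LLMTM's nine motif types or their exact definitions, and the function name, the edge format, and the window parameter `delta` are hypothetical choices made for this example.

```python
from itertools import combinations, product


def find_temporal_triangles(edges, delta):
    """Return node triples whose three connecting edges all occur within `delta` time units.

    `edges` is an iterable of (u, v, t) tuples describing an undirected dynamic graph.
    Illustrative only; not the benchmark's motif definition.
    """
    # Record every timestamp at which each undirected node pair interacts.
    times = {}
    for u, v, t in edges:
        if u != v:  # ignore self-loops
            times.setdefault(frozenset((u, v)), []).append(t)

    nodes = sorted({n for pair in times for n in pair})
    found = []
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(p in times for p in pairs):
            continue  # the three nodes do not form a triangle at any time
        # Try one timestamp per edge; keep the triple if any choice fits in the window.
        if any(max(ts) - min(ts) <= delta
               for ts in product(*(times[p] for p in pairs))):
            found.append((a, b, c))
    return found


sample_edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 10), (3, 4, 11), (2, 4, 30)]
print(find_temporal_triangles(sample_edges, delta=5))   # [(0, 1, 2)]
print(find_temporal_triangles(sample_edges, delta=20))  # [(0, 1, 2), (2, 3, 4)]
```

The triangle on nodes 2, 3, 4 exists in the static graph but only counts as a temporal motif once the window is wide enough to cover all three of its edges, which is the distinction that makes temporal motifs sensitive to timing as well as structure.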

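Core idea: The paper first builds a comprehensive benchmark, LLMTM, to evaluate LLM performance on temporal motif analysis tasks. It then designs a tool-augmented LLM agent and a structure-aware dispatcher to optimize how LLMs are applied to temporal motif analysis, balancing accuracy against cost. The aim is to make full use of the LLM's reasoning ability while reducing computational resource consumption.

Technical framework: The overall framework has three main parts: 1) the LLMTM benchmark, used to evaluate different LLMs on temporal motif analysis tasks; 2) a tool-augmented LLM agent, which improves accuracy on these tasks through precisely engineered prompts; 3) a structure-aware dispatcher, which routes each query to either standard LLM prompting or the tool-augmented agent based on the dynamic graph's structural properties and the LLM's cognitive load.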
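For the standard prompting path, the dynamic graph has to be turned into text before an LLM can reason over it. The sketch below shows one plausible serialization, a time-ordered list of `(u, v, t)` edges followed by the question. The benchmark's actual prompt templates are not given in this summary, so the wording of the prompt and the `build_prompt` helper are assumptions made purely for illustration.

```python
def build_prompt(edges, question):
    """Serialize a dynamic graph and a question into a plain-text prompt.

    `edges` is an iterable of (u, v, t) tuples. The template is hypothetical.
    """
    # List edges in chronological order so the temporal sequence is explicit.
    lines = [f"({u}, {v}, {t})" for u, v, t in sorted(edges, key=lambda e: e[2])]
    return (
        "You are given an undirected dynamic graph as a list of (u, v, t) edges, "
        "where nodes u and v interact at time t.\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}\nAnswer:"
    )


print(build_prompt([(1, 2, 5), (0, 1, 1)],
                   "Is there a triangle whose edges all occur within 5 time units?"))
```

The prompt grows by one line per edge, which is one reason larger or denser graphs may put more load on the model than small ones.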

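Key innovation: The paper's key innovation is the structure-aware dispatcher. Based on the structural characteristics of the dynamic graph and the LLM's cognitive load, it dynamically chooses between standard LLM prompting and the tool-augmented LLM agent. This adaptive dispatching strategy can significantly reduce computational cost while preserving high accuracy. Compared with using standard prompting alone or the tool-augmented agent alone, the dispatcher strikes a better balance between accuracy and efficiency.

Key design: The crux of the structure-aware dispatcher is how to define and measure the dynamic graph's structural properties and the LLM's cognitive load. The specific parameter settings and algorithmic details are not described in detail, so they are unknown here. Plausibly, the structural properties include graph density and the node degree distribution, while cognitive load relates to query complexity and the number of reasoning steps the LLM needs.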
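The dispatching idea can be sketched as a simple rule: compute a few structural features of the graph, and send the query to the agent only when they suggest standard prompting is likely to struggle. Everything below is hypothetical. The features (edge count as a rough proxy for cognitive load, density, maximum degree), the thresholds, and the function names are assumptions for illustration, since the paper's actual dispatcher design is not available in this summary.

```python
from collections import Counter


def structural_features(edges):
    """Compute simple structural features of a dynamic graph given as (u, v, t) tuples."""
    edges = list(edges)
    nodes = {n for u, v, _ in edges for n in (u, v)}
    # Collapse repeated interactions into distinct undirected node pairs.
    pairs = {frozenset((u, v)) for u, v, _ in edges if u != v}
    degree = Counter(n for p in pairs for n in p)
    n = len(nodes)
    possible = n * (n - 1) / 2
    return {
        "num_edges": len(edges),  # rough proxy for prompt length / cognitive load
        "density": len(pairs) / possible if possible else 0.0,
        "max_degree": max(degree.values(), default=0),
    }


def dispatch(edges, max_edges=20, max_density=0.5, max_degree=5):
    """Return "agent" for graphs judged hard, otherwise "prompt".

    The thresholds are arbitrary placeholders, not values from the paper.
    """
    f = structural_features(edges)
    hard = (f["num_edges"] > max_edges
            or f["density"] > max_density
            or f["max_degree"] > max_degree)
    return "agent" if hard else "prompt"


sparse = [(0, 1, 1), (2, 3, 2)]
clique = [(a, b, t) for t, (a, b) in enumerate(
    [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])]
print(dispatch(sparse))  # prompt
print(dispatch(clique))  # agent
```

A rule-based threshold is the simplest possible choice here. A learned classifier over the same features would be a natural alternative, but nothing in this summary indicates which approach the paper takes.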

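📊 Experimental Highlights

The experiments show that the proposed structure-aware dispatcher effectively reduces computational cost while maintaining high accuracy. Specific performance figures and comparison baselines are not given in the abstract, so they are unknown here. The paper does emphasize that the dispatcher strikes a good balance between accuracy and efficiency, compared with using standard LLM prompting alone or the tool-augmented LLM agent alone.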
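The accuracy and cost trade-off can be made explicit with a simple accounting model. This model is an illustrative assumption, not a formula from the paper. Let $p$ be the fraction of queries the dispatcher sends to the agent, $c_{\text{prompt}}$ and $c_{\text{agent}}$ the average per-query costs of the two paths, and $a_{\text{prompt}}$ and $a_{\text{agent}}$ the accuracy each path achieves on the queries routed to it.

```latex
\begin{align}
  C(p) &= (1 - p)\, c_{\text{prompt}} + p\, c_{\text{agent}}, \\
  A(p) &= (1 - p)\, a_{\text{prompt}} + p\, a_{\text{agent}}.
\end{align}
```

Since $c_{\text{agent}} > c_{\text{prompt}}$, the cost $C(p)$ falls as $p$ shrinks. Overall accuracy $A(p)$ stays high only if the queries kept on the prompting path are ones it answers well, that is, if $a_{\text{prompt}}$ measured on that subset remains close to $a_{\text{agent}}$. Selecting that subset well is the dispatcher's job.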

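🎯 Application Scenarios

The results could be applied to social network analysis, financial fraud detection, biological network analysis, and similar areas. Using LLMs to analyze temporal motifs in dynamic graphs can help identify anomalous behavior more effectively, predict how events will develop, and deepen understanding of how complex systems evolve. The work offers new ideas and methods for applying LLMs to practical dynamic graph analysis problems.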

📄 Abstract (Original)

The widespread application of Large Language Models (LLMs) has motivated a growing interest in their capacity for processing dynamic graphs. Temporal motifs, as an elementary unit and important local property of dynamic graphs which can directly reflect anomalies and unique phenomena, are essential for understanding their evolutionary dynamics and structural features. However, leveraging LLMs for temporal motif analysis on dynamic graphs remains relatively unexplored. In this paper, we systematically study LLM performance on temporal motif-related tasks. Specifically, we propose a comprehensive benchmark, LLMTM (Large Language Models in Temporal Motifs), which includes six tailored tasks across nine temporal motif types. We then conduct extensive experiments to analyze the impacts of different prompting techniques and LLMs (including nine models: openPangu-7B, the DeepSeek-R1-Distill-Qwen series, Qwen2.5-32B-Instruct, GPT-4o-mini, DeepSeek-R1, and o3) on model performance. Informed by our benchmark findings, we develop a tool-augmented LLM agent that leverages precisely engineered prompts to solve these tasks with high accuracy. Nevertheless, the high accuracy of the agent incurs a substantial cost. To address this trade-off, we propose a simple yet effective structure-aware dispatcher that considers both the dynamic graph's structural properties and the LLM's cognitive load to intelligently dispatch queries between the standard LLM prompting and the more powerful agent. Our experiments demonstrate that the structure-aware dispatcher effectively maintains high accuracy while reducing cost.
