Large language models as oracles for instantiating ontologies with domain-specific knowledge

📄 arXiv: 2404.04108v2

Authors: Giovanni Ciatto, Andrea Agiollo, Matteo Magnini, Andrea Omicini

Categories: cs.AI, cs.CL, cs.IR, cs.LG, cs.LO

Published: 2024-04-05 (updated: 2024-12-12)

Journal: Knowledge-Based Systems 310 (2025) 112940

DOI: 10.1016/j.knosys.2024.112940


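💡 One-sentence takeaway

Proposes a domain-independent method that uses large language models as oracles to automatically instantiate domain ontologies.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: ontology instantiation, large language models, domain-specific knowledge, automation, intelligent systems, nutrition domain, knowledge management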

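📋 Key points

  1. Core problem: ontologies are typically designed and instantiated manually by human experts, which is time-consuming, error-prone, and often biased by the designer's personal background.
  2. Method: use a large language model to generate instances of domain-specific knowledge automatically, reducing manual effort and speeding up instantiation.
  3. Results: in experiments in the nutrition domain, the method achieves a quality metric up to five times higher than the state of the art, with up to ten times fewer erroneous entities and relations.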

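📝 Summary

Background: endowing intelligent systems with semantic data commonly requires designing and instantiating ontologies with domain-specific knowledge. This work is usually done manually by human experts, so it is time-consuming, error-prone, and often biased. Objective: to mitigate this, the paper proposes a novel domain-independent approach that instantiates ontologies automatically by using large language models (LLMs) as oracles. Method: starting from an initial schema and a set of query templates, the method queries the LLM multiple times and generates instances of both classes and properties from its replies, so the ontology is enriched quickly and automatically. Experts may then keep, adjust, discard, or complement the generated instances. Contribution: the method is formalised in a general way, instantiated over various LLMs, and evaluated in a case study in the nutrition domain. There it achieves a quality metric up to five times higher than the state of the art while reducing erroneous entities and relations by up to ten times. The paper closes with a SWOT analysis of the method.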

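🔬 Method details

Problem definition: the paper targets the cost and error rate of instantiating ontologies for intelligent systems. Existing practice relies on human experts, which makes the process slow, error-prone, and subject to personal bias.

Core idea: use large language models (LLMs) as oracles that supply instances of domain-specific knowledge automatically, reducing manual effort and improving the efficiency and accuracy of ontology instantiation.

Technical framework: the pipeline has two main parts. First, an initial schema of inter-related classes and properties is defined. Second, a set of query templates is used to query the LLM multiple times; instances of classes and properties are extracted from its replies and used to fill the ontology automatically.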
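The class-instantiation half of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the schema, the template wording, the `stub_oracle` function (a canned stand-in for a real LLM call), and the choice to query only leaf classes are all hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical initial schema: class name -> list of subclass names.
SCHEMA: Dict[str, List[str]] = {
    "Meal": ["Soup", "Salad"],
    "Soup": [],
    "Salad": [],
}

# Hypothetical query template; the paper's actual wording is not given here.
INDIVIDUALS_TEMPLATE = "List examples of {cls}. Answer with one name per line."


def stub_oracle(prompt: str) -> str:
    """Stand-in for an LLM call: returns canned replies for the demo."""
    if "Soup" in prompt:
        return "- Minestrone\n- Gazpacho\n"
    if "Salad" in prompt:
        return "1. Caesar salad\n2. Greek salad\n"
    return ""


def parse_names(reply: str) -> List[str]:
    """Strip bullets and numbering from each non-empty line of a reply."""
    names = []
    for line in reply.splitlines():
        cleaned = line.strip().lstrip("-*0123456789. ").strip()
        if cleaned:
            names.append(cleaned)
    return names


def populate(
    schema: Dict[str, List[str]],
    oracle: Callable[[str], str],
    root: str,
) -> Dict[str, Set[str]]:
    """Walk the class hierarchy and query the oracle once per leaf class."""
    instances: Dict[str, Set[str]] = {cls: set() for cls in schema}
    stack = [root]
    while stack:
        cls = stack.pop()
        children = schema[cls]
        if children:
            # Assumption of this sketch: only leaf classes are queried.
            stack.extend(children)
        else:
            reply = oracle(INDIVIDUALS_TEMPLATE.format(cls=cls))
            instances[cls].update(parse_names(reply))
    return instances
```

Calling `populate(SCHEMA, stub_oracle, "Meal")` fills `Soup` and `Salad` with the names parsed from the canned replies. Swapping `stub_oracle` for a function that calls a real model leaves the rest of the loop unchanged, which is the sense in which the LLM acts as an oracle.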

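Key innovation: the LLM is treated as an oracle that generates instances automatically, in contrast with traditional manual instantiation by experts. The reported experiments show gains in quality over the state of the art.

Key design: the two main design inputs are the initial schema and the query templates. Together they constrain the generated instances so that they comply with the schema and reflect domain-specific knowledge.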
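To make the role of query templates concrete for properties, the sketch below fills a relation template for each known individual and turns the reply into triples. Everything in it is an assumption made for illustration: the property name `hasIngredient`, the template wording, the comma-separated reply format, and the `canned_oracle` stand-in for an LLM.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical relation template for a hypothetical property.
RELATION_TEMPLATE = "List the ingredients of {individual}, separated by commas."


def canned_oracle(prompt: str) -> str:
    """Stand-in for an LLM call: returns canned replies for the demo."""
    replies = {
        "Gazpacho": "Tomato, Cucumber, olive oil",
        "Greek salad": "tomato, feta, Olive Oil",
    }
    for name, reply in replies.items():
        if name in prompt:
            return reply
    return ""


def normalise(name: str) -> str:
    """Collapse whitespace and lowercase so duplicate entities merge."""
    return " ".join(name.split()).lower()


def relate(
    individuals: Iterable[str],
    prop: str,
    template: str,
    oracle: Callable[[str], str],
) -> Set[Triple]:
    """Query the oracle once per individual and collect property instances."""
    triples: Set[Triple] = set()
    for individual in individuals:
        reply = oracle(template.format(individual=individual))
        for part in reply.split(","):
            obj = normalise(part)
            if obj:  # skip empty fragments, e.g. from an empty reply
                triples.add((individual, prop, obj))
    return triples
```

With the canned replies, `relate(["Gazpacho", "Greek salad"], "hasIngredient", RELATION_TEMPLATE, canned_oracle)` yields six triples, and both meals point at the same normalised `olive oil` entity. Normalising names before storing them is one simple way to limit duplicate entities; a real deployment would likely need stronger entity matching.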

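📊 Experimental highlights

In the nutrition-domain case study, the method achieves an ontology quality metric up to five times higher than the state of the art, while reducing the number of erroneous entities and relations by up to ten times.

🎯 Application scenarios

Potential application areas include intelligent systems, knowledge management, and the Semantic Web. Automating ontology instantiation can speed up the acquisition of domain knowledge and lower manual cost, which in turn supports the adoption of intelligent systems across industries.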

📄 Abstract (original)

Background. Endowing intelligent systems with semantic data commonly requires designing and instantiating ontologies with domain-specific knowledge. Especially in the early phases, those activities are typically performed manually by human experts possibly leveraging on their own experience. The resulting process is therefore time-consuming, error-prone, and often biased by the personal background of the ontology designer. Objective. To mitigate that issue, we propose a novel domain-independent approach to automatically instantiate ontologies with domain-specific knowledge, by leveraging on large language models (LLMs) as oracles. Method. Starting from (i) an initial schema composed by inter-related classes and properties and (ii) a set of query templates, our method queries the LLM multiple times, and generates instances for both classes and properties from its replies. Thus, the ontology is automatically filled with domain-specific knowledge, compliant to the initial schema. As a result, the ontology is quickly and automatically enriched with manifold instances, which experts may consider to keep, adjust, discard, or complement according to their own needs and expertise. Contribution. We formalise our method in general way and instantiate it over various LLMs, as well as on a concrete case study. We report experiments rooted in the nutritional domain where an ontology of food meals and their ingredients is automatically instantiated from scratch, starting from a categorisation of meals and their relationships. There, we analyse the quality of the generated ontologies and compare ontologies attained by exploiting different LLMs. Experimentally, our approach achieves a quality metric that is up to five times higher than the state-of-the-art, while reducing erroneous entities and relations by up to ten times. Finally, we provide a SWOT analysis of the proposed method.