A Bayesian approach for prompt optimization in pre-trained language models
Authors: Antonio Sabbatella, Andrea Ponti, Antonio Candelieri, Ilaria Giordani, Francesco Archetti
Categories: cs.LG, cs.AI
Published: 2023-12-01
💡 One-sentence takeaway
Proposes a Bayesian optimization method for the prompt optimization problem in pre-trained language models.
🎯 Matched area: Pillar 9: Embodied Foundation Models
Keywords: Bayesian optimization, prompt optimization, pre-trained language models, combinatorial optimization, natural language processing, RoBERTa, black-box optimization
📋 Key points
- Existing methods face a combinatorial optimization problem when selecting prompts in the high-dimensional token space; the computational complexity is high and quickly becomes intractable.
- This paper proposes a Bayesian optimization approach to prompt optimization that searches efficiently over discrete tokens without requiring direct access to the LLM.
- Experimental results show that the method performs well across multiple benchmarks, validating its effectiveness and efficiency in practical use.
📝 Abstract (translated)
A prompt is a sequence of symbols or tokens, selected from a vocabulary according to some rule and prepended to a textual query. This paper formulates prompt selection as a combinatorial optimization problem and proposes a Bayesian optimization method executed in a continuous embedding of the combinatorial space. The work focuses on hard prompt tuning (HPT), which directly searches for discrete tokens to add to the text input without requiring access to the large language model (LLM), and is therefore applicable when the LLM is available only as a black box. The BoTorch library is used for the Bayesian optimization study; experiments on six benchmarks show that RoBERTa performs well across a variety of tasks, and the trade-off between search-space size, accuracy, and time consumption is analyzed.
🔬 Method details
Problem definition: The paper targets the effective selection of prompt sequences; existing methods incur high computational cost in high-dimensional combinatorial spaces, making effective optimization difficult.
Core idea: Propose a Bayesian optimization method that runs in a continuous embedding of the combinatorial space, enabling an efficient search over discrete tokens while avoiding direct access to the LLM. A common way to realize such an embedding is sketched below.
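One typical mechanism for searching a continuous embedding of a discrete token space (an illustrative assumption about the mechanics, not the paper's exact rounding rule) is nearest-neighbour projection: the optimizer proposes a continuous point per prompt position, which is snapped to the closest vocabulary embedding before evaluation.

```python
import torch

def project_to_tokens(z: torch.Tensor, token_embeddings: torch.Tensor) -> torch.Tensor:
    """Map a continuous candidate back to discrete vocabulary tokens.

    z: (prompt_len, d) point proposed by the optimizer in embedding space.
    token_embeddings: (vocab_size, d) embedding matrix of the frozen model.
    Returns the index of the nearest vocabulary token for each prompt position.
    """
    distances = torch.cdist(z, token_embeddings)   # (prompt_len, vocab_size)
    return distances.argmin(dim=-1)                # nearest-neighbour rounding
```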
Technical framework: The overall pipeline comprises data preprocessing, a Bayesian optimization module, and result evaluation. The input data are processed first, Bayesian optimization is then performed with the BoTorch library, and the performance of the optimized prompt is finally evaluated; a minimal sketch of such a loop follows.
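The paper does not reproduce its exact loop here; the following is a minimal sketch of a "vanilla" BO loop in BoTorch consistent with that pipeline. The search bounds, budget, and the `evaluate_prompt` stub (standing in for the black-box accuracy of the frozen model, see the objective sketch further below) are illustrative assumptions.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

dim = 10                                   # dimension of the continuous embedding
bounds = torch.stack([-torch.ones(dim, dtype=torch.double),
                      torch.ones(dim, dtype=torch.double)])

def evaluate_prompt(x: torch.Tensor) -> float:
    """Black-box objective: in the paper this would be dev-set accuracy of the
    frozen model with the prompt decoded from x; a toy stand-in here."""
    return float(-(x ** 2).sum())

# Initial design: a handful of random points and their scores.
train_X = torch.rand(8, dim, dtype=torch.double) * 2 - 1
train_Y = torch.tensor([[evaluate_prompt(x)] for x in train_X], dtype=torch.double)

for _ in range(50):                        # main BO loop
    gp = SingleTaskGP(train_X, train_Y)    # GP surrogate of prompt quality
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
    acq = ExpectedImprovement(gp, best_f=train_Y.max())
    new_x, _ = optimize_acqf(acq, bounds=bounds, q=1,
                             num_restarts=10, raw_samples=64)
    new_y = torch.tensor([[evaluate_prompt(new_x.squeeze(0))]], dtype=torch.double)
    train_X = torch.cat([train_X, new_x])
    train_Y = torch.cat([train_Y, new_y])
```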
Key innovation: The most important contribution is the application of Bayesian optimization to prompt selection, in particular its effectiveness in the black-box setting, which markedly improves optimization efficiency.
Key design: A standard ("vanilla") Bayesian optimization configuration is used, the objective function is classification accuracy, and RoBERTa serves as the base model for the experiments.
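For concreteness, here is a hedged sketch of what such an accuracy objective could look like with a frozen RoBERTa model from Hugging Face transformers. The checkpoint, the decoding of prompt tokens, and the evaluation data are illustrative placeholders rather than the paper's configuration (hard prompt tuning setups often score via a masked-LM verbalizer instead of a classification head).

```python
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base")
model.eval()  # the language model stays frozen; only the prompt is optimized

def prompt_accuracy(prompt_token_ids, texts, labels):
    """Score a discrete prompt by dev-set accuracy with the prompt prepended."""
    prompt = tokenizer.decode(prompt_token_ids)
    correct = 0
    with torch.no_grad():
        for text, label in zip(texts, labels):
            inputs = tokenizer(prompt + " " + text,
                               return_tensors="pt", truncation=True)
            pred = model(**inputs).logits.argmax(dim=-1).item()
            correct += int(pred == label)
    return correct / len(texts)
```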
📊 Experimental highlights
Experiments show that the BO-based prompt selection method performs strongly on six benchmarks, with notable gains in both accuracy and efficiency over conventional approaches, demonstrating good sample efficiency and a modular structure.
🎯 Application scenarios
Potential application areas include text classification, question answering, and dialogue generation in natural language processing. Optimizing prompts can substantially improve model performance and response speed, giving the approach real practical value and broad applicability, especially in Model-as-a-Service (MaaS) settings.
📄 Abstract (original)
A prompt is a sequence of symbols or tokens, selected from a vocabulary according to some rule, which is prepended/concatenated to a textual query. A key problem is how to select the sequence of tokens: in this paper we formulate it as a combinatorial optimization problem. The high dimensionality of the token space, compounded by the length of the prompt sequence, requires a very efficient solution. In this paper we propose a Bayesian optimization method, executed in a continuous embedding of the combinatorial space. We focus on hard prompt tuning (HPT), which directly searches for discrete tokens to be added to the text input without requiring access to the large language model (LLM) and can be used also when the LLM is available only as a black box. This is critically important if LLMs are made available in the Model as a Service (MaaS) manner, as in GPT-4. The current manuscript is focused on the optimization of discrete prompts for classification tasks. Discrete prompts give rise to a difficult combinatorial optimization problem which easily becomes intractable given the dimension of the token space in realistic applications. The optimization method considered in this paper is Bayesian optimization (BO), which has become the dominant approach in black-box optimization for its sample efficiency, along with its modular structure and versatility. We use BoTorch, a library for Bayesian optimization research built on top of PyTorch. Albeit preliminary and obtained using a 'vanilla' version of BO, the experiments with RoBERTa on six benchmarks show good performance across a variety of tasks and enable an analysis of the tradeoff between the size of the search space, accuracy, and wall-clock time.