Deep Reinforcement Learning for Traveling Purchaser Problems

📄 arXiv: 2404.02476v6 📥 PDF

作者: Haofeng Yuan, Rongping Zhu, Wanlu Yang, Shiji Song, Keyou You, Wei Fan, C. L. Philip Chen

分类: math.OC, cs.AI, cs.LG

发布日期: 2024-04-03 (更新: 2025-07-02)


💡 一句话要点

提出基于深度强化学习的方法以解决旅行采购问题

🎯 匹配领域: 支柱二:RL算法与架构 (RL & Architecture)

关键词: 旅行采购问题 深度强化学习 组合优化 策略网络 元学习 线性规划 物流优化

📋 核心要点

  1. 现有的旅行采购问题解决方法在计算成本和性能上存在明显不足,难以有效处理复杂实例。
  2. 本文提出的基于深度强化学习的方法,分别处理路线构建和采购规划,优化全局解决方案。
  3. 实验结果显示,该方法在多种TPP实例上显著优于传统启发式算法,尤其在大规模实例上表现出色。

📝 摘要(中文)

旅行采购问题(TPP)是一个重要的组合优化问题,具有广泛的应用。现有方法通常同时处理路线构建和采购规划,导致计算成本高或性能有限。本文提出了一种基于深度强化学习(DRL)的新方法,分别处理路线构建和采购规划,并从全局角度评估和优化解决方案。该方法的关键组件包括用于捕捉市场与产品关系的二分图表示,以及提取信息并顺序构建路线的策略网络。通过引入元学习策略,策略网络能够在大型TPP实例上稳定训练,并在不同规模和分布的实例间良好泛化。实验结果表明,该方法显著优于传统TPP启发式算法,最优性差距降低40%-90%。

🔬 方法详解

问题定义:旅行采购问题(TPP)涉及在多个市场中采购产品并优化运输路线。现有方法通常将路线构建与采购规划结合,导致计算复杂度高和性能不足。

核心思路:本文提出的解决方案通过深度强化学习(DRL)将路线构建与采购规划分开处理,利用全局视角优化解决方案。

技术框架:整体架构包括二分图表示,用于捕捉市场与产品的关系,以及策略网络,负责从二分图中提取信息并顺序构建路线。

关键创新:该方法的创新在于通过策略网络高效构建路线,并利用线性规划轻松推导采购计划,显著降低了计算复杂度。

关键设计:在训练过程中,采用元学习策略以确保策略网络在不同规模的TPP实例上稳定训练,并能够很好地泛化到未见过的更大实例。具体的损失函数和网络结构设计未详细说明,需进一步研究。

🖼️ 关键图片

fig_0
fig_1
fig_2

📊 实验亮点

实验结果表明,基于深度强化学习的方法在多个合成TPP实例和TPPLIB基准上显著优于传统TPP启发式算法,最优性差距降低了40%-90%,并在运行时间上也表现出优势,尤其是在大规模实例中。

🎯 应用场景

该研究的潜在应用领域包括物流优化、供应链管理和智能交通系统等。通过优化旅行采购问题,可以显著降低运输成本,提高资源利用效率,具有重要的实际价值和经济效益。未来,该方法可能推动更多复杂优化问题的解决,促进相关领域的技术进步。

📄 摘要(原文)

The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Due to the coupling between routing and purchasing, existing works on TPPs commonly address route construction and purchase planning simultaneously, which, however, leads to exact methods with high computational cost and heuristics with sophisticated design but limited performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach include a bipartite graph representation for TPPs to capture the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to sequentially construct the route. One significant advantage of our framework is that we can efficiently construct the route using the policy network, and once the route is determined, the associated purchasing plan can be easily derived through linear programming, while, by leveraging DRL, we can train the policy network towards optimizing the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances, and generalize well across instances of varying sizes and distributions, even to much larger instances that are never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach can significantly outperform well-established TPP heuristics, reducing the optimality gap by 40%-90%, and also showing an advantage in runtime, especially on large-sized instances.