On-sky demonstration of reinforcement learning for adaptive optics control

📄 arXiv: 2606.10771v1

Authors: Jalo Nousiainen, Vincent Chambouleyron, Benoit Neichel, Sylvain Cetre, Jean-Francois Sauvage, Angelie Alagao, Markus Kasper, Jonathan Dray, Romain Fetick, Byron Engler

Categories: astro-ph.IM, cs.LG, cs.RO

Published: 2026-06-09

Comments: 11 pages, 12 figures; accepted by A&A


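💡 One-sentence summary

First on-sky demonstration of PO4AO, a reinforcement learning controller for real-time adaptive optics control

🎯 Matched area: Pillar 2, RL algorithms and architecture (RL & Architecture)

Keywords: adaptive optics, reinforcement learning, control systems, astronomical observation, real-time optimization, robustness, performance improvement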

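📋 Key points

  1. Conventional adaptive optics controllers must cope with real-world effects such as noise and vibrations, which degrade their performance in practice.
  2. The paper demonstrates on sky the PO4AO controller, which uses reinforcement learning to optimize the adaptive optics control policy in real time and makes the system more robust.
  3. On-sky results show that PO4AO outperformed the standard integrator under all tested observing conditions, compensated for vibrations, and remained robust to measurement noise.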

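📝 Abstract (condensed)

Reinforcement learning-based algorithms have recently shown promise for adaptive optics control. Although they have proven robust to real-world effects in simulations and laboratory settings, their performance had not yet been validated on sky. This paper reports the first on-sky demonstration of a reinforcement learning controller, PO4AO, analyzes its behavior, and identifies directions for improvement. PO4AO was deployed on the Papyrus adaptive optics system at the OHP. It outperformed the standard integrator controller under a range of observing conditions, learned and compensated for vibration patterns, and showed strong robustness to measurement noise. Even with a non-optimized Python implementation, PO4AO controlled the system effectively, paving the way for broader use of reinforcement learning in adaptive optics.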

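🔬 Method details

Problem: the paper addresses real-time optimization of adaptive optics control. Existing methods often perform inconsistently when faced with noise, vibrations, and rapidly changing observing conditions in real environments, which makes high-precision correction difficult.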

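Core idea: PO4AO is a reinforcement learning controller that keeps learning and adapting online while it runs, continually optimizing the control policy of the adaptive optics system to improve its robustness and adaptability.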
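What "learning online" can mean for a vibration is easiest to see with a classic technique that is much simpler than PO4AO. The sketch below is not the paper's algorithm: it is a swapped-in illustration that fits a two-tap autoregressive predictor with recursive least squares (RLS), assuming a single hypothetical sinusoidal vibration (30 Hz at an assumed 1 kHz loop rate). The class `RLSPredictor` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


class RLSPredictor:
    """Hypothetical online AR(2) predictor: x_t ~ theta[0]*x_{t-1} + theta[1]*x_{t-2}.

    Not PO4AO; a recursive least squares illustration of online adaptation.
    """

    def __init__(self, forgetting: float = 0.999, init_cov: float = 1e4) -> None:
        self.lam = forgetting                       # < 1 lets the fit track slow drifts
        self.theta: List[float] = [0.0, 0.0]        # AR coefficients
        self.P = [[init_cov, 0.0], [0.0, init_cov]]  # inverse-correlation estimate

    def predict(self, phi: Sequence[float]) -> float:
        return self.theta[0] * phi[0] + self.theta[1] * phi[1]

    def update(self, phi: Sequence[float], target: float) -> float:
        """One RLS step with regressor phi = [x_{t-1}, x_{t-2}]; returns the a priori error."""
        err = target - self.predict(phi)
        P = self.P
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = self.lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        k = [Pphi[0] / denom, Pphi[1] / denom]      # gain vector
        self.theta = [self.theta[0] + k[0] * err, self.theta[1] + k[1] * err]
        # P <- (P - k (P phi)^T) / lambda, using the symmetry of P
        self.P = [[(P[i][j] - k[i] * Pphi[j]) / self.lam for j in range(2)] for i in range(2)]
        return err


if __name__ == "__main__":
    omega = 2 * math.pi * 30.0 / 1000.0  # hypothetical 30 Hz vibration, assumed 1 kHz loop rate
    x = [math.sin(omega * t) for t in range(1000)]
    rls = RLSPredictor()
    for t in range(2, len(x)):
        rls.update([x[t - 1], x[t - 2]], x[t])
    # For a pure sinusoid the exact AR(2) coefficients are [2*cos(omega), -1]
    print(rls.theta, [2 * math.cos(omega), -1.0])
```

Once the coefficients converge, the predictor anticipates the vibration one frame ahead, which a fixed-gain integrator cannot do. PO4AO learns from closed-loop data with reinforcement learning rather than with this linear fit.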

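Technical framework: PO4AO consists of a data acquisition module, a reinforcement learning module, and a control output module. The data acquisition module collects observation data in real time, the reinforcement learning module optimizes the policy from these data, and the control output module applies the optimized policy to the adaptive optics system.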
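The three modules can be sketched as a loop that acquires data, applies the current policy, sends the command, and stores the transition for later learning. This is a minimal sketch under stated assumptions: `OnlineControlLoop`, the `acquire`/`policy`/`output`/`learner` callables, and the fixed `update_every` schedule are hypothetical and are not PO4AO's actual interfaces. Training runs synchronously here for simplicity, whereas a real-time system would likely train outside the control path.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

Vector = List[float]
Policy = Callable[[Sequence[float]], Vector]


class OnlineControlLoop:
    """Hypothetical acquire -> policy -> output loop with a data buffer for online learning."""

    def __init__(self,
                 acquire: Callable[[], Vector],
                 policy: Policy,
                 output: Callable[[Vector], None],
                 learner: Callable[[List[Tuple[Vector, Vector]], Policy], Policy],
                 update_every: int = 100,
                 history: int = 1000) -> None:
        self.acquire = acquire      # data acquisition module (e.g. reads sensor data)
        self.policy = policy        # current control policy
        self.output = output        # control output module (e.g. writes DM commands)
        self.learner = learner      # returns an improved policy from stored data
        self.update_every = update_every
        self.buffer: Deque[Tuple[Vector, Vector]] = deque(maxlen=history)
        self.n_steps = 0

    def step(self) -> Vector:
        obs = self.acquire()
        cmd = self.policy(obs)
        self.output(cmd)
        self.buffer.append((obs, cmd))  # keep (observation, command) pairs for learning
        self.n_steps += 1
        if self.n_steps % self.update_every == 0:
            # Synchronous policy update; a real system would train off the control path
            self.policy = self.learner(list(self.buffer), self.policy)
        return cmd
```

The bounded `deque` keeps only recent data, so the learner always sees conditions close to the current ones.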

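Key innovation: PO4AO learns and adapts online under real on-sky conditions. This improves adaptive optics control in complex environments and gives stronger robustness and adaptability than the standard integrator controller, which it outperformed in every configuration tested.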
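For reference, the baseline the paper compares against is a standard integrator. A minimal sketch of one common form, assuming a linear command matrix that maps wavefront-sensor slopes to actuator commands and an optional leak term; the function and argument names are hypothetical, and sign conventions differ between systems:

```python
from typing import List, Sequence


def integrator_step(prev_cmd: Sequence[float],
                    slopes: Sequence[float],
                    command_matrix: Sequence[Sequence[float]],
                    gain: float,
                    leak: float = 0.0) -> List[float]:
    """One integrator update: c_t = (1 - leak) * c_{t-1} + gain * M @ s_t.

    Assumed convention: the DM correction c is subtracted from the incoming
    wavefront, so the slopes measure the residual (phi - c).
    """
    # Reconstruct the residual in actuator space from the measured slopes
    delta = [sum(m * s for m, s in zip(row, slopes)) for row in command_matrix]
    # Accumulate, with an optional leak that pulls commands slowly toward zero
    return [(1.0 - leak) * c + gain * d for c, d in zip(prev_cmd, delta)]
```

The single scalar `gain` applies to every temporal frequency alike, so the integrator has no way to target a specific vibration line.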

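Key design: the implementation interfaces with the existing real-time controller (DAO RTC) through shared-memory buffers and uses a single set of hyperparameters across all observing conditions. Although this non-optimized Python implementation adds about 750 μs of latency, along with control jitter and occasional frame drops, PO4AO still controls the system effectively.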
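The shared-memory link can be sketched with Python's standard `multiprocessing.shared_memory` module. The buffer layout here (a little-endian uint64 frame counter followed by float32 values) and the helper names are hypothetical assumptions, not DAO RTC's actual format. Comparing successive frame counters is one simple way to detect dropped frames of the kind the paper reports.

```python
import struct
from multiprocessing import shared_memory
from typing import List, Sequence, Tuple

# Hypothetical layout: [uint64 frame counter][n_values x float32], little-endian
HEADER = struct.Struct("<Q")


def buffer_size(n_values: int) -> int:
    return HEADER.size + 4 * n_values


def write_frame(shm: shared_memory.SharedMemory, counter: int, values: Sequence[float]) -> None:
    struct.pack_into(f"<{len(values)}f", shm.buf, HEADER.size, *values)
    # Counter written after the payload by convention; this alone is not a
    # memory barrier, so a real RTC link also needs proper synchronization.
    HEADER.pack_into(shm.buf, 0, counter)


def read_frame(shm: shared_memory.SharedMemory, n_values: int) -> Tuple[int, List[float]]:
    (counter,) = HEADER.unpack_from(shm.buf, 0)
    values = struct.unpack_from(f"<{n_values}f", shm.buf, HEADER.size)
    return counter, list(values)


def dropped_frames(prev_counter: int, counter: int) -> int:
    """Frames skipped between two reads (ignores counter wrap-around)."""
    return max(0, counter - prev_counter - 1)


if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=buffer_size(3))
    try:
        write_frame(shm, 7, [0.5, -1.25, 2.0])
        print(read_frame(shm, 3))
    finally:
        shm.close()
        shm.unlink()
```

A reader that polls the counter and sees it jump by more than one knows it missed frames, which is also where control jitter shows up if the reader falls behind the sensor.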

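📊 Experimental highlights

Over several nights covering a range of flux levels and atmospheric conditions, PO4AO consistently outperformed the standard integrator controller in every tested configuration. It learned and compensated for vibration patterns and proved strongly robust to measurement noise. These gains were achieved even with a non-optimized implementation.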

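🎯 Application scenarios

Potential application areas include astronomical observation, optical imaging, and laser communication. By improving the control performance of adaptive optics systems, PO4AO can deliver sharper images under difficult observing conditions and may encourage the use of reinforcement learning in other optical systems.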

📄 Abstract (original)

Reinforcement learning (RL)-based algorithms have recently emerged as a promising approach for adaptive optics (AO) control. In simulations and laboratory experiments, they have demonstrated robustness to real-world effects such as photon and detector noise, misregistration, vibrations, and rapid variations in seeing conditions. However, their performance has not yet been validated on sky.

We report the first on-sky demonstration of a reinforcement learning controller for adaptive optics, named Policy Optimization for AO (PO4AO). We further analyze its on-sky behavior and identify directions for improving the algorithm and its implementation.

PO4AO was implemented and deployed on the Papyrus adaptive optics system installed at the Coudé focus of the 1.52 m telescope (T152) at the OHP. A Python-based implementation was interfaced with the existing real-time controller (DAO RTC) via shared-memory buffers. The performance of PO4AO was compared to that of a standard integrator controller over several nights, covering a range of flux levels and atmospheric conditions.

PO4AO consistently outperformed the standard integrator in all tested configurations. The controller successfully learned and compensated for vibration patterns and demonstrated strong robustness to measurement noise. Once tuned for Papyrus, PO4AO operated in a turnkey fashion, using a single set of hyperparameters across varying observing conditions and science targets. These performance gains were achieved despite a non-optimized Python implementation introducing approximately 750 μs of additional latency, along with control jitter and occasional frame drops.

When properly implemented and optimized, PO4AO constitutes a robust and high-performance turnkey controller for single-conjugate adaptive optics systems, paving the way for broader adoption of reinforcement learning strategies in on-sky AO operations.