R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search

Authors: João Pedro Gandarela, Thiago Rios, Stefan Menzel, André Freitas

Categories: cs.AI, cs.CL, cs.MA

Published: 2026-06-03

💡 One-Sentence Takeaway

Proposes R-APS to address reasoning failures in long-horizon planning.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: long-horizon planning, reasoning failures, Reflective Adversarial Pareto Search, robustness, meta-learning

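📋 Key Points

Existing methods for long-horizon planning and reasoning suffer from three structural failures: errors propagate without being localized, worst-case perturbations go unevaluated, and accumulated knowledge is never invalidated.
The paper proposes R-APS, which decomposes reasoning by mode, gives each reasoning mode its own context, and orchestrates their interaction across three timescales.
Experiments show that R-APS outperforms the baselines on robustness and efficiency: robustness certificates are 3.5x tighter than uniform-perturbation baselines, and iterations-to-first-admission are 46% faster.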

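📝 Abstract (translated from Chinese)

Large language models (LLMs) are fluent on open-ended tasks, but in agentic settings that require planning, tool use, and acting over long horizons, fluency does not guarantee reliable delivery. We attribute this gap to three coupled structural failures: errors propagate without localization, worst-case perturbations go unevaluated, and accumulated knowledge is never invalidated. We propose Reflective Adversarial Pareto Search (R-APS), to our knowledge the first method to address all three failures jointly through reasoning-mode decomposition. R-APS requires no fine-tuning and operates on a frozen LLM purely through structured protocol design. We evaluate on planar mechanism synthesis (robotics, prosthetics, mechanical design). On 32 target trajectories, R-APS delivers robustness certificates 3.5x tighter than uniform-perturbation baselines, 46% faster iterations-to-first-admission, and a 2.1x Chamfer-distance reduction over Enum+GA, while jointly controlling bar count and worst-case robustness.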

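🔬 Method Details

Problem definition: The paper targets three failures that arise when current LLMs reason over long-horizon plans: unlocalized error propagation, unevaluated worst-case perturbations, and accumulated knowledge that is never invalidated. Together these reduce reliability on complex tasks.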

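Core idea: R-APS decomposes reasoning by mode and gives each mode its own context instead of letting all modes share one. The abstract argues that abductive, counterfactual, meta-inductive, corrective, and inductive reasoning pull a shared context in incompatible directions. Separating them lets R-APS orchestrate reasoning across three timescales and improve overall robustness and reliability.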
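The per-mode context idea can be sketched as follows. This is a minimal illustration, not the paper's protocol: the five mode names come from the original abstract, while the `ModeContexts` class, its methods, and the example messages are hypothetical.

```python
from dataclasses import dataclass, field

# The five reasoning modes named in the original abstract.
MODES = ("abductive", "counterfactual", "meta_inductive", "corrective", "inductive")


@dataclass
class ModeContexts:
    """Hypothetical container: one isolated message history per reasoning mode."""
    histories: dict = field(default_factory=lambda: {m: [] for m in MODES})

    def append(self, mode: str, message: str) -> None:
        if mode not in self.histories:
            raise KeyError(f"unknown reasoning mode: {mode}")
        self.histories[mode].append(message)

    def prompt_for(self, mode: str) -> str:
        # Only this mode's history would be sent to the frozen LLM,
        # so messages from other modes never enter its context.
        return "\n".join(self.histories[mode])


contexts = ModeContexts()
contexts.append("abductive", "Hypothesis: the coupler link is too short.")
contexts.append("counterfactual", "Perturb link 2 by +5% and re-simulate.")
print(contexts.prompt_for("abductive"))
```

In this sketch the counterfactual message never appears in the abductive prompt, which is the isolation property the summary describes.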

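Technical framework: R-APS has three main modules that interact across three timescales: staged compositional reasoning with a typed validation critic (failure localization), sensitivity-guided counterfactual stress-testing treated as a first-class Pareto objective (robustness), and meta-inductive rule extraction with explicit invalidation (persistent memory).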
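To make "robustness as a first-class Pareto objective" concrete, here is a small sketch of Pareto filtering over three minimized objectives. The objective names follow quantities mentioned in the abstract (Chamfer distance, bar count, worst-case robustness); the `Candidate` type, the helper functions, and all numeric values are made up for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    chamfer: float     # trajectory-matching error (lower is better)
    bar_count: int     # mechanism complexity (lower is better)
    worst_case: float  # error under the worst tested perturbation (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    pa, pb = a[1:], b[1:]
    return all(x <= y for x, y in zip(pa, pb)) and any(x < y for x, y in zip(pa, pb))


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


pool = [
    Candidate("A", chamfer=0.10, bar_count=4, worst_case=0.30),
    Candidate("B", chamfer=0.12, bar_count=4, worst_case=0.15),
    Candidate("C", chamfer=0.12, bar_count=6, worst_case=0.20),  # dominated by B
]
front = pareto_front(pool)
print([c.name for c in front])  # → ['A', 'B']
```

Candidate A fits the target best but is fragile, while B is slightly less accurate and more robust. Because worst-case error is an objective in its own right, both survive on the front instead of B being discarded for lower accuracy.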

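Key innovation: The main technical contribution is reasoning-mode decomposition with a separate context for each mode. This limits interference between reasoning modes, which the paper identifies as the shared root cause of the three failures.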

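Key design: R-APS does not fine-tune the LLM; it works on a frozen model purely through structured protocol design. The main design elements are per-mode context management, the choice of Pareto objectives (robustness is a first-class objective), and explicit invalidation of stored rules.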
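Explicit invalidation of stored rules could look like the sketch below. The summary does not say how R-APS decides that a rule is stale, so the contradiction-count threshold, the `Rule` and `RuleMemory` classes, and the example rule text are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    text: str
    support: int = 0         # observations consistent with the rule
    contradictions: int = 0  # observations that violated the rule
    valid: bool = True


class RuleMemory:
    """Hypothetical persistent memory in which rules can be explicitly retired."""

    def __init__(self, max_contradictions: int = 2):
        # Assumed policy: retire a rule after this many contradicting observations.
        self.max_contradictions = max_contradictions
        self.rules: dict[str, Rule] = {}

    def add(self, key: str, text: str) -> None:
        self.rules[key] = Rule(text)

    def observe(self, key: str, held: bool) -> None:
        rule = self.rules[key]
        if held:
            rule.support += 1
        else:
            rule.contradictions += 1
            if rule.contradictions >= self.max_contradictions:
                rule.valid = False  # explicit invalidation

    def active(self) -> list[str]:
        """Rules that would still be injected into future prompts."""
        return [r.text for r in self.rules.values() if r.valid]


memory = RuleMemory(max_contradictions=2)
memory.add("short_crank", "A shorter crank tightens the traced loop.")
memory.observe("short_crank", held=True)
memory.observe("short_crank", held=False)
memory.observe("short_crank", held=False)
print(memory.active())  # → []
```

The point of the sketch is only that invalidation is a recorded state change: a retired rule stays in memory with its evidence counts but is no longer returned by `active()`.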

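📊 Experimental Highlights

On 32 target trajectories in planar mechanism synthesis, with every candidate checked by a kinematic solver, R-APS delivers robustness certificates 3.5x tighter than uniform-perturbation baselines, 46% faster iterations-to-first-admission, and a 2.1x Chamfer-distance reduction over Enum+GA, while jointly controlling bar count and worst-case robustness. Small 4B reasoning-specialized models are competitive with general-purpose 70B backbones inside the protocol.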
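For readers unfamiliar with the Chamfer-distance metric reported above, the following sketch computes one common variant: the symmetric sum of mean nearest-neighbor distances between two point sets. The summary does not state which variant the paper uses (squared or unsquared, sum or mean), so this form and the sample points are assumptions.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 2D point sets (mean nearest-neighbor form)."""
    def one_way(src, dst):
        # Average distance from each source point to its nearest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


target = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # sampled target trajectory
traced = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]  # sampled curve of a candidate mechanism
print(chamfer_distance(target, traced))
```

Two of the three points coincide and the third pair is one unit apart in each direction, so each one-way term is 1/3 and the total is about 0.667. Identical point sets give a distance of 0.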

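🎯 Application Scenarios

Potential application areas include robotics, prosthetics design, and mechanical design. By making reasoning in complex design tasks more reliable and efficient, R-APS could support wider practical adoption of LLM-based systems in these fields.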

📄 Abstract (Original)

Large language models (LLMs) are fluent on open-ended tasks, yet in agentic settings, where a system must plan, use tools, and act over extended horizons, fluency does not ensure reliable delivery. We trace this gap to three coupled structural failures: errors propagate without localization, worst-case perturbations go unevaluated, and accumulated knowledge is never invalidated. We argue these share a root cause: abductive, counterfactual, meta-inductive, corrective, and inductive reasoning pull a shared context in incompatible directions. We introduce Reflective Adversarial Pareto Search (R-APS), to our knowledge the first method addressing all three failures jointly via reasoning-mode decomposition, allocating each reasoning mode its own context and orchestrating interaction across three timescales: staged compositional reasoning with a typed validation critic (failure localization), sensitivity-guided counterfactual stress-testing as a first-class Pareto objective (robustness), and meta-inductive rule extraction with explicit invalidation (persistent memory). R-APS requires no fine-tuning and operates on a frozen LLM purely via structured protocol design. We evaluate on planar mechanism synthesis (robotics, prosthetics, mechanical design), with every candidate checked by a kinematic solver. On 32 target trajectories, R-APS delivers robustness certificates 3.5x tighter than uniform-perturbation baselines, 46% faster iterations-to-first-admission, and 2.1x Chamfer-distance reduction over Enum+GA while jointly controlling bar-count and worst-case robustness. Small 4B reasoning-specialized models prove competitive with general-purpose 70B backbones inside the protocol, suggesting structured protocols can partially offset model scale.
