TARIC: Memory-Augmented Traversability-Aware Outdoor VLN under Interrupted Semantic Cues

Authors: Tianle Zeng, Hanjing Ye, Jianwei Peng, Jingwen Yu, Hanxuan Chen, Hong Zhang

Categories: cs.RO, cs.AI

Published: 2026-05-29

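💡 One-Sentence Takeaway

Proposes TARIC, a framework that keeps outdoor VLN guidance traversability-consistent when semantic cues are interrupted.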

🎯 Matched Areas: Pillar 1: Robot Control; Pillar 3: Perception & Semantics; Pillar 9: Embodied Foundation Models

Keywords: vision-language navigation, outdoor robots, traversability, semantic-cue interruption, 3D cue memory

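📋 Key Points

Existing memory-based VLN methods perform poorly under traversability-driven detours, which leave robot-centric cues stale and implicit histories blurred.
TARIC extracts semantic bearings, grounds them into executable headings, and uses a 3D cue memory to keep guidance continuously reachable and stable.
Experiments show that TARIC raises navigation success rates in both simulation (by over 10 percentage points) and the real world (40% vs. 17.5%), and is more robust during prolonged cue-free intervals.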

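📝 Abstract (Translated Summary)

This paper proposes TARIC, a unified outdoor vision-language navigation (VLN) framework that addresses the semantic-cue interruptions that occur frequently in long-range, open-world environments. When goal cues become sparse, occluded, or leave the field of view, agents easily lose their way. TARIC stays goal-directed through prolonged cue-free phases by maintaining executable guidance that is consistent with traversability. The method extracts semantic bearings from visibility-gated goal or exploration cues and grounds them into executable headings using a real-time near-field traversability profile, providing goal-consistent feasible guidance that goes beyond simple safety filtering. To prevent guidance from degrading during detours, TARIC lifts intermittent 2D evidence into a world-aligned 3D cue memory with an uncertainty-aware readout mechanism, so that guidance remains continuously reachable and stable as the robot moves. Evaluations on quadrupedal and wheeled platforms show that the method improves simulation success rate by more than 10 percentage points over the strongest baseline and achieves a 40% real-world success rate, compared with 17.5% for the strongest baseline, with higher robustness during prolonged cue-free intervals.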

🔬 Method Details

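Problem definition: In outdoor vision-language navigation (VLN), the complexity and openness of the environment mean that agents frequently face semantic-cue interruptions: goal cues are occluded, disappear, or leave the field of view. Existing memory-based methods tend to ignore traversability when bridging these gaps. The remembered cue direction may be infeasible, so the agent is forced into detours, loses its bearings, and can no longer make effective use of what it remembers.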

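Core idea: TARIC maintains guidance that is consistent with the traversability of the environment. It converts semantic cues into executable headings and uses a 3D cue memory to keep guidance continuous and stable. Because traversability is taken into account, TARIC can steer the agent onto feasible paths during detours instead of letting it fall into unproductive exploration.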

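Framework: TARIC consists of four main modules: 1) a semantic bearing extraction module, which extracts semantic bearings from visible goal or exploration cues; 2) a traversability profile module, which builds a real-time traversability profile of the near-field environment; 3) a heading grounding module, which combines the semantic bearing with the traversability profile to produce an executable heading; and 4) a 3D cue memory module, which lifts intermittent 2D evidence into a world-aligned 3D space and uses an uncertainty-aware readout mechanism to keep guidance continuous.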
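As a rough illustration of module 3, the sketch below grounds a semantic bearing into an executable heading by choosing the traversable direction closest to it. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `ground_heading`, the representation of the profile as `(heading, free_distance)` samples, and the `min_clearance` threshold are all hypothetical.

```python
import math


def ground_heading(semantic_bearing, profile, min_clearance=1.5):
    """Pick the traversable heading closest to the semantic bearing.

    semantic_bearing: desired direction in radians, robot frame (0 = ahead).
    profile: list of (heading_rad, free_distance_m) samples around the robot
             (hypothetical stand-in for a near-field traversability profile).
    min_clearance: hypothetical free-distance threshold, in metres.
    Returns the chosen heading, or None if no sampled heading is traversable.
    """

    def angular_gap(a, b):
        # Smallest absolute difference between two angles, in [0, pi].
        return abs(math.atan2(math.sin(a - b), math.cos(a - b)))

    feasible = [h for h, dist in profile if dist >= min_clearance]
    if not feasible:
        return None
    return min(feasible, key=lambda h: angular_gap(h, semantic_bearing))


# Example: the cue lies 10 degrees to the left, but the way ahead is blocked.
samples_deg = [(-60, 4.0), (-30, 3.0), (0, 0.5), (30, 0.8), (60, 5.0)]
profile = [(math.radians(a), d) for a, d in samples_deg]
heading = ground_heading(math.radians(10), profile)
```

A reject-only safety filter would simply veto the 10-degree bearing here. Redirecting it to the nearest feasible heading (the -30 degree sample in this example) is one simple way to keep guidance both feasible and goal-consistent.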

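Key innovations: TARIC folds traversability into the guidance itself and uses a 3D cue memory to survive semantic-cue interruptions. Unlike existing methods, it treats traversability not merely as a local safety concern but as a stability condition for maintaining goal-directed guidance, which lets it steer the agent more reliably through complex environments. In addition, its uncertainty-aware readout mechanism makes effective use of remembered cues, keeping guidance stable even during detours.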

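Key design choices: 1) a visibility-gated mechanism for selecting valid semantic cues; 2) a real-time near-field traversability profile for producing executable headings; 3) a world-aligned 3D cue memory with an uncertainty-aware readout mechanism to keep guidance continuous. Specific parameter settings and network architectures are not detailed in the paper and remain unknown.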
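The third design choice can be pictured with a toy world-frame memory. The sketch below is an assumption-laden illustration rather than the paper's mechanism: the class `CueMemory`, the availability of a range estimate for each cue, the linear uncertainty growth with travelled distance, and the `max_sigma` cut-off are all hypothetical.

```python
import math


class CueMemory:
    """Toy world-frame cue store with a simple uncertainty model (hypothetical)."""

    def __init__(self, growth_per_m=0.05, max_sigma=3.0):
        self.cues = []                    # each entry: [x, y, z, sigma_m]
        self.growth_per_m = growth_per_m  # assumed drift per metre travelled
        self.max_sigma = max_sigma        # assumed cut-off for usable cues

    def write(self, robot_pose, bearing, rng, height=0.0, sigma=0.5):
        # Lift a robot-frame observation (bearing, range) into world coordinates.
        x, y, yaw = robot_pose
        wx = x + rng * math.cos(yaw + bearing)
        wy = y + rng * math.sin(yaw + bearing)
        self.cues.append([wx, wy, height, sigma])

    def on_move(self, distance_m):
        # Inflate every cue's uncertainty as the robot travels.
        for cue in self.cues:
            cue[3] += self.growth_per_m * distance_m

    def read(self, robot_pose):
        # Return the robot-frame bearing to the most certain usable cue, or None.
        x, y, yaw = robot_pose
        usable = [c for c in self.cues if c[3] <= self.max_sigma]
        if not usable:
            return None
        cx, cy, _, _ = min(usable, key=lambda c: c[3])
        rel = math.atan2(cy - y, cx - x) - yaw
        return math.atan2(math.sin(rel), math.cos(rel))  # wrap to [-pi, pi]


memory = CueMemory()
memory.write((0.0, 0.0, 0.0), bearing=0.0, rng=10.0)  # cue seen 10 m ahead
bearing_after_detour = memory.read((5.0, 5.0, math.pi / 2))
```

Because the cue is stored in the world frame, the bearing read after the detour still points at the original cue location (behind and to the right of the robot in this example), whereas a robot-centric bearing stored at write time would have gone stale.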

📊 Experimental Highlights

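In simulation, TARIC improves success rate by more than 10 percentage points over the strongest baseline. In the real world it achieves a 40% success rate, compared with 17.5% for the strongest baseline. The results also show higher robustness during prolonged cue-free intervals, indicating that TARIC handles semantic-cue interruptions effectively.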
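To put the real-world figures in perspective, the short calculation below derives the absolute and relative gains from the two reported success rates. Only the 40% and 17.5% values come from the abstract; the variable names are illustrative.

```python
taric_real = 40.0     # real-world success rate (%), as reported
baseline_real = 17.5  # strongest baseline (%), as reported

abs_gain_pp = taric_real - baseline_real  # gain in percentage points
rel_ratio = taric_real / baseline_real    # ratio of the two success rates

print(f"{abs_gain_pp:.1f} percentage points, {rel_ratio:.2f}x the baseline")
```

The real-world gain works out to 22.5 percentage points, or roughly 2.29 times the baseline success rate.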

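🎯 Application Scenarios

The TARIC framework could potentially be applied to a range of outdoor robot navigation scenarios, such as autonomous driving, drone inspection, and search-and-rescue robots. By improving navigation in complex environments, it may help robots carry out tasks such as material transport, environmental monitoring, and locating people in rescue operations.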

📄 Abstract (Original)

Outdoor vision-language navigation (VLN) in long-range, open-world environments is frequently disrupted by semantic-cue interruptions, where informative goal cues become sparse, occluded, or leave the field of view. Once such cues disappear, agents enter a cue-free phase and often degrade into backtracking, oscillatory headings, or aimless exploration. While memory-based methods attempt to bridge these gaps, they often fail under traversability-driven detours: the remembered cue direction may be infeasible, forcing detours that prolong cue-free phases and gradually render robot-centric cues stale and implicit histories blurred. This makes traversability a stability condition for maintaining goal-directed guidance, rather than merely a local safety concern. We propose a unified outdoor VLN framework that survives semantic-cue interruptions by maintaining traversability-consistent executable guidance throughout prolonged cue-free phases. Specifically, our method extracts semantic bearings from visibility-gated goal or exploration cues and grounds them into executable headings using a real-time near-field traversability profile, providing goal-consistent feasible guidance beyond reject-only safety filtering. To prevent guidance degradation during detours, we lift intermittent 2D evidence into a world-aligned 3D cue memory with an uncertainty-aware readout mechanism, ensuring guidance remains continuously reachable and stable as the robot moves. We evaluate the framework on quadrupedal and wheeled platforms over 600–1000 m routes. Our method improves simulation success rate by over 10 percentage points over the strongest baseline and achieves a real-world success rate of 40%, compared to 17.5% for the strongest baseline, with substantially higher robustness during prolonged cue-free intervals.
