HST-HGN: Heterogeneous Spatial-Temporal Hypergraph Networks with Bidirectional State Space Models for Global Fatigue Assessment

📄 arXiv: 2604.08435v1

Author: Changdao Chen

Categories: cs.CV, cs.AI

Published: 2026-04-09

Comments: 10 pages


💡 One-Sentence Takeaway

Proposes HST-HGN to address driver fatigue assessment from video

🎯 Matched Area: Pillar 2: RL Algorithms & Architecture (RL & Architecture)

Keywords: driver fatigue assessment, spatial-temporal hypergraph networks, bidirectional state space models, multi-modal fusion, real-time monitoring

📋 Key Points

  1. Existing methods struggle to model the long-range temporal dependencies of subtle facial expressions when assessing driver fatigue in long, untrimmed videos.
  2. This paper proposes HST-HGN, which dynamically fuses geometric topologies with texture patches via a heterogeneous spatial-temporal hypergraph network and a bidirectional state space model.
  3. HST-HGN achieves state-of-the-art performance across multiple fatigue assessment benchmarks and is suitable for real-time applications.

📝 Abstract (Translated)

Assessing driver fatigue from untrimmed videos under constrained computational budgets remains challenging, especially in modeling the long-range temporal dependencies of subtle facial expressions. Some existing methods rely on computationally heavy architectures, while others adopt traditional lightweight pairwise graph networks, whose capacity for modeling high-order synergies and global temporal context is limited. This paper therefore proposes HST-HGN, a novel heterogeneous spatial-temporal hypergraph network combined with bidirectional state space models. The method dynamically fuses pose-disentangled geometric topologies with multi-modal texture patches through a hierarchical hypergraph network, effectively overcoming the limitations of conventional approaches. Experiments show that HST-HGN performs strongly across multiple fatigue benchmarks, balancing discriminative power with computational efficiency, making it well suited for real-time in-cabin edge deployment.

🔬 Method Details

Problem definition: This paper addresses driver fatigue assessment from untrimmed videos. Existing methods struggle to model the long-range temporal dependencies of subtle facial expressions, which limits assessment accuracy.

Core idea: HST-HGN couples a heterogeneous spatial-temporal hypergraph network with a bidirectional state space model, dynamically fusing geometric topologies with multi-modal texture patches to strengthen the modeling of high-order synergies and global temporal context.
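The "high-order synergy" above hinges on hyperedges that connect more than two nodes at once, unlike pairwise graph edges. A minimal sketch of how facial landmarks might be grouped into region-level hyperedges follows; the landmark indices and region groupings are illustrative assumptions, not the paper's actual pose-disentangled topology:

```python
import numpy as np

# Hypothetical landmark indices for three facial regions; the paper's
# actual topology is not specified, so these groupings are illustrative.
REGIONS = {
    "left_eye":  [0, 1, 2, 3],
    "right_eye": [4, 5, 6, 7],
    "mouth":     [8, 9, 10, 11, 12],
}

def build_incidence(num_nodes: int, regions: dict) -> np.ndarray:
    """Build a node-by-hyperedge incidence matrix H, where H[v, e] = 1 if
    landmark v belongs to region hyperedge e. Each hyperedge joins all
    landmarks of a region at once, which is how a hypergraph expresses
    high-order synergies (e.g. the whole mouth deforming during a yawn)."""
    H = np.zeros((num_nodes, len(regions)))
    for e, nodes in enumerate(regions.values()):
        H[nodes, e] = 1.0
    return H

H = build_incidence(13, REGIONS)
print(H.shape)        # (13, 3)
print(H[:, 2].sum())  # 5.0 -> the mouth hyperedge covers five landmarks
```

A pairwise graph over the same five mouth landmarks would need ten edges and still could not treat the region's joint deformation as a single unit; the incidence matrix does so in one column.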

Technical framework: The overall HST-HGN architecture comprises a hierarchical hypergraph network and a Bi-Mamba module. The hierarchical hypergraph network fuses information across modalities, while the Bi-Mamba module performs bidirectional sequence modeling with linear complexity.
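The Bi-Mamba idea can be illustrated with a toy diagonal state-space scan run in both directions. The real module uses learned, input-dependent Mamba parameters, so everything below (fixed coefficients, additive fusion) is a simplified assumption rather than the paper's implementation:

```python
import numpy as np

def ssm_scan(x, a=0.9, b=0.1):
    """Linear-time recurrent scan h_t = a*h_{t-1} + b*x_t over a 1-D
    sequence: a toy diagonal state space model, not the Mamba kernel."""
    h, out = 0.0, np.empty_like(x)
    for t, xt in enumerate(x):
        h = a * h + b * xt
        out[t] = h
    return out

def bi_ssm(x):
    """Bidirectional modeling in the Bi-Mamba spirit: one scan forward,
    one over the reversed sequence, then fuse, so every timestep sees
    both past and future context at O(T) cost."""
    fwd = ssm_scan(x)
    bwd = ssm_scan(x[::-1])[::-1]
    return fwd + bwd  # additive fusion; the paper's fusion is unspecified

x = np.array([0., 0., 1., 1., 1., 0., 0.])  # a transient mouth-opening event
y = bi_ssm(x)
print(np.argmax(y))  # 3 -> peak response centered on the event
```

A causal (forward-only) scan would peak at the end of the event rather than its center; seeing both directions is what lets the network judge a transient action against its full temporal context, which is the intuition behind distinguishing a yawn from speech.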

Key innovation: The main innovations are the hierarchical hypergraph network and the Bi-Mamba module, which allow the network to distinguish highly ambiguous transient actions, such as yawning versus speaking, while capturing their complete physiological lifecycles.

Key design: The network is optimized with a dedicated loss function, and a hierarchical hypergraph structure is used to strengthen the modeling of facial expressions.
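How such a hypergraph layer propagates features can be sketched with the standard HGNN propagation rule; whether HST-HGN uses exactly this normalization is not stated in the summary, so treat it as a plausible stand-in:

```python
import numpy as np

def hypergraph_conv(X, H, W):
    """One HGNN-style propagation step:
        X' = Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X W.
    Node features are gathered onto hyperedges, averaged, and scattered
    back, so all landmarks within a region exchange information in a
    single step. Dv/De are node/hyperedge degree matrices."""
    Dv = np.diag(1.0 / np.sqrt(H.sum(axis=1)))   # node degrees
    De = np.diag(1.0 / H.sum(axis=0))            # hyperedge degrees
    return Dv @ H @ De @ H.T @ Dv @ X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(13, 8))           # 13 landmark nodes, 8-dim features
H = np.zeros((13, 3))                  # toy incidence: 3 region hyperedges
H[0:4, 0] = H[4:8, 1] = H[8:13, 2] = 1.0
W = rng.normal(size=(8, 8))            # learnable weight (random here)
print(hypergraph_conv(X, H, W).shape)  # (13, 8)
```

Stacking such layers over incidence matrices at different granularities (regions, then the whole face) is one plausible reading of the "hierarchical" structure described above.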

🖼️ Key Figures

fig_0
fig_1
fig_2

📊 Experimental Highlights

HST-HGN achieves state-of-the-art performance across multiple fatigue assessment benchmarks, with notable gains in distinguishing ambiguous transient actions. It balances discriminative power with computational efficiency, making it suitable for real-time applications.

🎯 Application Scenarios

This work has significant application potential in driver fatigue monitoring, enabling real-time assessment of driver state and improving road safety. The technique could also extend to other scenarios requiring real-time emotion recognition, such as intelligent transportation systems and human-machine interfaces.

📄 Abstract (Original)

It remains challenging to assess driver fatigue from untrimmed videos under constrained computational budgets, due to the difficulty of modeling long-range temporal dependencies in subtle facial expressions. Some existing approaches rely on computationally heavy architectures, whereas others employ traditional lightweight pairwise graph networks, despite their limited capacity to model high-order synergies and global temporal context. Therefore, we propose HST-HGN, a novel Heterogeneous Spatial-Temporal Hypergraph Network driven by Bidirectional State Space Models. Spatially, we introduce a hierarchical hypergraph network to fuse pose-disentangled geometric topologies with multi-modal texture patches dynamically. This formulation encapsulates high-order synergistic facial deformations, effectively overcoming the limitations of conventional methods. In temporal terms, a Bi-Mamba module with linear complexity is applied to perform bidirectional sequence modeling. This explicit temporal-evolution filtering enables the network to distinguish highly ambiguous transient actions, such as yawning versus speaking, while encompassing their complete physiological lifecycles. Extensive evaluations across diverse fatigue benchmarks demonstrate that HST-HGN achieves state-of-the-art performance. In particular, our method strikes a balance between discriminative power and computational efficiency, making it well-suited for real-time in-cabin edge deployment.