TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding

Authors: Junwen Zhang, Pu Chen, Yin Zhang

Categories: cs.AI

Published: 2025-06-26

Comments: 43 pages and 11 figures

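💡 One-sentence takeaway

Proposes TableMoE to address the structural complexity and visual degradation that hamper multimodal table understanding.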

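🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: multimodal understanding, neuro-symbolic reasoning, tabular data, structured reasoning, expert models, dataset construction, performance evaluation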

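📋 Key points

Existing multimodal large language models perform poorly on complex tables, especially when the structure is intricate or the image is visually degraded.
The paper proposes TableMoE, which uses a neuro-symbolic routing mechanism to dynamically route table elements to specialized experts for more robust reasoning.
Experiments show that TableMoE significantly outperforms existing models on the four WildStruct benchmarks, supporting its effectiveness and robustness.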

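📝 Abstract (translated from Chinese)

In real-world settings, multimodal table understanding is challenged by structural complexity, symbolic density, and visual degradation. Existing multimodal large language models perform poorly under these WildStruct conditions. To address this, the paper proposes TableMoE, a neuro-symbolic Mixture-of-Connector-Experts (MoCE) architecture designed specifically for multimodal table data. TableMoE introduces a Neuro-Symbolic Routing mechanism that predicts latent semantic token roles and dynamically routes table elements to specialized experts, enabling robust structured reasoning. The authors also build the large-scale TableMoE-Align dataset and release four WildStruct benchmarks for evaluation. Experimental results show that TableMoE significantly surpasses existing state-of-the-art models.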

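🔬 Method details

Problem definition: The paper targets structural complexity and visual degradation in multimodal table understanding. Existing methods perform poorly under these WildStruct conditions, which limits both their performance and their generalization.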

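Core idea: TableMoE uses a neuro-symbolic routing mechanism to predict the latent semantic role of each table element (e.g., header, data cell, axis, formula) and dynamically route it to a specialized expert, with the goal of robust structured reasoning.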

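Technical framework: TableMoE adopts a Mixture-of-Connector-Experts (MoCE) architecture. Its main components are the neuro-symbolic routing mechanism, the specialized experts (Table-to-HTML, Table-to-JSON, Table-to-Code), and a confidence-aware gating strategy.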

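Key innovation: The most important innovation is the neuro-symbolic routing mechanism, whose routing decisions are informed by symbolic reasoning graphs. The paper's ablation studies identify this mechanism as critical to performance on complex tables.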
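The routing idea described above can be illustrated with a small sketch. This is not the paper's formulation: the mixing rule, the `ROLE_TO_EXPERT_PRIOR` table (standing in for whatever the symbolic reasoning graphs provide), and all function and variable names are assumptions made for illustration. The sketch predicts a role from logits, treats the top role probability as a confidence score, and uses that confidence to blend a symbolic prior with a learned router's expert scores.

```python
import math

ROLES = ["header", "data_cell", "axis", "formula"]
EXPERTS = ["table_to_html", "table_to_json", "table_to_code"]

# Hypothetical symbolic prior: plausibility of each expert given a role.
# Each row sums to 1. The real model derives guidance from symbolic
# reasoning graphs; this lookup table is only a stand-in.
ROLE_TO_EXPERT_PRIOR = {
    "header":    [0.6, 0.3, 0.1],
    "data_cell": [0.3, 0.5, 0.2],
    "axis":      [0.5, 0.4, 0.1],
    "formula":   [0.1, 0.2, 0.7],
}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(role_logits, expert_logits):
    """Return (predicted role, confidence, expert weights) for one table element.

    Assumed rule: a confident role prediction leans on the symbolic prior,
    an uncertain one falls back to the neural router's own scores.
    """
    role_probs = softmax(role_logits)
    confidence = max(role_probs)
    role = ROLES[role_probs.index(confidence)]
    neural = softmax(expert_logits)
    prior = ROLE_TO_EXPERT_PRIOR[role]
    mixed = [confidence * p + (1.0 - confidence) * n
             for p, n in zip(prior, neural)]
    return role, confidence, dict(zip(EXPERTS, mixed))

if __name__ == "__main__":
    # An element whose role logits strongly favour "formula".
    role, conf, weights = route([0.0, 0.0, 0.0, 5.0], [0.0, 0.0, 0.0])
    print(role, round(conf, 3), weights)
```

Because both the prior row and the neural scores are probability distributions, their convex combination is one too, so the expert weights can be used directly as soft gates or reduced to a top-1 choice.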

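Key design: TableMoE uses a confidence-aware gating strategy so that each table element is assigned to a suitable expert. For alignment-driven pretraining it introduces the large-scale TableMoE-Align dataset, which consists of 1.2M table-HTML-JSON-code quadruples across finance, science, biomedicine and industry.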
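To make the table-HTML-JSON-code quadruple concrete, here is a minimal sketch of how one such record might be assembled for a tiny table. The record layout, the field names (`image_path`, `html`, `json`, `code`), and the choice of Python as the "code" target are all assumptions for illustration; the summary does not specify the dataset's actual schema.

```python
import json
from html import escape

def to_html(header, rows):
    """Serialize a simple table to a flat HTML string."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"

def to_json(header, rows):
    """Serialize the table as a JSON list of row objects keyed by header."""
    return json.dumps([dict(zip(header, row)) for row in rows])

def to_code(header, rows):
    """Serialize the table as Python source that rebuilds the data."""
    return f"header = {header!r}\nrows = {rows!r}"

def make_alignment_record(image_path, header, rows):
    """Bundle one hypothetical quadruple: table image plus three targets."""
    return {
        "image_path": image_path,  # hypothetical pointer to the table image
        "html": to_html(header, rows),
        "json": to_json(header, rows),
        "code": to_code(header, rows),
    }

if __name__ == "__main__":
    record = make_alignment_record(
        "t1.png", ["Year", "Revenue"], [[2023, 10.5], [2024, 12.0]]
    )
    for key, value in record.items():
        print(f"{key}: {value}")
```

Each of the three serializations corresponds to one of the experts named above (Table-to-HTML, Table-to-JSON, Table-to-Code), which is presumably why the pretraining data pairs every table with all three.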

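📊 Experimental highlights

According to the paper, TableMoE significantly surpasses existing state-of-the-art models on the four WildStruct benchmarks (WMMFinQA, WMMTatQA, WMMTabDialog, and WMMFinanceMath). The abstract does not report specific improvement figures. Ablation studies validate each core component, and qualitative analyses illustrate the model's interpretability and robustness on complex tables.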

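🎯 Application scenarios

Potential applications of TableMoE include finance, science, biomedicine, and industry, where it could process complex tabular data and support data analysis and decision-making. In the future, the model may also prove useful for multimodal data understanding and intelligent question-answering systems.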

📄 Abstract (original)

Multimodal understanding of tables in real-world contexts is challenging due to the complexity of structure, symbolic density, and visual degradation (blur, skew, watermarking, incomplete structures or fonts, multi-span or hierarchically nested layouts). Existing multimodal large language models (MLLMs) struggle with such WildStruct conditions, resulting in limited performance and poor generalization. To address these challenges, we propose TableMoE, a neuro-symbolic Mixture-of-Connector-Experts (MoCE) architecture specifically designed for robust, structured reasoning over multimodal table data. TableMoE features an innovative Neuro-Symbolic Routing mechanism, which predicts latent semantic token roles (e.g., header, data cell, axis, formula) and dynamically routes table elements to specialized experts (Table-to-HTML, Table-to-JSON, Table-to-Code) using a confidence-aware gating strategy informed by symbolic reasoning graphs. To facilitate effective alignment-driven pretraining, we introduce the large-scale TableMoE-Align dataset, consisting of 1.2M table-HTML-JSON-code quadruples across finance, science, biomedicine and industry, utilized exclusively for model pretraining. For evaluation, we curate and release four challenging WildStruct benchmarks: WMMFinQA, WMMTatQA, WMMTabDialog, and WMMFinanceMath, designed specifically to stress-test models under real-world multimodal degradation and structural complexity. Experimental results demonstrate that TableMoE significantly surpasses existing state-of-the-art models. Extensive ablation studies validate each core component, emphasizing the critical role of Neuro-Symbolic Routing and structured expert alignment. Through qualitative analyses, we further showcase TableMoE's interpretability and enhanced robustness, underscoring the effectiveness of integrating neuro-symbolic reasoning for multimodal table understanding.
