Few-shot Cross-country Generalization of Tabular Machine Learning and Foundation Models for Childhood Anemia Prediction under Distribution Shift

📄 arXiv: 2605.26589v1

Authors: Yusuf Brima, Marcellin Atemkeng, Lansana Hassim Kallon, David Niyukuri, Antoine Vacavant, Samuel Saidu, Ding-Geng Chen

Categories: cs.LG, cs.AI, stat.ML

Published: 2026-05-26


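💡 One-sentence takeaway

Evaluates the tabular foundation model TabPFN against classical models for childhood anemia prediction under data scarcity and cross-country distribution shift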

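🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: childhood anemia, data scarcity, model generalization, TabPFN, public health, machine learning, feature importance, transformer models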

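📋 Key points

  1. Childhood anemia prediction is hampered by data scarcity and weak model generalization; existing methods transfer poorly across countries.
  2. The paper evaluates TabPFN, a transformer-based tabular foundation model, against classical supervised methods, focusing on discrimination and calibration when data are scarce.
  3. TabPFN outperformed the classical models in low-data regimes (<200 samples). Across countries it achieved the lowest Brier score (0.042) and ECE (0.203), and leave-one-country-out performance was stable.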

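📝 Abstract (translated)

Childhood anemia affects about 40% of children aged 6-59 months worldwide, and its heterogeneous causes limit model generalizability. The paper evaluates TabPFN, a transformer-based tabular foundation model, against classical supervised learning methods in cross-country and data-scarce settings. Using DHS data from 16 countries (n=68,856), it compares Logistic Regression, XGBoost, LightGBM, and TabPFN v2.6. In low-data regimes TabPFN beats the classical models on both discrimination and calibration, and its performance is stable across countries, pointing to the potential of foundation models for global health prediction.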

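🔬 Method details

Problem definition: The paper addresses data scarcity and weak generalization in childhood anemia prediction. Existing methods transfer poorly across countries and population groups, which degrades predictive accuracy.

Core idea: The paper applies TabPFN, a transformer-based tabular foundation model, and tests whether it delivers better discrimination and calibration than classical models when labeled data are scarce and the target country differs from the training countries.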

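Technical framework: The study uses DHS data from 16 countries and compares Logistic Regression, XGBoost, LightGBM, and TabPFN v2.6. Performance is measured with AUC-ROC, Brier score, and expected calibration error (ECE). Generalization is evaluated with leave-one-country-out (LOCO), reverse-LOCO, and few-shot settings.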
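
The country-level splits can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the reverse-LOCO definition used here (train on a single country, test on all the others) is an assumption, since the abstract does not spell it out.

```python
from typing import Iterator, List, Sequence, Tuple

# (country name, train row indices, test row indices)
Split = Tuple[str, List[int], List[int]]


def loco_splits(countries: Sequence[str]) -> Iterator[Split]:
    """Leave-one-country-out: hold out each country in turn as the test set."""
    for held_out in sorted(set(countries)):
        train = [i for i, c in enumerate(countries) if c != held_out]
        test = [i for i, c in enumerate(countries) if c == held_out]
        yield held_out, train, test


def reverse_loco_splits(countries: Sequence[str]) -> Iterator[Split]:
    """Assumed reverse-LOCO: train on one country, test on all the others."""
    for source in sorted(set(countries)):
        train = [i for i, c in enumerate(countries) if c == source]
        test = [i for i, c in enumerate(countries) if c != source]
        yield source, train, test
```

Each yielded tuple gives row indices into the pooled dataset, so the same fitting and scoring routine can be reused for both protocols.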

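Key finding: TabPFN outperforms the classical models in low-data regimes, in particular with fewer than 200 samples, where it shows higher discrimination and better calibration.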
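
One way to set up such a low-data experiment is sketched below. The sampling scheme is an assumption made for illustration (the summary does not describe how the few-shot samples were drawn), and `few_shot_split` is a hypothetical helper: it draws `k` labeled rows from the target country as the training or context set and keeps the remaining target rows for evaluation.

```python
import random
from typing import List, Sequence, Tuple


def few_shot_split(
    countries: Sequence[str], target: str, k: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (support, query) row indices for a k-shot experiment on `target`."""
    target_rows = [i for i, c in enumerate(countries) if c == target]
    if k > len(target_rows):
        raise ValueError("k exceeds the number of rows for the target country")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    support = sorted(rng.sample(target_rows, k))
    support_set = set(support)
    query = [i for i in target_rows if i not in support_set]
    return support, query
```

Repeating the draw over several seeds and values of `k` (for example up to 200) would give a learning curve per country.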

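Key design: Feature importance is estimated with SHAP, which identifies child age, altitude, and height-for-age z-score as the dominant predictors, followed by wealth and maternal education. Subgroup analyses cover sex, age, residence, maternal education, and wealth.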

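📊 Experimental highlights

TabPFN outperformed the classical models in low-data regimes (<200 samples), with higher discrimination and better calibration; across countries it achieved the lowest Brier score (0.042) and ECE (0.203). Under full-data settings, AUC-ROC ranged from 0.59 to 0.76 and between-model differences were small (≤0.05), which suggests that performance depends more on population variation than on model choice.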
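
For reference, the two calibration-related metrics can be computed as in the sketch below. It is a plain-Python illustration under stated assumptions: the ECE uses 10 equal-width probability bins, which is a common default but not necessarily the binning used in the paper.

```python
from typing import List, Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """ECE: bin-size-weighted mean of |observed rate - mean predicted probability|."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append(i)
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(y_prob[i] for i in members) / len(members)
        pos_rate = sum(y_true[i] for i in members) / len(members)
        ece += len(members) / n * abs(pos_rate - mean_prob)
    return ece
```

Both metrics are lower-is-better; a model whose predicted probabilities match observed frequencies in every bin has an ECE of 0.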

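🎯 Application scenarios

Potential application areas include public health, child health monitoring, and global health prediction. More accurate anemia prediction could help policymakers and health workers target interventions more effectively and so improve child health. The approach may offer similar advantages for predicting other health conditions.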

📄 Abstract (original)

Childhood anemia affects around 40% of children aged 6-59 months globally and arises from heterogeneous factors, limiting model generalizability. We evaluate a transformer-based tabular foundation model against classical supervised methods under cross-country and data-scarce settings. We used DHS data from 16 countries across Africa, Asia, Latin America, the Caucasus, and the Middle East (n=68,856). We compared Logistic Regression, XGBoost, LightGBM, and TabPFN v2.6. Performance was assessed using AUC-ROC, Brier score, and ECE. Generalization was evaluated using leave-one-country-out (LOCO), reverse-LOCO, and few-shot settings. Subgroup analyses included sex, age, residence, maternal education, and wealth. Feature importance was estimated using SHAP. TabPFN outperformed classical models in low-data regimes (<200 samples), showing higher discrimination and better calibration. Across countries, it achieved the lowest Brier score (0.042) and ECE (0.203). Under full-data settings, AUC-ROC ranged from 0.59-0.76 with small between-model differences ($\leq 0.05$). LOCO performance was stable (0.58-0.69), driven by country context. Reverse-LOCO showed asymmetric transferability. Subgroup performance was consistent with no systematic demographic bias. SHAP identified child age, altitude, and height-for-age z-score as dominant predictors, followed by wealth and maternal education. Performance in childhood anemia prediction is driven more by population variation than model choice. TabPFN provides advantages in low-resource settings through improved discrimination and calibration, highlighting foundation models as promising tools for data-scarce global health prediction.
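
The subgroup analysis mentioned in the abstract can be illustrated with a per-group AUC-ROC. This is a hedged sketch rather than the authors' procedure: `auc_roc` and `auc_by_subgroup` are hypothetical helpers, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which is quadratic in group size and intended for illustration only.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(
    y_true: Sequence[int], y_score: Sequence[float], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Compute AUC-ROC separately for each subgroup label (e.g. sex or residence)."""
    rows = defaultdict(lambda: ([], []))
    for y, s, g in zip(y_true, y_score, groups):
        rows[g][0].append(y)
        rows[g][1].append(s)
    return {g: auc_roc(ys, ss) for g, (ys, ss) in rows.items()}
```

Comparing the per-group values against the overall AUC is one simple check for the kind of systematic demographic gap the abstract reports not finding.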