BrainVista: Modeling Naturalistic Brain Dynamics as Multimodal Next-Token Prediction

Authors: Xuanhua Yin, Runkai Zhao, Lina Yao, Weidong Cai

Categories: q-bio.NC, cs.AI

Published: 2026-02-04

Comments: 17 pages, 7 figures, 11 tables

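💡 One-Sentence Summary

Proposes BrainVista, a multimodal autoregressive framework for modeling naturalistic brain dynamics.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: naturalistic fMRI, multimodal modeling, causal evolution, network-wise tokenizers, spatial mixer head, stimulus-to-brain masking, brain dynamics modeling, pattern correlation improvement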

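📋 Key Points

Existing methods for simulating complex brain dynamics are held back by the timescale mismatch between multimodal inputs and the topology of cortical networks.
BrainVista introduces Network-wise Tokenizers to disentangle system-specific dynamics and a Spatial Mixer Head to capture inter-network information flow, addressing the challenge of modeling causal evolution.
In long-horizon rollout settings, BrainVista raises pattern correlation over the strongest baseline by 36.0% on Algonauts 2025 and 33.3% on CineBrain.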

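📝 Abstract (Translated)

Naturalistic functional magnetic resonance imaging (fMRI) treats the brain as a dynamic predictive engine driven by continuous sensory streams. However, existing methods for modeling the causal forward evolution of complex cortical networks are hindered by the timescale mismatch between multimodal inputs and network topology. To address these challenges, the paper proposes BrainVista, a multimodal autoregressive framework that models the causal evolution of brain states. BrainVista introduces Network-wise Tokenizers to disentangle system-specific dynamics and uses a Spatial Mixer Head to capture inter-network information flow. It also proposes a novel Stimulus-to-Brain (S2B) masking mechanism that synchronizes high-frequency sensory stimuli with hemodynamically filtered signals, enabling strict, history-only causal conditioning. Validated on Algonauts 2025, CineBrain, and HAD, BrainVista reaches state-of-the-art fMRI encoding performance and substantially improves pattern correlation in long-horizon rollout settings.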

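🔬 Method Details

Problem definition: The paper targets the causal evolution of brain dynamics in naturalistic fMRI. Existing methods struggle to simulate complex cortical network dynamics because of the timescale mismatch between stimuli and brain signals.

Core idea: BrainVista is a multimodal autoregressive framework. It combines Network-wise Tokenizers, which disentangle system-specific dynamics, with a Spatial Mixer Head, which captures information flow between networks, to model brain states causally.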
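The autoregressive, long-horizon setting can be illustrated with a minimal rollout loop. This is a sketch under stated assumptions, not the paper's implementation: `predict_next`, the `context` window, and the scalar "brain states" are all hypothetical stand-ins chosen to show how predictions are fed back as history while stimuli are only used once they lie in the past.

```python
def rollout(predict_next, brain_history, stimulus_history, future_stimuli, context=4):
    """Autoregressive rollout: each prediction is appended to the history
    and reused as input for the next step.

    predict_next(brain_window, stimulus_window) is a hypothetical one-step
    model; it only ever sees past brain states and past stimuli.
    """
    brain = list(brain_history)
    stim = list(stimulus_history)
    predictions = []
    for next_stimulus in future_stimuli:
        # Predict the next brain state from history only.
        nxt = predict_next(brain[-context:], stim[-context:])
        predictions.append(nxt)
        brain.append(nxt)           # feed the prediction back in
        stim.append(next_stimulus)  # this stimulus is now part of the past
    return predictions


# Toy one-step model (an assumption for illustration): average of the
# latest brain state and the latest stimulus value.
def toy_model(brain_window, stimulus_window):
    return 0.5 * brain_window[-1] + 0.5 * stimulus_window[-1]


preds = rollout(toy_model, [0.0], [1.0], [1.0, 1.0, 1.0])
print(preds)  # [0.5, 0.75, 0.875]
```

Because the model consumes its own outputs, errors can compound over the horizon, which is why long-horizon rollout is a stricter test than one-step prediction.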

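Architecture: The pipeline takes multimodal input, disentangles dynamics with the Network-wise Tokenizers, captures inter-network information flow with the Spatial Mixer Head, and enforces strict causal conditioning through the S2B masking mechanism.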
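The data flow described above (per-network tokens first, cross-network mixing second) can be sketched as follows. This is only a toy illustration: the summary does not specify how the tokenizers or the mixer are built, so the mean-pooling "tokenizer", the fixed `weights` table, and the network labels are all assumptions; in the actual model these components are presumably learned.

```python
def tokenize_by_network(parcels, network_of):
    """Group parcel values by network label and emit one token per network.
    Here a token is just the group mean (a stand-in for a learned tokenizer)."""
    groups = {}
    for value, net in zip(parcels, network_of):
        groups.setdefault(net, []).append(value)
    return {net: sum(vals) / len(vals) for net, vals in groups.items()}


def mix_tokens(tokens, weights):
    """Cross-network mixing: each output token is a weighted sum over all
    network tokens. weights[dst][src] is a hypothetical mixing coefficient."""
    return {
        dst: sum(w * tokens[src] for src, w in row.items())
        for dst, row in weights.items()
    }


parcels = [1.0, 3.0, 10.0, 20.0]
network_of = ["visual", "visual", "auditory", "auditory"]
tokens = tokenize_by_network(parcels, network_of)
print(tokens)  # {'visual': 2.0, 'auditory': 15.0}

weights = {
    "visual": {"visual": 1.0, "auditory": 0.0},    # visual keeps its own signal
    "auditory": {"visual": 0.5, "auditory": 0.5},  # auditory receives visual input
}
print(mix_tokens(tokens, weights))  # {'visual': 2.0, 'auditory': 8.5}
```

Keeping tokenization separate per network and confining interaction to the mixing step is one way to let information flow between networks without blurring which network each token belongs to.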

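Key innovation: The central technical contribution is the S2B masking mechanism, which synchronizes high-frequency sensory stimuli with hemodynamically filtered signals and guarantees that causal modeling depends only on history.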
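One way to picture a history-only mask between fast stimulus tokens and slow fMRI tokens is shown below. This is a hedged sketch, not the paper's S2B definition: the function name `s2b_mask`, the timestamp-based cutoff, and the optional `lag_seconds` offset (a crude stand-in for hemodynamic delay) are assumptions made for illustration.

```python
def s2b_mask(n_trs, stim_times, tr_seconds, lag_seconds=0.0):
    """Build mask[t][s], True when brain token t may attend to stimulus token s.

    Brain token t is assumed to be acquired at time t * tr_seconds. A stimulus
    token is visible only if its timestamp is at or before that time minus an
    optional lag, so no brain token can see future stimuli.
    """
    mask = []
    for t in range(n_trs):
        cutoff = t * tr_seconds - lag_seconds
        mask.append([ts <= cutoff for ts in stim_times])
    return mask


# Stimulus features sampled every 0.5 s, fMRI volumes every 1.0 s
# (illustrative rates, not taken from the paper).
stim_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
for row in s2b_mask(3, stim_times, tr_seconds=1.0):
    print(row)
# [True, False, False, False, False, False]
# [True, True, True, False, False, False]
# [True, True, True, True, True, False]
```

A mask of this shape could be passed to an attention layer so that several stimulus tokens map onto each brain token while the causal ordering in real time is preserved.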

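Key design: BrainVista uses a dedicated loss function to optimize the fusion of multimodal inputs, and its network structure preserves the functional boundaries between modules so that information flow is captured effectively.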

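📊 Experimental Highlights

In long-horizon rollout settings, BrainVista improves pattern correlation over the strongest baseline by 36.0% on Algonauts 2025 and 33.3% on CineBrain, supporting its effectiveness for fMRI encoding.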
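To make the reported metric concrete, the sketch below computes a pattern correlation and a relative improvement. The definition used here (Pearson correlation across parcels at each time point, averaged over time) is an assumption, since the summary does not define the metric, and the scores in the usage lines are made-up numbers, not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def pattern_correlation(pred, true):
    """Mean spatial correlation: correlate predicted and measured parcel
    patterns at each time point, then average over time points."""
    scores = [pearson(p, t) for p, t in zip(pred, true)]
    return sum(scores) / len(scores)


def relative_improvement(model_score, baseline_score):
    """Fractional gain of the model over the baseline."""
    return (model_score - baseline_score) / baseline_score


# Two time points, three parcels each (toy values).
pred = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
true = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.0]]
print(pattern_correlation(pred, true))  # 0.0 (one perfect match, one inverted)

# Hypothetical scores: 0.50 for a baseline, 0.60 for a model.
print(round(100 * relative_improvement(0.60, 0.50), 1))  # 20.0
```

Read this way, a 36.0% gain is relative to the baseline's score, not an absolute increase of 0.36 in correlation.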

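🎯 Application Scenarios

Potential application areas include neuroscience, psychology, and brain-computer interfaces. By modeling brain dynamics more accurately, BrainVista can help researchers understand how the brain processes complex sensory information and support further work in these fields.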

📄 Abstract (Original)

Naturalistic fMRI characterizes the brain as a dynamic predictive engine driven by continuous sensory streams. However, modeling the causal forward evolution in realistic neural simulation is impeded by the timescale mismatch between multimodal inputs and the complex topology of cortical networks. To address these challenges, we introduce BrainVista, a multimodal autoregressive framework designed to model the causal evolution of brain states. BrainVista incorporates Network-wise Tokenizers to disentangle system-specific dynamics and a Spatial Mixer Head that captures inter-network information flow without compromising functional boundaries. Furthermore, we propose a novel Stimulus-to-Brain (S2B) masking mechanism to synchronize high-frequency sensory stimuli with hemodynamically filtered signals, enabling strict, history-only causal conditioning. We validate our framework on Algonauts 2025, CineBrain, and HAD, achieving state-of-the-art fMRI encoding performance. In long-horizon rollout settings, our model yields substantial improvements over baselines, increasing pattern correlation by 36.0% and 33.3% relative to the strongest baseline on Algonauts 2025 and CineBrain, respectively.
