Automated Assessment of Encouragement and Warmth in Classrooms Leveraging Multimodal Emotional Features and ChatGPT

📄 arXiv: 2404.15310v1

Authors: Ruikun Hou, Tim Fütterer, Babette Bühler, Efe Bozkir, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci

Categories: cs.HC, cs.AI, cs.CY, cs.LG

Publication date: 2024-04-01

Comments: Accepted as a full paper by the 25th International Conference on Artificial Intelligence in Education (AIED 2024)

Venue: Proceedings of the 25th International Conference on Artificial Intelligence in Education (AIED 2024)

DOI: 10.1007/978-3-031-64302-6_5


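💡 One-sentence summary

Proposes an automated method for assessing encouragement and warmth in classrooms that combines multimodal emotional features with ChatGPT.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: multimodal emotion analysis, classroom observation, automated assessment, teacher training, emotion recognition, ChatGPT, educational technology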

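📋 Key points

  1. Existing classroom observation relies on manual coding by human raters, which is time-consuming and often unreliable, so it struggles to meet teachers' need for feedback.
  2. The paper proposes an automated assessment method that combines multimodal emotional features with ChatGPT, aiming to make assessment more efficient and more accurate.
  3. An ensemble that averages the estimates of GPT-4 and the best trained model reaches a correlation of r = .513 with human ratings, higher than either model alone (r = .341 and r = .441) and comparable to human inter-rater reliability.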

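📝 Abstract (summary)

Classroom observation protocols standardize the assessment of teaching effectiveness and aid the understanding of classroom interactions. Manual coding, however, is resource-intensive and often unreliable, which has motivated research into AI-based automated coding. This paper explores a multimodal approach that uses facial emotion recognition, speech emotion recognition, and sentiment analysis to extract interpretable features from video, audio, and transcript data, in order to automatically estimate encouragement and warmth in classrooms. In experiments on the GTI dataset, GPT-4 and the best trained model reached correlations with human ratings of r = .341 and r = .441, respectively; averaging the two estimates raised the correlation to r = .513, comparable to human inter-rater reliability. The findings offer insights for automated classroom observation, with the aim of supporting teacher training through frequent and valuable feedback.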

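🔬 Method details

Problem definition: The paper addresses the inefficiency and unreliability of manual coding when assessing teaching effectiveness; existing methods rarely deliver timely, accurate feedback.

Core idea: Combine multimodal emotional feature extraction with ChatGPT to automatically estimate encouragement and warmth in classrooms, improving both the accuracy and the efficiency of assessment.

Technical framework: The pipeline covers data collection (video, audio, and transcripts), emotional feature extraction (facial emotion recognition, speech emotion recognition, and text sentiment analysis), model training (classification and regression), and evaluation against human ratings.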
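As a rough illustration of the feature-extraction stage, the sketch below collapses per-frame facial scores, per-window speech scores, and per-sentence sentiment scores into one fixed-length vector per segment. The label sets, the function names, and the choice of mean pooling are all assumptions made for illustration; the abstract does not state how the features were aggregated.

```python
from statistics import mean

# Hypothetical label sets; the paper's actual emotion inventories may differ.
FACE_EMOTIONS = ("happy", "neutral", "sad")
SPEECH_EMOTIONS = ("positive", "neutral", "negative")


def mean_scores(items, labels):
    """Average a list of per-item probability dicts into one value per label."""
    if not items:
        return [0.0 for _ in labels]
    return [mean(item.get(label, 0.0) for item in items) for label in labels]


def segment_features(face_frames, speech_windows, sentence_sentiments):
    """Concatenate mean-pooled face, speech, and text-sentiment scores (assumed pooling)."""
    text_mean = mean(sentence_sentiments) if sentence_sentiments else 0.0
    return (
        mean_scores(face_frames, FACE_EMOTIONS)
        + mean_scores(speech_windows, SPEECH_EMOTIONS)
        + [text_mean]
    )


# Toy inputs, invented for this example (not taken from the GTI dataset).
face = [{"happy": 0.75, "neutral": 0.25}, {"happy": 0.25, "neutral": 0.75}]
speech = [{"positive": 0.5, "neutral": 0.5}]
sentiments = [0.5, -0.5, 0.75]

features = segment_features(face, speech, sentiments)
print(features)
```

Because each position in the vector corresponds to a named emotion or sentiment score, a model trained on such vectors stays easy to inspect, which is in the spirit of the interpretable features the paper describes.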

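Key innovation: The work combines multimodal emotion analysis with a large language model. GPT-4 scores transcripts zero-shot, and averaging its estimates with those of the best trained model raises the correlation with human ratings from r = .441 to r = .513. GPT-4 also provides logical, concrete reasoning that could serve as guidance for teachers.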
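Zero-shot scoring of a transcript can be pictured as two small steps: build a prompt that contains no worked examples, then parse a score out of the model's reply. The sketch below shows only those two steps and makes no API call. The prompt wording, the 1 to 4 scale, and the `Score:` reply format are hypothetical choices for illustration and are not taken from the paper.

```python
import re

# Hypothetical rating scale, assumed for illustration only.
MIN_SCORE, MAX_SCORE = 1, 4


def build_prompt(transcript: str) -> str:
    """Build a zero-shot scoring prompt (no rated examples are included)."""
    return (
        "You are rating a classroom lesson segment for encouragement and warmth.\n"
        f"Give an integer score from {MIN_SCORE} to {MAX_SCORE}, then a short justification.\n"
        "Answer in the form 'Score: <n>' followed by 'Reasoning: <text>'.\n\n"
        f"Transcript:\n{transcript}"
    )


def parse_score(reply: str):
    """Return the integer score found in a reply, or None if missing or out of range."""
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if MIN_SCORE <= score <= MAX_SCORE else None


prompt = build_prompt("Teacher: Great idea, Mia. Who wants to build on that?")
example_reply = "Score: 3\nReasoning: The teacher praises a student's contribution."
print(parse_score(example_reply))
```

Asking for the reasoning alongside the score mirrors the paper's observation that GPT-4 can justify its ratings, and rejecting out-of-range values keeps malformed replies from silently entering an evaluation.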

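Key design: Model training is described as using an adaptive loss function and an ensemble learning strategy to keep the models robust and accurate across data modalities. The ensemble confirmed in the abstract is a simple average of the estimates from GPT-4 and the best trained model.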

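📊 Experimental highlights

GPT-4 and the best trained model reached correlations with human ratings of r = .341 and r = .441, respectively. Averaging the two estimates raised the correlation to r = .513, comparable to human inter-rater reliability, which supports the method's usefulness for automated assessment.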
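The averaging ensemble and the Pearson correlation used to compare estimates with human ratings can be sketched as follows. The function names are hypothetical and the ratings are toy values invented for this example, not data or results from the paper; the point is only to show how averaging two imperfect estimators can correlate better with the reference than either one alone.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ensemble(estimates_a, estimates_b):
    """Combine two estimators by averaging their per-segment estimates."""
    return [(a + b) / 2 for a, b in zip(estimates_a, estimates_b)]


# Toy ratings for four segments (invented; not from the GTI dataset).
human = [1, 2, 3, 4]
llm_scores = [2, 1, 3, 4]      # stands in for an LLM's zero-shot scores
model_scores = [1, 3, 2, 4]    # stands in for a trained model's predictions

ensemble = average_ensemble(llm_scores, model_scores)
r_llm = pearson_r(llm_scores, human)
r_model = pearson_r(model_scores, human)
r_ensemble = pearson_r(ensemble, human)
print(r_llm, r_model, r_ensemble)
```

In this toy case the two estimators make different mistakes, so their errors partly cancel when averaged. That is one plausible reading of why the paper's ensemble outperforms both of its components.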

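🎯 Application scenarios

Potential applications include educational assessment, teacher training, and classroom interaction analysis. Automated assessment can give teachers timely feedback that helps them improve their teaching practice and teaching quality. The method could also be extended to other educational settings and assessment scenarios.

📄 Abstract (original)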

Classroom observation protocols standardize the assessment of teaching effectiveness and facilitate comprehension of classroom interactions. Whereas these protocols offer teachers specific feedback on their teaching practices, the manual coding by human raters is resource-intensive and often unreliable. This has sparked interest in developing AI-driven, cost-effective methods for automating such holistic coding. Our work explores a multimodal approach to automatically estimating encouragement and warmth in classrooms, a key component of the Global Teaching Insights (GTI) study's observation protocol. To this end, we employed facial and speech emotion recognition with sentiment analysis to extract interpretable features from video, audio, and transcript data. The prediction task involved both classification and regression methods. Additionally, in light of recent large language models' remarkable text annotation capabilities, we evaluated ChatGPT's zero-shot performance on this scoring task based on transcripts. We demonstrated our approach on the GTI dataset, comprising 367 16-minute video segments from 92 authentic lesson recordings. The inferences of GPT-4 and the best-trained model yielded correlations of r = .341 and r = .441 with human ratings, respectively. Combining estimates from both models through averaging, an ensemble approach achieved a correlation of r = .513, comparable to human inter-rater reliability. Our model explanation analysis indicated that text sentiment features were the primary contributors to the trained model's decisions. Moreover, GPT-4 could deliver logical and concrete reasoning as potential teacher guidelines. Our findings provide insights into using advanced, multimodal techniques for automated classroom observation, aiming to foster teacher training through frequent and valuable feedback.