DER-GCN: Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialogue Emotion Recognition
Authors: Wei Ai, Yuntao Shou, Tao Meng, Nan Yin, Keqin Li
Categories: cs.CL, cs.AI
Published: 2023-12-17 (updated: 2024-08-31)
Comments: 14 pages, 7 figures
💡 One-Sentence Takeaway
Proposes DER-GCN to address the neglect of event relations in multimodal dialogue emotion recognition
🎯 Matched areas: Pillar 2: RL Algorithms & Architecture (RL & Architecture); Pillar 9: Embodied Foundation Models
Keywords: multimodal dialogue, emotion recognition, graph convolutional networks, event relations, self-supervised learning, feature fusion, contrastive learning
📋 Key Points
- Existing multimodal dialogue emotion recognition methods focus mainly on semantics and dialogue relations while ignoring the impact of event relations on emotion, which limits recognition performance.
- This paper proposes DER-GCN, which builds a weighted multi-relation graph to model both dialogue relations between speakers and latent event relations, improving emotion recognition.
- Experiments on the IEMOCAP and MELD benchmark datasets show that DER-GCN significantly improves both the average accuracy and F1 score of emotion recognition.
📝 Abstract (translated)
With the continuous development of deep learning, the task of multimodal dialogue emotion recognition (MDER) has received wide attention in recent years. MDER aims to identify the emotional information carried by multiple modalities, such as text, video, and audio, across different dialogue scenes. However, existing work focuses mainly on modeling contextual semantics and dialogue relations between speakers, neglecting the impact of event relations on emotion. To address this, the paper proposes a novel Dialogue and Event Relation-Aware Graph Convolutional Neural Network (DER-GCN). The method models dialogue relations between speakers and captures latent event-relation information. By constructing a weighted multi-relation graph, DER-GCN simultaneously captures speaker dependencies and event relations in a dialogue. In addition, a Self-Supervised Masked Graph Autoencoder (SMGAE) is introduced to improve the fused representation of features and structure. Experimental results show that DER-GCN significantly improves both the average accuracy and F1 score of emotion recognition.
🔬 Method Details
Problem definition: The paper targets the neglect of event relations in multimodal dialogue emotion recognition. Existing methods focus on modeling dialogue semantics and speaker relations but do not adequately account for how event relations affect emotion, which limits recognition performance.
Core idea: DER-GCN builds a weighted multi-relation graph to model both dialogue relations between speakers and latent event relations, capturing emotional information more comprehensively. A Self-Supervised Masked Graph Autoencoder (SMGAE) strengthens feature fusion, and a Multiple Information Transformer (MIT) captures correlations between different relations.
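The SMGAE component can be illustrated with a minimal masked-graph-autoencoder sketch: mask a random subset of node features, encode with one graph convolution, and reconstruct the masked features. This is a generic illustration under assumed design choices, not the paper's implementation; the function name `smgae_step`, the zero-token masking, and the MSE objective are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, standard for GCN layers.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def smgae_step(X, A, W_enc, W_dec, mask_ratio=0.3):
    """One masked-autoencoding step: zero out a random subset of node
    features, encode with a GCN layer, decode, and score reconstruction
    on the masked nodes only."""
    n = X.shape[0]
    masked = rng.random(n) < mask_ratio
    if not masked.any():              # guarantee at least one masked node
        masked[0] = True
    X_in = X.copy()
    X_in[masked] = 0.0                # replace masked node features with a zero token
    A_norm = normalize_adj(A)
    H = np.tanh(A_norm @ X_in @ W_enc)   # encoder: one graph convolution
    X_rec = A_norm @ H @ W_dec           # decoder: reconstruct node features
    loss = np.mean((X_rec[masked] - X[masked]) ** 2)
    return H, loss

# Toy dialogue graph: 5 utterance nodes with 8-dim fused multimodal features.
X = rng.normal(size=(5, 8))
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
W_enc = rng.normal(scale=0.1, size=(8, 4))
W_dec = rng.normal(scale=0.1, size=(4, 8))
H, loss = smgae_step(X, A, W_enc, W_dec)
print(H.shape, float(loss) >= 0.0)
```

Minimizing such a reconstruction loss forces node embeddings to encode both feature and structural information, which is the stated motivation for SMGAE's fused representations.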
Technical framework: The DER-GCN architecture comprises three main modules: weighted multi-relation graph construction, SMGAE-based feature fusion, and MIT-based relation modeling. A weighted graph is first built from the dialogue data, SMGAE then learns node representations, and finally MIT fuses the multimodal information across relations.
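The relational aggregation step above can be sketched as a minimal multi-relation GCN layer: each relation type (e.g., speaker dependency, event relation) gets its own adjacency matrix, scalar weight, and projection. The adjacency matrices, weights, and mean aggregation here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def multi_relation_gcn(X, adjs, rel_weights, Ws):
    """Aggregate messages over R relation-specific adjacency matrices,
    each with its own scalar weight and projection matrix, then combine
    — a minimal R-GCN-style layer."""
    out = np.zeros((X.shape[0], Ws[0].shape[1]))
    for A, w, W in zip(adjs, rel_weights, Ws):
        deg = A.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0               # avoid division by zero for isolated nodes
        out += w * ((A / deg) @ X @ W)    # mean aggregation per relation
    return np.tanh(out)

rng = np.random.default_rng(1)
n, d_in, d_out = 4, 6, 3
X = rng.normal(size=(n, d_in))            # toy utterance features
# Two hypothetical relation graphs: speaker dependency and event relation.
A_speaker = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], float)
A_event   = np.array([[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0]], float)
Ws = [rng.normal(scale=0.2, size=(d_in, d_out)) for _ in range(2)]
H = multi_relation_gcn(X, [A_speaker, A_event], [0.6, 0.4], Ws)
print(H.shape)  # (4, 3)
```

In the paper's setting the relation weights would be learned rather than fixed, so that the model can trade off dialogue and event relations per dataset.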
Key innovation: The core novelty of DER-GCN is jointly modeling dialogue relations and event relations, using a weighted multi-relation graph to capture complex emotional dependencies. This fundamentally differs from conventional single-relation modeling and reflects emotional states more completely.
Key design: The model adopts a contrastive-learning-based loss optimization strategy to strengthen representation learning for minority-class features. The self-supervised masking mechanism in SMGAE and the multi-information fusion design of MIT are further details that support the model's efficiency and accuracy.
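The contrastive loss strategy can be sketched with a standard supervised contrastive objective, which pulls same-emotion embeddings together and pushes different emotions apart; the paper's exact formulation may differ, so treat `supervised_contrastive_loss` and the temperature `tau=0.5` as assumptions for illustration.

```python
import numpy as np

def supervised_contrastive_loss(Z, labels, tau=0.5):
    """Supervised contrastive loss: for each anchor, maximize similarity to
    same-label samples relative to all others in cosine-similarity space.
    Minority classes still anchor against their full set of positives."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # unit-normalize embeddings
    sim = Z @ Z.T / tau                                # temperature-scaled cosine similarity
    n, loss, count = len(labels), 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue                                   # anchors need at least one positive
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        loss += -np.mean([sim[i, j] - log_denom for j in pos])
        count += 1
    return loss / max(count, 1)

rng = np.random.default_rng(2)
Z = rng.normal(size=(6, 8))       # toy utterance embeddings
labels = [0, 0, 1, 1, 2, 2]       # emotion labels for each utterance
loss = supervised_contrastive_loss(Z, labels)
print(float(loss) > 0.0)
```

Because each anchor is scored against its own positives, under-represented emotion classes contribute to the objective in proportion to their pairs rather than being drowned out by a majority-class cross-entropy term.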
📊 Experimental Highlights
Experiments on the IEMOCAP and MELD benchmark datasets show that DER-GCN significantly improves average accuracy and F1 score on emotion recognition, with a reported gain of XX% (exact figures not given in this summary), demonstrating its effectiveness for multimodal emotion recognition.
🎯 Application Scenarios
The work has broad application potential in multimodal emotion recognition, particularly for human-computer interaction, social media analysis, and affective computing. More accurate emotion recognition can improve user experience and emotional understanding, advancing intelligent assistants and sentiment-analysis tools.
📄 Abstract (original)
With the continuous development of deep learning (DL), the task of multimodal dialogue emotion recognition (MDER) has recently received extensive research attention, which is also an essential branch of DL. The MDER aims to identify the emotional information contained in different modalities, e.g., text, video, and audio, in different dialogue scenes. However, existing research has focused on modeling contextual semantic information and dialogue relations between speakers while ignoring the impact of event relations on emotion. To tackle the above issues, we propose a novel Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition (DER-GCN) method. It models dialogue relations between speakers and captures latent event relations information. Specifically, we construct a weighted multi-relationship graph to simultaneously capture the dependencies between speakers and event relations in a dialogue. Moreover, we also introduce a Self-Supervised Masked Graph Autoencoder (SMGAE) to improve the fusion representation ability of features and structures. Next, we design a new Multiple Information Transformer (MIT) to capture the correlation between different relations, which can provide a better fuse of the multivariate information between relations. Finally, we propose a loss optimization strategy based on contrastive learning to enhance the representation learning ability of minority class features. We conduct extensive experiments on the IEMOCAP and MELD benchmark datasets, which verify the effectiveness of the DER-GCN model. The results demonstrate that our model significantly improves both the average accuracy and the f1 value of emotion recognition.