Sentence-Level Temporal Convolutional Networks for Multimodal Depression Recognition

  • Abstract: Multimodal depression recognition models suffer from three problems: weak correlation between sentences during feature extraction, ad hoc fusion of features across modalities, and a lack of validation of generalization ability on Chinese datasets. To address these problems, this paper analyzes the audio, text, and visual features associated with depression and proposes STCMN (Sentence-level Temporal Convolutional Memory Network), a multimodal depression recognition model based on an improved TCN model, and applies it to the auxiliary clinical diagnosis of depression. The model first uses a fusion module combining a residual block, GRU, and Self-Attention to extract sentence-level features in each modality, which strengthens contextual connections. It then uses a TCN model to extract the global features of each modality, and uses Cross Attention to fuse these global features, with the multimodal fusion features as the primary component. Finally, a LogSoftmax layer produces the model's depression recognition result. On the public DAIC-WOZ dataset, the proposed method reaches an accuracy of 91.3%, a precision of 93.6%, and a recall of 89.7% for depression recognition, outperforming the other methods on these metrics and better meeting the needs of clinical medicine. On the private Chinese dataset MMD2022, STCMN again achieves the best recognition results, indicating that the model generalizes well to Chinese depression recognition tasks.
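The three generic operations named in the abstract (temporal convolution, cross attention, and LogSoftmax) can be illustrated with a minimal pure-Python sketch. This is not the authors' STCMN implementation: the abstract gives no layer sizes, kernel widths, or dilation rates, so every function name and parameter below is hypothetical. In particular, treating the fused multimodal features as the attention queries is only one possible reading of "with the multimodal fusion features as the primary component".

```python
import math


def causal_dilated_conv1d(x, kernel, dilation):
    """Causal dilated 1-D convolution, the basic operation inside a TCN.

    out[t] depends only on x[t], x[t - dilation], x[t - 2 * dilation], ...
    Positions before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Numerically stable log-softmax, as computed by a LogSoftmax output layer."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries and keys/values come from different sources.

    Assumption for illustration: `queries` would be the fused multimodal
    features and `keys`/`values` a single modality's global features.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return fused


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the paper.
    print(causal_dilated_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], kernel=[0.5, 0.5], dilation=2))
    print(cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
    print(log_softmax([2.0, 0.0]))
```

In a real model these operations would run on learned, multi-channel tensors in a deep learning framework; the scalar version here only shows the data flow each term refers to.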

     
