Context-Aware Deep Learning for Multi-Modal Depression Detection

Lam, Genevieve; Huang, Dongyan; Lin, Weisi

arXiv.org Artificial Intelligence 

In this study, we focus on automated approaches to detecting depression from clinical interviews using multi-modal machine learning (ML). Our approach differs from other successful ML methods, such as context-aware analysis through feature engineering and end-to-end deep neural networks, for depression detection using the Distress Analysis Interview Corpus. We propose a novel method that incorporates: (1) a pre-trained Transformer combined with data augmentation based on topic modelling for textual data; and (2) a deep 1D convolutional neural network (CNN) for acoustic feature modelling. The experimental results demonstrate the effectiveness of the proposed method for training multi-modal deep learning models. Our deep 1D CNN and Transformer models achieve state-of-the-art performance for the audio and text modalities, respectively; combining them in a multi-modal framework also outperforms the state of the art in the combined setting. Code is available at https://github.com/genandlam/multi-modal-depression-detection
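
The abstract does not specify the network configuration for item (2); the sketch below is only a minimal illustration of a deep 1D CNN over acoustic feature frames, with all hyperparameters (channel widths, kernel sizes, input dimensions) assumed for demonstration rather than taken from the paper or the linked repository.

# Illustrative sketch of a deep 1D CNN for acoustic feature modelling.
# Layer widths, kernel sizes, and input shape are assumptions, not the
# authors' configuration.
import torch
import torch.nn as nn

class Acoustic1DCNN(nn.Module):
    def __init__(self, n_features=80, n_classes=2):
        super().__init__()
        # Convolutions run along the time axis of (batch, features, frames).
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(128, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time to a fixed-size vector
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):
        # x: (batch, n_features, n_frames) acoustic feature sequence
        h = self.conv(x).squeeze(-1)
        return self.classifier(h)

# Example forward pass on a dummy batch of 4 utterances, 300 frames each.
logits = Acoustic1DCNN()(torch.randn(4, 80, 300))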