Srivastava, Aseem
Figurative-cum-Commonsense Knowledge Infusion for Multimodal Mental Health Meme Classification
Mazhar, Abdullah, Shaik, Zuhair Hasan, Srivastava, Aseem, Ruhnke, Polly, Vaddavalli, Lavanya, Katragadda, Sri Keshav, Yadav, Shweta, Akhtar, Md Shad
The expression of mental health symptoms through non-traditional means, such as memes, has gained remarkable attention over the past few years, with users often highlighting their mental health struggles through figurative intricacies within memes. While humans rely on commonsense knowledge to interpret these complex expressions, current Multimodal Language Models (MLMs) struggle to capture these figurative aspects inherent in memes. To address this gap, we introduce a novel dataset, AxiOM, derived from the GAD anxiety questionnaire, which categorizes memes into six fine-grained anxiety symptoms. Next, we propose a commonsense and domain-enriched framework, M3H, to enhance MLMs' ability to interpret figurative language and commonsense knowledge. The overarching goal remains to first understand and then classify the mental health symptoms expressed in memes. We benchmark M3H against 6 competitive baselines (with 20 variations), demonstrating improvements in both quantitative and qualitative metrics, including a detailed human evaluation. We observe a clear improvement of 4.20% and 4.66% on the weighted-F1 metric. To assess generalizability, we perform extensive experiments on a public dataset, RESTORE, for depressive symptom identification, presenting an extensive ablation study that highlights the contribution of each module in both datasets. Our findings reveal limitations in existing models and the advantage of employing commonsense to enhance figurative understanding.
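For readers unfamiliar with the weighted-F1 metric reported above, the following is a minimal sketch of how it is commonly computed: per-class F1 scores averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions, not taken from the paper or its code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its support.
        score += (n / total) * f1
    return score


# Toy example with made-up symptom labels.
truth = ["restlessness", "restlessness", "worry", "worry"]
preds = ["restlessness", "worry", "worry", "worry"]
print(round(weighted_f1(truth, preds), 4))
```

Because the weights follow class frequency, frequent symptom classes influence this score more than rare ones, which is why it is often reported alongside macro-averaged variants on imbalanced data.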
Trust Modeling in Counseling Conversations: A Benchmark Study
Srivastava, Aseem, Shaik, Zuhair Hasan, Chakraborty, Tanmoy, Akhtar, Md Shad
In mental health counseling, a variety of earlier studies have focused on dialogue modeling. However, most of these studies give limited to no emphasis on the quality of interaction between a patient and a therapist. The therapeutic bond between a patient and a therapist directly correlates with effective mental health counseling. It involves developing the patient's trust in the therapist over the course of counseling. To assess the therapeutic bond in counseling, we introduce trust as a therapist-assistive metric. Our definition of trust involves patients' willingness and openness to express themselves and, consequently, receive better care. We conceptualize it as a dynamic trajectory observable through textual interactions during counseling. To facilitate trust modeling, we present MENTAL-TRUST, a novel counseling dataset comprising manual annotation of 212 counseling sessions with first-of-its-kind seven expert-verified ordinal trust levels. We project our problem statement as an ordinal classification task for trust quantification and propose a new benchmark, TrustBench, comprising a suite of classical and state-of-the-art language models on MENTAL-TRUST. We evaluate the performance across a suite of metrics and lay out an exhaustive set of findings. Our study aims to unfold how trust evolves in therapeutic interactions.
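To illustrate what treating trust as an ordinal classification task can look like, here is a minimal sketch of one common approach: encoding a level as cumulative binary targets ("is the level greater than k?") so that the ordering between levels is preserved. This is an assumption made for illustration only; the abstract does not say which ordinal formulation TrustBench uses, and the function names and the 1-to-7 numbering are hypothetical.

```python
def encode_ordinal(level, num_levels=7):
    """Encode an ordinal level in 1..num_levels as num_levels - 1 binary targets.

    Target k (for k = 1..num_levels - 1) is 1 when level > k, else 0.
    """
    if not 1 <= level <= num_levels:
        raise ValueError("level must be between 1 and num_levels")
    return [1 if level > k else 0 for k in range(1, num_levels)]


def decode_ordinal(probs, threshold=0.5):
    """Recover a level by counting how many 'greater than k' scores pass the threshold."""
    return 1 + sum(1 for p in probs if p > threshold)


print(encode_ordinal(3))                                # [1, 1, 0, 0, 0, 0]
print(decode_ordinal([0.9, 0.8, 0.3, 0.2, 0.1, 0.0]))   # 3
```

Unlike plain multi-class labels, this encoding makes predicting level 4 for a true level 3 a smaller error than predicting level 7, since only one binary target differs.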
Sentiment-guided Commonsense-aware Response Generation for Mental Health Counseling
Srivastava, Aseem, Naik, Gauri, Cerezo, Alison, Chakraborty, Tanmoy, Akhtar, Md. Shad
The crisis of mental health issues is escalating. Effective counseling serves as a critical lifeline for individuals suffering from conditions like PTSD, stress, etc. Therapists forge a crucial therapeutic bond with clients, steering them towards positivity. Unfortunately, the massive shortage of professionals, high costs, and mental health stigma pose significant barriers to consulting therapists. As a substitute, Virtual Mental Health Assistants (VMHAs) have emerged in the digital healthcare space. However, most existing VMHAs lack the commonsense to understand the nuanced sentiments of clients to generate effective responses. To this end, we propose EmpRes, a novel sentiment-guided mechanism incorporating commonsense awareness for generating responses. By leveraging foundation models and harnessing commonsense knowledge, EmpRes aims to generate responses that effectively shape the client's sentiment towards positivity. We evaluate the performance of EmpRes on HOPE, a benchmark counseling dataset, and observe a remarkable performance improvement compared to the existing baselines across a suite of qualitative and quantitative metrics. Moreover, our extensive empirical analysis and human evaluation show that the generation ability of EmpRes is well-suited and, in some cases, surpasses the gold standard. Further, we deploy EmpRes as a chat interface for users seeking mental health support. We address the deployed system's effectiveness through an exhaustive user study with a significant positive response. Our findings show that 91% of users find the system effective, 80% express satisfaction, and over 85.45% convey a willingness to continue using the interface and recommend it to others, demonstrating the practical applicability of EmpRes in addressing the pressing challenges of mental health support, emphasizing user feedback, and ethical considerations in a real-world context.
Knowledge Planning in Large Language Models for Domain-Aligned Counseling Summarization
Srivastava, Aseem, Joshi, Smriti, Chakraborty, Tanmoy, Akhtar, Md Shad
In mental health counseling, condensing dialogues into concise and relevant summaries (aka counseling notes) holds pivotal significance. Large Language Models (LLMs) exhibit remarkable capabilities in various generative tasks; however, their adaptation to domain-specific intricacies remains challenging, especially within mental health contexts. Unlike standard LLMs, mental health experts first plan to apply domain knowledge in writing summaries. Our work enhances LLMs' ability by introducing a novel planning engine to orchestrate structuring knowledge alignment. To achieve high-order planning, we divide knowledge encapsulation into two major phases: (i) holding dialogue structure and (ii) incorporating domain-specific knowledge. We employ a planning engine on Llama-2, resulting in a novel framework, PIECE. Our proposed system employs knowledge filtering-cum-scaffolding to encapsulate domain knowledge. Additionally, PIECE leverages sheaf convolution learning to enhance its understanding of the dialogue's structural nuances. We compare PIECE with 14 baseline methods and observe a significant improvement across ROUGE and BLEURT scores. Further, expert evaluation and analyses validate the generation quality to be effective, sometimes even surpassing the gold standard. We further benchmark PIECE with other LLMs and report improvement, including Llama-2 (+2.72%), Mistral (+2.04%), and Zephyr (+1.59%), to justify the generalizability of the planning engine.
Response-act Guided Reinforced Dialogue Generation for Mental Health Counseling
Srivastava, Aseem, Pandey, Ishan, Akhtar, Md. Shad, Chakraborty, Tanmoy
Virtual Mental Health Assistants (VMHAs) have become a prevalent method for receiving mental health counseling in the digital healthcare space. An assistive counseling conversation commences with natural open-ended topics to familiarize the client with the environment and later converges into more fine-grained domain-specific topics. Unlike other conversational systems, which are categorized as open-domain or task-oriented systems, VMHAs possess a hybrid conversational flow. These counseling bots need to comprehend various aspects of the conversation, such as dialogue-acts, intents, etc., to engage the client in an effective conversation. Although the surge in digital health research highlights applications of many general-purpose response generation systems, they are barely suitable in the mental health domain -- the prime reason is the lack of understanding of mental health counseling. Moreover, in general, dialogue-act guided response generators are either limited to a template-based paradigm or lack appropriate semantics. To this end, we propose READER -- a REsponse-Act guided reinforced Dialogue genERation model for mental health counseling conversations. READER is built on a transformer to jointly predict a potential dialogue-act d(t+1) for the next utterance (aka response-act) and to generate an appropriate response u(t+1). Through transformer reinforcement learning (TRL) with Proximal Policy Optimization (PPO), we guide the response generator to abide by d(t+1) and ensure the semantic richness of the responses via BERTScore in our reward computation. We evaluate READER on HOPE, a benchmark counseling conversation dataset, and observe that it outperforms several baselines across several evaluation metrics -- METEOR, ROUGE, and BERTScore. We also furnish extensive qualitative and quantitative analyses of the results, including error analysis, human evaluation, etc.
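The reward idea described above, rewarding a generated response both for matching the predicted response-act d(t+1) and for semantic closeness to the reference, can be sketched as a simple weighted sum. This is only an illustrative assumption: the abstract does not give the actual reward formula, and the function `response_reward`, the `act_weight` parameter, and the example act labels are hypothetical. The semantic score is passed in as a plain number (for instance, a BERTScore F1 computed elsewhere) so the sketch needs no external libraries.

```python
def response_reward(predicted_act, target_act, semantic_score, act_weight=0.5):
    """Combine act agreement and semantic similarity into one scalar reward.

    predicted_act:  dialogue-act label assigned to the generated response
    target_act:     the response-act d(t+1) the generator should follow
    semantic_score: similarity to the reference response, assumed in [0, 1]
    act_weight:     hypothetical mixing weight between the two terms
    """
    if not 0.0 <= act_weight <= 1.0:
        raise ValueError("act_weight must be in [0, 1]")
    act_match = 1.0 if predicted_act == target_act else 0.0
    return act_weight * act_match + (1.0 - act_weight) * semantic_score


# A response that follows the target act earns the act bonus on top of the semantic term.
print(response_reward("open-question", "open-question", 0.8))
print(response_reward("advice", "open-question", 0.8))
```

In a PPO setup, a scalar like this would be computed per generated response and fed back to the policy update; how the two terms are actually balanced in READER is not stated in the abstract.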