Personal Assistant Systems
An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives
Cho, Young Min, Rai, Sunny, Ungar, Lyle, Sedoc, Joรฃo, Guntuku, Sharath Chandra
Mental health conversational agents (a.k.a. chatbots) are widely studied for their potential to offer accessible support to those experiencing mental health challenges. Previous surveys on the topic primarily consider papers published in either computer science or medicine, leading to a divide in understanding and hindering the sharing of beneficial knowledge between both domains. To bridge this gap, we conduct a comprehensive literature review using the PRISMA framework, reviewing 534 papers published in both computer science and medicine. Our systematic review reveals 136 key papers on building mental health-related conversational agents with diverse characteristics of modeling and experimental design techniques. We find that computer science papers focus on LLM techniques and evaluating response quality using automated metrics with little attention to the application while medical papers use rule-based conversational agents and outcome metrics to measure the health outcomes of participants. Based on our findings on transparency, ethics, and cultural heterogeneity in this review, we provide a few recommendations to help bridge the disciplinary divide and enable the cross-disciplinary development of mental health conversational agents.
A Multilingual Virtual Guide for Self-Attachment Technique
Law, Alicia Jiayun, Hu, Ruoyu, Alazraki, Lisa, Gopalan, Anandha, Polydorou, Neophytos, Edalat, Abbas
In this work, we propose a computational framework that leverages existing out-of-language data to create a conversational agent for the delivery of Self-Attachment Technique (SAT) in Mandarin. Our framework does not require large-scale human translations, yet it achieves a comparable performance whilst also maintaining safety and reliability. We propose two different methods of augmenting available response data through empathetic rewriting. We evaluate our chatbot against a previous, English-only SAT chatbot through non-clinical human trials (N=42), each lasting five days, and quantitatively show that we are able to attain a comparable level of performance to the English SAT chatbot. We provide qualitative analysis on the limitations of our study and suggestions with the aim of guiding future improvements.
Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation
Wang, Xi, Rahmani, Hossein A., Liu, Jiqun, Yilmaz, Emine
Conversational Recommendation System (CRS) is a rapidly growing research area that has gained significant attention alongside advancements in language modelling techniques. However, the current state of conversational recommendation faces numerous challenges due to its relative novelty and limited existing contributions. In this study, we delve into benchmark datasets for developing CRS models and address potential biases arising from the feedback loop inherent in multi-turn interactions, including selection bias and multiple popularity bias variants. Drawing inspiration from the success of generative data via using language models and data augmentation techniques, we present two novel strategies, 'Once-Aug' and 'PopNudge', to enhance model performance while mitigating biases. Through extensive experiments on ReDial and TG-ReDial benchmark datasets, we show a consistent improvement of CRS techniques with our data augmentation approaches and offer additional insights on addressing multiple newly formulated biases.
Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation
Li, Chengpeng, Yang, Zhengyi, Zhang, Jizhi, Wu, Jiancan, Wang, Dingxian, He, Xiangnan, Wang, Xiang
Reinforcement learning (RL) has been widely applied in recommendation systems due to its potential in optimizing the long-term engagement of users. From the perspective of RL, recommendation can be formulated as a Markov decision process (MDP), where recommendation system (agent) can interact with users (environment) and acquire feedback (reward signals).However, it is impractical to conduct online interactions with the concern on user experience and implementation complexity, and we can only train RL recommenders with offline datasets containing limited reward signals and state transitions. Therefore, the data sparsity issue of reward signals and state transitions is very severe, while it has long been overlooked by existing RL recommenders.Worse still, RL methods learn through the trial-and-error mode, but negative feedback cannot be obtained in implicit feedback recommendation tasks, which aggravates the overestimation problem of offline RL recommender. To address these challenges, we propose a novel RL recommender named model-enhanced contrastive reinforcement learning (MCRL). On the one hand, we learn a value function to estimate the long-term engagement of users, together with a conservative value learning mechanism to alleviate the overestimation problem.On the other hand, we construct some positive and negative state-action pairs to model the reward function and state transition function with contrastive learning to exploit the internal structure information of MDP. Experiments demonstrate that the proposed method significantly outperforms existing offline RL and self-supervised RL methods with different representative backbone networks on two real-world datasets.
Sheaf Neural Networks for Graph-based Recommender Systems
Purificato, Antonio, Cassarร , Giulia, Liรฒ, Pietro, Silvestri, Fabrizio
Recent progress in Graph Neural Networks has resulted in wide adoption by many applications, including recommendation systems. The reason for Graph Neural Networks' superiority over other approaches is that many problems in recommendation systems can be naturally modeled as graphs, where nodes can be either users or items and edges represent preference relationships. In current Graph Neural Network approaches, nodes are represented with a static vector learned at training time. This static vector might only be suitable to capture some of the nuances of users or items they define. To overcome this limitation, we propose using a recently proposed model inspired by category theory: Sheaf Neural Networks. Sheaf Neural Networks, and its connected Laplacian, can address the previous problem by associating every node (and edge) with a vector space instead than a single vector. The vector space representation is richer and allows picking the proper representation at inference time. This approach can be generalized for different related tasks on graphs and achieves state-of-the-art performance in terms of F1-Score@N in collaborative filtering and Hits@20 in link prediction. For collaborative filtering, the approach is evaluated on the MovieLens 100K with a 5.1% improvement, on MovieLens 1M with a 5.4% improvement and on Book-Crossing with a 2.8% improvement, while for link prediction on the ogbl-ddi dataset with a 1.6% refinement with respect to the respective baselines.
A Contextualized Real-Time Multimodal Emotion Recognition for Conversational Agents using Graph Convolutional Networks in Reinforcement Learning
Rahman, Fathima Abdul, Lu, Guang
Owing to the recent developments in Generative Artificial Intelligence (GenAI) and Large Language Models (LLM), conversational agents are becoming increasingly popular and accepted. They provide a human touch by interacting in ways familiar to us and by providing support as virtual companions. Therefore, it is important to understand the user's emotions in order to respond considerately. Compared to the standard problem of emotion recognition, conversational agents face an additional constraint in that recognition must be real-time. Studies on model architectures using audio, visual, and textual modalities have mainly focused on emotion classification using full video sequences that do not provide online features. In this work, we present a novel paradigm for contextualized Emotion Recognition using Graph Convolutional Network with Reinforcement Learning (conER-GRL). Conversations are partitioned into smaller groups of utterances for effective extraction of contextual information. The system uses Gated Recurrent Units (GRU) to extract multimodal features from these groups of utterances. More importantly, Graph Convolutional Networks (GCN) and Reinforcement Learning (RL) agents are cascade trained to capture the complex dependencies of emotion features in interactive scenarios. Comparing the results of the conER-GRL model with other state-of-the-art models on the benchmark dataset IEMOCAP demonstrates the advantageous capabilities of the conER-GRL architecture in recognizing emotions in real-time from multimodal conversational signals.
Defense Against Model Extraction Attacks on Recommender Systems
Zhang, Sixiao, Yin, Hongzhi, Chen, Hongxu, Long, Cheng
The robustness of recommender systems has become a prominent topic within the research community. Numerous adversarial attacks have been proposed, but most of them rely on extensive prior knowledge, such as all the white-box attacks or most of the black-box attacks which assume that certain external knowledge is available. Among these attacks, the model extraction attack stands out as a promising and practical method, involving training a surrogate model by repeatedly querying the target model. However, there is a significant gap in the existing literature when it comes to defending against model extraction attacks on recommender systems. In this paper, we introduce Gradient-based Ranking Optimization (GRO), which is the first defense strategy designed to counter such attacks. We formalize the defense as an optimization problem, aiming to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model. Since top-k ranking lists are non-differentiable, we transform them into swap matrices which are instead differentiable. These swap matrices serve as input to a student model that emulates the surrogate model's behavior. By back-propagating the loss of the student model, we obtain gradients for the swap matrices. These gradients are used to compute a swap loss, which maximizes the loss of the student model. We conducted experiments on three benchmark datasets to evaluate the performance of GRO, and the results demonstrate its superior effectiveness in defending against model extraction attacks.
Context-aware explainable recommendations over knowledge graphs
Knowledge graphs contain rich semantic relationships related to items and incorporating such semantic relationships into recommender systems helps to explore the latent connections of items, thus improving the accuracy of prediction and enhancing the explainability of recommendations. However, such explainability is not adapted to users' contexts, which can significantly influence their preferences. In this work, we propose CA-KGCN (Context-Aware Knowledge Graph Convolutional Network), an end-to-end framework that can model users' preferences adapted to their contexts and can incorporate rich semantic relationships in the knowledge graph related to items. This framework captures users' attention to different factors: contexts and features of items. More specifically, the framework can model users' preferences adapted to their contexts and provide explanations adapted to the given context. Experiments on three real-world datasets show the effectiveness of our framework: modeling users' preferences adapted to their contexts and explaining the recommendations generated.
Topology-aware Debiased Self-supervised Graph Learning for Recommendation
Han, Lei, Yan, Hui, Qiao, Zhicheng
In recommendation, graph-based Collaborative Filtering (CF) methods mitigate the data sparsity by introducing Graph Contrastive Learning (GCL). However, the random negative sampling strategy in these GCL-based CF models neglects the semantic structure of users (items), which not only introduces false negatives (negatives that are similar to anchor user (item)) but also ignores the potential positive samples. To tackle the above issues, we propose Topology-aware Debiased Self-supervised Graph Learning (TDSGL) for recommendation, which constructs contrastive pairs according to the semantic similarity between users (items). Specifically, since the original user-item interaction data commendably reflects the purchasing intent of users and certain characteristics of items, we calculate the semantic similarity between users (items) on interaction data. Then, given a user (item), we construct its negative pairs by selecting users (items) which embed different semantic structures to ensure the semantic difference between the given user (item) and its negatives. Moreover, for a user (item), we design a feature extraction module that converts other semantically similar users (items) into an auxiliary positive sample to acquire a more informative representation. Experimental results show that the proposed model outperforms the state-of-the-art models significantly on three public datasets. Our model implementation codes are available at https://github.com/malajikuai/TDSGL.
Memory efficient location recommendation through proximity-aware representation
Luo, Xuan, Huang, Mingqing, Lv, Rui, Zhao, Hui
Sequential location recommendation plays a huge role in modern life, which can enhance user experience, bring more profit to businesses and assist in government administration. Although methods for location recommendation have evolved significantly thanks to the development of recommendation systems, there is still limited utilization of geographic information, along with the ongoing challenge of addressing data sparsity. In response, we introduce a Proximity-aware based region representation for Sequential Recommendation (PASR for short), built upon the Self-Attention Network architecture. We tackle the sparsity issue through a novel loss function employing importance sampling, which emphasizes informative negative samples during optimization. Moreover, PASR enhances the integration of geographic information by employing a self-attention-based geography encoder to the hierarchical grid and proximity grid at each GPS point. To further leverage geographic information, we utilize the proximity-aware negative samplers to enhance the quality of negative samples. We conducted evaluations using three real-world Location-Based Social Networking (LBSN) datasets, demonstrating that PASR surpasses state-of-the-art sequential location recommendation methods