Goto

Collaborating Authors

 Personal Assistant Systems


Predicting Privacy Preferences for Smart Devices as Norms

arXiv.org Artificial Intelligence

Smart devices, such as smart speakers, are becoming ubiquitous, and users expect these devices to act in accordance with their preferences. In particular, since these devices gather and manage personal data, users expect them to adhere to their privacy preferences. However, the current approach of gathering these preferences consists in asking the users directly, which usually triggers automatic responses failing to capture their true preferences. In response, in this paper we present a collaborative filtering approach to predict user preferences as norms. These preference predictions can be readily adopted or can serve to assist users in determining their own preferences. Using a dataset of privacy preferences of smart assistant users, we test the accuracy of our predictions.


A Survey on User Behavior Modeling in Recommender Systems

arXiv.org Artificial Intelligence

User Behavior Modeling (UBM) plays a critical role in user interest learning, which has been extensively used in recommender systems. Crucial interactive patterns between users and items have been exploited, which brings compelling improvements in many recommendation tasks. In this paper, we attempt to provide a thorough survey of this research topic. We start by reviewing the research background of UBM. Then, we provide a systematic taxonomy of existing UBM research works, which can be categorized into four different directions including Conventional UBM, Long-Sequence UBM, Multi-Type UBM, and UBM with Side Information. Within each direction, representative models and their strengths and weaknesses are comprehensively discussed. Besides, we elaborate on the industrial practices of UBM methods with the hope of providing insights into the application value of existing UBM solutions. Finally, we summarize the survey and discuss the future prospects of this field.


Dual Policy Learning for Aggregation Optimization in Graph Neural Network-based Recommender Systems

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) provide powerful representations for recommendation tasks. GNN-based recommendation systems capture the complex high-order connectivity between users and items by aggregating information from distant neighbors and can improve the performance of recommender systems. Recently, Knowledge Graphs (KGs) have also been incorporated into the user-item interaction graph to provide more abundant contextual information; they are exploited to address cold-start problems and enable more explainable aggregation in GNN-based recommender systems (GNN-Rs). However, due to the heterogeneous nature of users and items, developing an effective aggregation strategy that works across multiple GNN-Rs, such as LightGCN and KGAT, remains a challenge. In this paper, we propose a novel reinforcement learning-based message passing framework for recommender systems, which we call DPAO (Dual Policy framework for Aggregation Optimization). This framework adaptively determines high-order connectivity to aggregate users and items using dual policy learning. Dual policy learning leverages two Deep-Q-Network models to exploit the user- and item-aware feedback from a GNN-R and boost the performance of the target GNN-R. Our proposed framework was evaluated with both non-KG-based and KG-based GNN-R models on six real-world datasets, and their results show that our proposed framework significantly enhances the recent base model, improving nDCG and Recall by up to 63.7% and 42.9%, respectively. Our implementation code is available at https://github.com/steve30572/DPAO/.


MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation

arXiv.org Artificial Intelligence

Deep learning recommendation systems serve personalized content under diverse tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation models rely on terabyte-scale embedding tables to learn user preferences over large bodies of contents. The reliance on a fixed embedding representation of embedding tables not only imposes significant memory capacity and bandwidth requirements but also limits the scope of compatible system solutions. This paper challenges the assumption of fixed embedding representations by showing how synergies between embedding representations and hardware platforms can lead to improvements in both algorithmic- and system performance. Based on our characterization of various embedding representations, we propose a hybrid embedding representation that achieves higher quality embeddings at the cost of increased memory and compute requirements. To address the system performance challenges of the hybrid representation, we propose MP-Rec -- a co-design technique that exploits heterogeneity and dynamic selection of embedding representations and underlying hardware platforms. On real system hardware, we demonstrate how matching custom accelerators, i.e., GPUs, TPUs, and IPUs, with compatible embedding representations can lead to 16.65x performance speedup. Additionally, in query-serving scenarios, MP-Rec achieves 2.49x and 3.76x higher correct prediction throughput and 0.19% and 0.22% better model quality on a CPU-GPU system for the Kaggle and Terabyte datasets, respectively.


Improving Recommendation Fairness via Data Augmentation

arXiv.org Artificial Intelligence

Collaborative filtering based recommendation learns users' preferences from all users' historical behavior data, and has been popular to facilitate decision making. R Recently, the fairness issue of recommendation has become more and more essential. A recommender system is considered unfair when it does not perform equally well for different user groups according to users' sensitive attributes~(e.g., gender, race). Plenty of methods have been proposed to alleviate unfairness by optimizing a predefined fairness goal or changing the distribution of unbalanced training data. However, they either suffered from the specific fairness optimization metrics or relied on redesigning the current recommendation architecture. In this paper, we study how to improve recommendation fairness from the data augmentation perspective. The recommendation model amplifies the inherent unfairness of imbalanced training data. We augment imbalanced training data towards balanced data distribution to improve fairness. The proposed framework is generally applicable to any embedding-based recommendation, and does not need to pre-define a fairness metric. Extensive experiments on two real-world datasets clearly demonstrate the superiority of our proposed framework. We publish the source code at https://github.com/newlei/FDA.


Real-World Deployment and Evaluation of Kwame for Science, An AI Teaching Assistant for Science Education in West Africa

arXiv.org Artificial Intelligence

Africa has a high student-to-teacher ratio which limits students' access to teachers for learning support such as educational question answering. In this work, we extended Kwame, our previous AI teaching assistant for coding education, adapted it for science education, and deployed it as a web app. Kwame for Science provides passages from well-curated knowledge sources and related past national exam questions as answers to questions from students based on the Integrated Science subject of the West African Senior Secondary Certificate Examination (WASSCE). Furthermore, students can view past national exam questions along with their answers and filter by year, question type (objectives, theory, and practicals), and topics that were automatically categorized by a topic detection model which we developed (91% unweighted average recall). We deployed Kwame for Science in the real world over 8 months and had 750 users across 32 countries (15 in Africa) and 1.5K questions asked. Our evaluation showed an 87.2% top 3 accuracy (n=109 questions) implying that Kwame for Science has a high chance of giving at least one useful answer among the 3 displayed. We categorized the reasons the model incorrectly answered questions to provide insights for future improvements. We also share challenges and lessons with the development, deployment, and human-computer interaction component of such a tool to enable other researchers to deploy similar tools. With a first-of-its-kind tool within the African context, Kwame for Science has the potential to enable the delivery of scalable, cost-effective, and quality remote education to millions of people across Africa.


Managing multi-facet bias in collaborative filtering recommender systems

arXiv.org Artificial Intelligence

Due to the extensive growth of information available online, recommender systems play a more significant role in serving people's interests. Traditional recommender systems mostly use an accuracy-focused approach to produce recommendations. Today's research suggests that this single-dimension approach can lead the system to be biased against a series of items with certain attributes. Biased recommendations across groups of items can endanger the interests of item providers along with causing user dissatisfaction with the system. This study aims to manage a new type of intersectional bias regarding the geographical origin and popularity of items in the output of state-of-the-art collaborative filtering recommender algorithms. We introduce an algorithm called MFAIR, a multi-facet post-processing bias mitigation algorithm to alleviate these biases. Extensive experiments on two real-world datasets of movies and books, enriched with the items' continents of production, show that the proposed algorithm strikes a reasonable balance between accuracy and both types of the mentioned biases. According to the results, our proposed approach outperforms a well-known competitor with no or only a slight loss of efficiency.


Entire Space Counterfactual Learning: Tuning, Analytical Properties and Industrial Applications

arXiv.org Artificial Intelligence

As a basic research problem for building effective recommender systems, post-click conversion rate (CVR) estimation has long been plagued by sample selection bias and data sparsity issues. To address the data sparsity issue, prevalent methods based on entire space multi-task model leverage the sequential pattern of user actions, i.e. exposure $\rightarrow$ click $\rightarrow$ conversion to construct auxiliary learning tasks. However, they still fall short of guaranteeing the unbiasedness of CVR estimates. This paper theoretically demonstrates two defects of these entire space multi-task models: (1) inherent estimation bias (IEB) for CVR estimation, where the CVR estimate is inherently higher than the ground truth; (2) potential independence priority (PIP) for CTCVR estimation, where the causality from click to conversion might be overlooked. This paper further proposes a principled method named entire space counterfactual multi-task model (ESCM$^2$), which employs a counterfactual risk minimizer to handle both IEB and PIP issues at once. To demonstrate the effectiveness of the proposed method, this paper explores its parameter tuning in practice, derives its analytic properties, and showcases its effectiveness in industrial CVR estimation, where ESCM$^2$ can effectively alleviate the intrinsic IEB and PIP issues and outperform baseline models.


Collaboration-Aware Graph Convolutional Network for Recommender Systems

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have been successfully adopted in recommender systems by virtue of the message-passing that implicitly captures collaborative effect. Nevertheless, most of the existing message-passing mechanisms for recommendation are directly inherited from GNNs without scrutinizing whether the captured collaborative effect would benefit the prediction of user preferences. In this paper, we first analyze how message-passing captures the collaborative effect and propose a recommendation-oriented topological metric, Common Interacted Ratio (CIR), which measures the level of interaction between a specific neighbor of a node with the rest of its neighbors. After demonstrating the benefits of leveraging collaborations from neighbors with higher CIR, we propose a recommendation-tailored GNN, Collaboration-Aware Graph Convolutional Network (CAGCN), that goes beyond 1-Weisfeiler-Lehman(1-WL) test in distinguishing non-bipartite-subgraph-isomorphic graphs. Experiments on six benchmark datasets show that the best CAGCN variant outperforms the most representative GNN-based recommendation model, LightGCN, by nearly 10% in Recall@20 and also achieves around 80% speedup. Our code is publicly available at https://github.com/YuWVandy/CAGCN.


Speech Privacy Leakage from Shared Gradients in Distributed Learning

arXiv.org Artificial Intelligence

Distributed machine learning paradigms, such as federated learning, have been recently adopted in many privacy-critical applications for speech analysis. However, such frameworks are vulnerable to privacy leakage attacks from shared gradients. Despite extensive efforts in the image domain, the exploration of speech privacy leakage from gradients is quite limited. In this paper, we explore methods for recovering private speech/speaker information from the shared gradients in distributed learning settings. We conduct experiments on a keyword spotting model with two different types of speech features to quantify the amount of leaked information by measuring the similarity between the original and recovered speech signals. We further demonstrate the feasibility of inferring various levels of side-channel information, including speech content and speaker identity, under the distributed learning framework without accessing the user's data.