 Personal Assistant Systems


CLSE: Corpus of Linguistically Significant Entities

arXiv.org Artificial Intelligence

One of the biggest challenges of natural language generation (NLG) is the proper handling of named entities. Named entities are a common source of grammar mistakes such as wrong prepositions, wrong article handling, or incorrect entity inflection. Without factoring linguistic representation, such errors are often underrepresented when evaluating on a small set of arbitrarily picked argument values, or when translating a dataset from a linguistically simpler language, like English, to a linguistically complex language, like Russian. However, for some applications, broadly precise grammatical correctness is critical -- native speakers may find entity-related grammar errors silly, jarring, or even offensive. To enable the creation of more linguistically diverse NLG datasets, we release a Corpus of Linguistically Significant Entities (CLSE) annotated by linguist experts. The corpus includes 34 languages and covers 74 different semantic types to support various applications from airline ticketing to video games. To demonstrate one possible use of CLSE, we produce an augmented version of the Schema-Guided Dialog Dataset, SGD-CLSE. Using the CLSE's entities and a small number of human translations, we create a linguistically representative NLG evaluation benchmark in three languages: French (high-resource), Marathi (low-resource), and Russian (highly inflected language). We establish quality baselines for neural, template-based, and hybrid NLG systems and discuss the strengths and weaknesses of each approach.


On the Consistency of Average Embeddings for Item Recommendation

arXiv.org Machine Learning

A prevalent practice in recommender systems consists in averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.
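A minimal sketch of the idea, not the paper's exact expected precision score: average a set of item embeddings, rank the whole catalog by similarity to that average, and check how many of the top-k items belong to the set that was averaged. The dot-product similarity, the toy two-dimensional catalog, and the function names are all assumptions made for illustration.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def average_embedding(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def precision_at_k(catalog, item_ids, k):
    """Fraction of the k catalog items most similar to the average
    embedding of `item_ids` that are themselves in `item_ids`."""
    avg = average_embedding([catalog[i] for i in item_ids])
    ranked = sorted(catalog, key=lambda i: dot(catalog[i], avg), reverse=True)
    return sum(1 for i in ranked[:k] if i in item_ids) / k


# Hypothetical 2-D item embeddings (assumed values, not real data).
catalog = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.8],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

consistent = precision_at_k(catalog, {"a", "b"}, k=2)
inconsistent = precision_at_k(catalog, {"a", "c"}, k=2)
```

In this toy catalog the average of "a" and "b" retrieves exactly those two items, while the average of "a" and "c" ranks the outsider "b" first, so only half of its top-2 comes from the averaged set.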


Multimodal Recommender Systems in the Prediction of Disease Comorbidity

arXiv.org Artificial Intelligence

While deep learning-based recommender systems utilizing collaborative filtering have been commonly used for recommendation in other domains, their application in the medical domain has been limited. In addition to modeling user-item interactions, we show that deep learning-based recommender systems can be used to model subject-disease code interactions. Two novel applications of deep learning-based recommender systems, using Neural Collaborative Filtering (NCF) and Deep Hybrid Filtering (DHF), were utilized for disease diagnosis based on known past patient comorbidities. Two datasets, one incorporating all subject-disease code pairs present in the MIMIC-III database, and the other incorporating the top 50 most commonly occurring diseases, were used for prediction. Accuracy and Hit Ratio@10 were utilized as metrics to estimate model performance. The performance of the NCF model making use of the reduced "top 50" ICD-9 code dataset was found to be lower (accuracy of ~80% and hit ratio@10 of 35%) as compared to the performance of the NCF model trained on all ICD-9 codes (accuracy of ~90% and hit ratio@10 of ~80%). The superior performance on the sparser dataset with all ICD codes can be mainly attributed to the higher volume of data and the robustness of deep learning-based recommender systems in modeling sparse data. Additionally, results from the DHF models reflect better performance than the NCF models, with an accuracy of 94.4% and hit ratio@10 of 85.36%, reflecting the importance of incorporating clinical note information. Moreover, compared to literature reports utilizing primarily natural language processing-based predictions for the task of ICD-9 code co-occurrence, the novel deep learning-based recommender systems approach performed better. Overall, deep learning-based recommender systems have shown promise in predicting disease comorbidity.
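For readers unfamiliar with the Hit Ratio@10 metric mentioned above, here is a minimal sketch of how such a score is commonly computed. It assumes a leave-one-out setup in which each subject has exactly one held-out disease code; the abstract does not state the evaluation protocol, so that setup, the function name, and the example codes are assumptions.

```python
def hit_ratio_at_k(ranked_codes, held_out, k=10):
    """Share of subjects whose held-out code appears in their top-k ranking.

    ranked_codes: subject -> list of codes, best-scored first (model output).
    held_out:     subject -> the single code hidden from the model (assumed).
    """
    hits = sum(
        1
        for subject, target in held_out.items()
        if target in ranked_codes[subject][:k]
    )
    return hits / len(held_out)


# Illustrative strings standing in for ICD-9 codes (assumed, not real results).
ranked_codes = {
    "subject_1": ["401.9", "250.00", "428.0"],
    "subject_2": ["428.0", "427.31", "401.9"],
}
held_out = {"subject_1": "250.00", "subject_2": "401.9"}

score_at_2 = hit_ratio_at_k(ranked_codes, held_out, k=2)
score_at_3 = hit_ratio_at_k(ranked_codes, held_out, k=3)
```

With k=2 only the first subject's held-out code is recovered; widening the cutoff to k=3 recovers both.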


Ensuring User-side Fairness in Dynamic Recommender Systems

arXiv.org Artificial Intelligence

User-side group fairness is crucial for modern recommender systems, as it aims to alleviate performance disparity between groups of users defined by sensitive attributes such as gender, race, or age. We find that the disparity tends to persist or even increase over time. This calls for effective ways to address user-side fairness in a dynamic environment, which has been infrequently explored in the literature. However, fairness-constrained re-ranking, a typical method to ensure user-side fairness (i.e., reducing performance disparity), faces two fundamental challenges in the dynamic setting: (1) non-differentiability of the ranking-based fairness constraint, which hinders the end-to-end training paradigm, and (2) time-inefficiency, which impedes quick adaptation to changes in user preferences. In this paper, we propose FAir Dynamic rEcommender (FADE), an end-to-end framework with a fine-tuning strategy to dynamically alleviate performance disparity. To tackle the above challenges, FADE uses a novel fairness loss, designed to be differentiable and lightweight, to fine-tune model parameters so as to ensure both user-side fairness and high-quality recommendations. Via extensive experiments on a real-world dataset, we empirically demonstrate that FADE effectively and efficiently reduces performance disparity, and furthermore, FADE improves overall recommendation quality over time compared to not using any new data.
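The quantity at stake, performance disparity between user groups, can be illustrated with a short sketch. This is not FADE's differentiable fairness loss, whose form the abstract does not give; it only measures the gap in mean per-user recommendation quality between groups. The metric values, group labels, and function name are assumptions.

```python
from statistics import mean


def performance_disparity(user_scores, user_group):
    """Return (gap, per-group means), where gap is the difference between
    the best-served and worst-served group's mean per-user score."""
    scores_by_group = {}
    for user, score in user_scores.items():
        scores_by_group.setdefault(user_group[user], []).append(score)
    group_means = {g: mean(s) for g, s in scores_by_group.items()}
    gap = max(group_means.values()) - min(group_means.values())
    return gap, group_means


# Hypothetical per-user quality scores (e.g. a ranking metric) and groups.
user_scores = {"u1": 1.0, "u2": 0.5, "u3": 0.5, "u4": 0.0}
user_group = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}

gap, group_means = performance_disparity(user_scores, user_group)
```

A fairness-aware method would aim to shrink `gap` over time without lowering the group means themselves.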


RecXplainer: Amortized Attribute-based Personalized Explanations for Recommender Systems

arXiv.org Artificial Intelligence

Recommender systems influence many of our interactions in the digital world -- impacting how we shop for clothes, sorting what we see when browsing YouTube or TikTok, and determining which restaurants and hotels we are shown when using hospitality platforms. Modern recommender systems are large, opaque models trained on a mixture of proprietary and open-source datasets. Naturally, issues of trust arise on both the developer and user side: is the system working correctly, and why did a user receive (or not receive) a particular recommendation? Providing an explanation alongside a recommendation alleviates some of these concerns. The status quo for auxiliary recommender system feedback is either user-specific explanations (e.g., "users who bought item B also bought item A") or item-specific explanations (e.g., "we are recommending item A because you watched/bought item B"). However, users bring personalized context into their search experience, valuing an item as a function of that item's attributes and their own personal preferences. In this work, we propose RecXplainer, a novel method for generating fine-grained explanations based on a user's preferences over the attributes of recommended items. We evaluate RecXplainer on five real-world and large-scale recommendation datasets using five different kinds of recommender systems to demonstrate the efficacy of RecXplainer in capturing users' preferences over item attributes and using them to explain recommendations. We also compare RecXplainer to five baselines and show RecXplainer's exceptional performance on ten metrics.


Amazon's Echo Show sale takes up to 42 percent off smart displays

Engadget

If you've ever considered picking up an Amazon Echo Show but weren't sure about the price or which one, now's your chance. A range of Echo Shows are currently on sale, including the new third-generation Echo Show 5, down from $90 to $65, a 28 percent discount. The deal is available in Charcoal, Glacier White or Cloud Blue. However, for the same price, you can get the Echo Show 5 and a Sengled Matter Smart Bulb that you can control with your voice or the Alexa app. It's typically $110 for the bundle, so this option gives you 40 percent off.


Group Equality in Adaptive Submodular Maximization

arXiv.org Artificial Intelligence

In this paper, we study the classic submodular maximization problem subject to a group equality constraint under both non-adaptive and adaptive settings. It has been shown that the utility function of many machine learning applications, including data summarization, influence maximization in social networks, and personalized recommendation, satisfies the property of submodularity. Hence, maximizing a submodular function subject to various constraints lies at the heart of many of those applications. At a high level, submodular maximization aims to select a group of the most representative items (e.g., data points). However, the design of most existing algorithms does not incorporate a fairness constraint, leading to under- or over-representation of some particular groups. This motivates us to study the submodular maximization problem with group equality, where we aim to select a group of items to maximize a (possibly non-monotone) submodular utility function subject to a group equality constraint. To this end, we develop the first constant-factor approximation algorithm for this problem. The design of our algorithm is robust enough to be extended to solving the submodular maximization problem under a more complicated adaptive setting. Moreover, we further extend our study to incorporate a global cardinality constraint and other fairness notions.
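To make the problem setting concrete, the sketch below runs a plain greedy heuristic on a set-coverage utility (a standard submodular function) while forcing the same number of picks from every group. It is not the paper's constant-factor algorithm and carries no approximation guarantee; reading "group equality" as "equal counts per group", the toy data, and all names are assumptions.

```python
def coverage(selected, covers):
    """Number of distinct elements covered by the selected items."""
    covered = set()
    for item in selected:
        covered |= covers[item]
    return len(covered)


def balanced_greedy(covers, group_of, per_group):
    """Pick `per_group` items from every group, each time taking the
    unselected item of that group with the largest marginal coverage gain."""
    selected = []
    groups = sorted(set(group_of.values()))
    for _ in range(per_group):
        for group in groups:
            candidates = [
                i for i in covers if group_of[i] == group and i not in selected
            ]
            if not candidates:
                raise ValueError(f"group {group!r} has too few items")
            base = coverage(selected, covers)
            best = max(
                candidates,
                key=lambda i: coverage(selected + [i], covers) - base,
            )
            selected.append(best)
    return selected


# Hypothetical items, the elements each one covers, and their groups.
covers = {"x1": {1, 2, 3}, "x2": {3, 4}, "y1": {1, 2}, "y2": {5, 6}}
group_of = {"x1": "X", "x2": "X", "y1": "Y", "y2": "Y"}

chosen = balanced_greedy(covers, group_of, per_group=1)
```

Here the heuristic takes "x1" from group X, then prefers "y2" over "y1" in group Y because "y1" adds nothing new once "x1" is selected, which is the diminishing-returns behavior that defines submodularity.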


Continuous-Time User Preference Modelling for Temporal Sets Prediction

arXiv.org Artificial Intelligence

Given a sequence of sets, where each set has a timestamp and contains an arbitrary number of elements, temporal sets prediction aims to predict the elements in the subsequent set. Previous studies for temporal sets prediction mainly focus on the modelling of elements and implicitly represent each user's preference based on their interacted elements. However, user preferences are often continuously evolving and the evolutionary trend cannot be fully captured with the indirect learning paradigm of user preferences. To this end, we propose a continuous-time user preference modelling framework for temporal sets prediction, which explicitly models the evolving preference of each user by maintaining a memory bank to store the states of all the users and elements. Specifically, we first construct a universal sequence by arranging all the user-set interactions in a non-descending temporal order, and then chronologically learn from each user-set interaction. For each interaction, we continuously update the memories of the related user and elements based on their currently encoded messages and past memories. Moreover, we present a personalized user behavior learning module to discover user-specific characteristics based on each user's historical sequence, which aggregates the previously interacted elements from dual perspectives according to the user and elements. Finally, we develop a set-batch algorithm to improve the model efficiency, which can create time-consistent batches in advance and achieve 3.5x and 3.0x speedups in the training and evaluation process on average. Experiments on four real-world datasets demonstrate the superiority of our approach over state-of-the-art methods under both transductive and inductive settings. The good interpretability of our method is also shown.
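The universal-sequence step described above can be sketched as follows: merge every user's timestamped sets into one list in non-descending time order, then replay it chronologically while updating a memory bank. The real framework stores learned state vectors; the toy memory here (last-seen timestamp and interaction count) is only a stand-in, and the class and function names are assumptions.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    user: str
    timestamp: float
    elements: frozenset


def universal_sequence(per_user_sequences):
    """Merge all users' interactions into one non-descending temporal order.
    Python's sort is stable, so equal timestamps keep their input order."""
    merged = [x for seq in per_user_sequences.values() for x in seq]
    return sorted(merged, key=lambda x: x.timestamp)


def replay(sequence):
    """Process interactions chronologically, updating a toy memory bank that
    maps each user or element to (last_seen_timestamp, interaction_count)."""
    memory = {}
    for inter in sequence:
        keys = [("user", inter.user)]
        keys += [("element", e) for e in sorted(inter.elements)]
        for key in keys:
            _, count = memory.get(key, (None, 0))
            memory[key] = (inter.timestamp, count + 1)
    return memory


# Hypothetical shopping-basket data for two users.
per_user = {
    "u1": [
        Interaction("u1", 1.0, frozenset({"milk"})),
        Interaction("u1", 3.0, frozenset({"bread", "milk"})),
    ],
    "u2": [Interaction("u2", 2.0, frozenset({"milk"}))],
}

sequence = universal_sequence(per_user)
memory = replay(sequence)
```

Because the replay follows global time rather than per-user order, the state of "milk" seen by u1 at time 3.0 already reflects u2's purchase at time 2.0.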


I'm a privacy expert, here's how to stop your phone from listening and spying on you right now

Daily Mail - Science & tech

From where you go to what you say to Siri and Google Assistant, most smartphone apps collect your data continuously. Companies then sell this data to advertising companies, which is why it can sometimes feel like you are recommended ads about products you mentioned in passing once. Data privacy advocate Gaël Duval said that, thankfully, it's possible to change settings so this doesn't happen. Murena believes this has measurable benefits: he says that poor data privacy and personalised adverts directly contribute to increased time spent online, impulse buying and even worsening mental health problems – as tech companies understand more about you, they will target adverts at you more precisely. Research by TASO in 2022 found that 79 percent of people were worried about online technology companies using their data, and 65 percent felt uncomfortable sharing their data to use services for free.


Text Matching Improves Sequential Recommendation by Reducing Popularity Biases

arXiv.org Artificial Intelligence

This paper proposes the Text mAtching based SequenTial rEcommendation model (TASTE), which maps items and users in an embedding space and recommends items by matching their text representations. TASTE verbalizes items and user-item interactions using identifiers and attributes of items. To better characterize user behaviors, TASTE additionally proposes an attention sparsity method, which enables TASTE to model longer user-item interactions by reducing the self-attention computations during encoding. Our experiments show that TASTE outperforms the state-of-the-art methods on widely used sequential recommendation datasets. TASTE alleviates the cold start problem by representing long-tail items using full-text modeling and bringing the benefits of pretrained language models to recommendation systems. Our further analyses illustrate that TASTE significantly improves the recommendation accuracy by reducing the popularity bias of previous item-ID-based recommendation models and returning more appropriate and text-relevant items to satisfy users. All code is available at https://github.com/OpenMatch/TASTE.