Personal Assistant Systems
CAM2: Conformity-Aware Multi-Task Ranking Model for Large-Scale Recommender Systems
Raul, Ameya, Dharwadker, Amey Porobo, Schumitsch, Brad
Learning large-scale industrial recommender system models by fitting them to historical user interaction data makes them vulnerable to conformity bias. This may be due to a number of factors, including the fact that user interests may be difficult to determine and that many items are often interacted with based on ecosystem factors other than their relevance to the individual user. In this work, we introduce CAM2, a conformity-aware multi-task ranking model to serve relevant items to users on one of the largest industrial recommendation platforms. CAM2 addresses these challenges systematically by leveraging causal modeling to disentangle users' conformity to popular items from their true interests. This framework is generalizable and can be scaled to support multiple representations of conformity and user relevance in any large-scale recommender system. We provide deeper practical insights and demonstrate the effectiveness of the proposed model through improvements in offline evaluation metrics compared to our production multi-task ranking model. We also show through online experiments that the CAM2 model results in a significant 0.50% increase in aggregated user engagement, coupled with a 0.21% increase in daily active users on Facebook Watch, a popular video discovery and sharing platform serving billions of users.
Balancing Unobserved Confounding with a Few Unbiased Ratings in Debiased Recommendations
Li, Haoxuan, Xiao, Yanghao, Zheng, Chunyuan, Wu, Peng
Recommender systems are seen as an effective tool to address information overload, but it is widely known that, because of various biases, training directly on large-scale observational data yields sub-optimal prediction performance. In contrast, unbiased ratings obtained from randomized controlled trials or A/B tests are considered the gold standard, but in practice they are costly and small in scale. To exploit both types of data, recent works have proposed using unbiased ratings to correct the parameters of the propensity or imputation models trained on the biased dataset. However, the existing methods fail to obtain accurate predictions in the presence of unobserved confounding or model misspecification. In this paper, we propose a theoretically guaranteed model-agnostic balancing approach that can be applied to any existing debiasing method with the aim of combating unobserved confounding and model misspecification. The proposed approach makes full use of unbiased data by alternately correcting model parameters learned with biased data and adaptively learning balance coefficients of biased samples for further debiasing. Extensive real-world experiments, together with the deployment of our proposal on four representative debiasing methods, demonstrate its effectiveness.
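For readers unfamiliar with the propensity models mentioned in the abstract above, the following is a minimal sketch of the standard inverse-propensity-scoring (IPS) error estimate that propensity-based debiasing methods typically build on. It is not the balancing approach proposed in the paper; the function name `ips_error` and its arguments are hypothetical and chosen for illustration only.

```python
def ips_error(errors, observed, propensities):
    """Inverse-propensity-scored estimate of the mean prediction error.

    errors:       per user-item prediction error (only used where observed == 1)
    observed:     1 if the rating was observed in the biased data, else 0
    propensities: assumed probability that each rating is observed (must be > 0)
    """
    if not (len(errors) == len(observed) == len(propensities)):
        raise ValueError("all inputs must have the same length")
    if any(p <= 0 for p in propensities):
        raise ValueError("propensities must be positive")
    # Each observed error is up-weighted by 1 / propensity, so that rarely
    # observed user-item pairs count more; unobserved pairs contribute zero.
    weighted = sum(o * e / p for e, o, p in zip(errors, observed, propensities))
    return weighted / len(errors)
```

The estimate is only unbiased when the propensities are correct, which is the kind of assumption that unobserved confounding and model misspecification can break.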
Trust and Transparency in Recommender Systems
Siepmann, Clara, Chatti, Mohamed Amine
Trust has long been recognized as an important factor in Recommender Systems (RS). However, there are different perspectives on trust and different ways to evaluate it. Moreover, a link between trust and transparency is often assumed but not always investigated further. In this paper, we first go through different understandings and measurements of trust in the AI and RS community, such as demonstrated and perceived trust. We then review the relationships between trust and transparency, as well as mental models, and investigate different strategies to achieve transparency in RS, such as explanation, exploration, and exploranation (i.e., a combination of exploration and explanation). We identify a need for further studies to explore these concepts as well as the relationships between them.
Enhancing Personalized Ranking With Differentiable Group AUC Optimization
Sun, Xiao, Zhang, Bo, Zhang, Chenrui, Ren, Han, Cai, Mingchen
AUC is a common metric for evaluating the performance of a classifier. However, most classifiers are trained with cross entropy, which does not optimize the AUC metric directly and leaves a gap between the training and evaluation stages. In this paper, we propose the PDAOM loss, a Personalized and Differentiable AUC Optimization method with Maximum violation, which can be directly applied when training a binary classifier and optimized with gradient-based methods. Specifically, we construct the pairwise exponential loss with difficult pairs of positive and negative samples within sub-batches grouped by user ID, aiming to guide the classifier to pay attention to the relation between hard-to-distinguish pairs of opposite samples from the perspective of independent users. Compared to the original form of the pairwise exponential loss, the proposed PDAOM loss not only improves the AUC and GAUC metrics in offline evaluation, but also reduces the computational complexity of the training objective. Furthermore, online evaluation of the PDAOM loss on the 'Guess What You Like' feed recommendation application in Meituan shows a 1.40% increase in click count and a 0.65% increase in order count compared to the baseline model, which is a significant improvement in this well-developed online life service recommendation system.
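The idea described above (a pairwise exponential loss evaluated only on the hardest positive/negative pair within each user's sub-batch) can be sketched roughly as follows. This is an illustrative forward computation under assumptions, not the authors' PDAOM implementation: the abstract does not give the exact formula, so the choice of "lowest-scored positive versus highest-scored negative" as the maximum-violation pair, and the names `group_by_user` and `max_violation_exp_loss`, are hypothetical.

```python
import math
from collections import defaultdict


def group_by_user(user_ids, scores, labels):
    """Split predicted scores into per-user positive and negative lists."""
    pos, neg = defaultdict(list), defaultdict(list)
    for user, score, label in zip(user_ids, scores, labels):
        (pos if label == 1 else neg)[user].append(score)
    return pos, neg


def max_violation_exp_loss(user_ids, scores, labels):
    """Mean over users of exp(-(hardest positive - hardest negative)).

    Assumption: the hardest pair for a user is the lowest-scored positive
    and the highest-scored negative. Users lacking either class are skipped.
    """
    pos, neg = group_by_user(user_ids, scores, labels)
    losses = []
    for user, pos_scores in pos.items():
        if user not in neg:
            continue
        margin = min(pos_scores) - max(neg[user])
        losses.append(math.exp(-margin))
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch each user contributes one pair instead of every positive-negative combination, which is one way the training objective can become cheaper than the full pairwise form. An actual training setup would express the same computation in an automatic-differentiation framework so that gradients flow through the selected pair.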
RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender System
Wang, Kai, Zou, Zhene, Zhao, Minghao, Deng, Qilin, Shang, Yue, Liang, Yile, Wu, Runze, Shen, Xudong, Lyu, Tangjie, Fan, Changjie
Reinforcement learning based recommender systems (RL-based RS) aim at learning a good policy from a batch of collected data by casting recommendation as a multi-step decision-making task. However, current RL-based RS research commonly has a large reality gap. In this paper, we introduce the first open-source real-world dataset, RL4RS, hoping to replace the artificial datasets and semi-simulated RS datasets that previous studies used due to the resource limitations of the RL-based RS domain. Unlike academic RL research, RL-based RS is difficult to validate well before deployment. We propose a new systematic evaluation framework, including evaluation of environment simulation, evaluation on environments, counterfactual policy evaluation, and evaluation on environments built from the test set. In summary, RL4RS (Reinforcement Learning for Recommender Systems), a new resource with special attention to the reality gap, contains two real-world datasets, data understanding tools, tuned simulation environments, related advanced RL baselines, batch RL baselines, and counterfactual policy evaluation algorithms. The RL4RS suite can be found at https://github.com/fuxiAIlab/RL4RS. Beyond RL-based recommender systems, we expect the resource to contribute to research in applied reinforcement learning.
Decentralized Gradient-Quantization Based Matrix Factorization for Fast Privacy-Preserving Point-of-Interest Recommendation
Zhou, Xuebin (South China University of Technology) | Hu, Zhibin (South China Normal University) | Huang, Jin (South China Normal University) | Chen, Jian (South China University of Technology)
With the rapid growth of location-based social networks, point-of-interest (POI) recommendation has been attracting tremendous attention. Previous works on POI recommendation usually use matrix factorization (MF)-based methods, which achieve promising performance. However, existing MF-based methods suffer from two critical limitations: (1) Privacy issues: all users’ sensitive data are collected on a centralized server and may leak either on the server side or during transmission. (2) Poor resource utilization and training efficiency: training on a centralized server with potentially huge low-rank matrices is computationally inefficient. In this paper, we propose a novel decentralized gradient-quantization based matrix factorization (DGMF) framework to address the above limitations in POI recommendation. Compared with centralized MF methods, which store all sensitive data and low-rank matrices during model training, DGMF treats each user’s device (e.g., phone) as an independent learner and keeps the sensitive data on each user’s end. Furthermore, a privacy-preserving and communication-efficient mechanism based on gradient quantization is presented to train the proposed model, which aims to handle the privacy problem and reduce the communication cost in the decentralized setting. Theoretical guarantees and experimental studies on real-world datasets demonstrate the effectiveness of the proposed algorithm.
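To make the communication saving from gradient quantization concrete, here is a minimal sketch of a generic uniform quantizer that maps a gradient vector to small integer codes before it is sent, and restores an approximation on the receiving side. The abstract does not specify which quantizer DGMF uses, so this is an assumed, textbook-style scheme; the names `quantize` and `dequantize` and the `num_bits` parameter are hypothetical.

```python
def quantize(grad, num_bits=8):
    """Map each gradient entry to an integer code in [0, 2**num_bits - 1].

    Returns (codes, lo, scale) so the receiver can reconstruct an
    approximation as lo + code * scale.
    """
    lo, hi = min(grad), max(grad)
    levels = (1 << num_bits) - 1
    if hi == lo:
        # Constant gradient: every entry maps to code 0 and scale is unused.
        return [0] * len(grad), lo, 0.0
    scale = (hi - lo) / levels
    codes = [round((g - lo) / scale) for g in grad]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct approximate gradient values from integer codes."""
    return [lo + c * scale for c in codes]
```

With `num_bits=8`, each entry is transmitted as one byte plus two shared floats per vector, and the reconstruction error per entry is at most about half of `scale`. Whether such a scheme also provides a privacy benefit depends on the full protocol, which this sketch does not model.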
The Internet Thinks We Don't Know Its Secret. But I Do.
She had lived in a nursing home for 10 years, and communicated with her sister, and the world, through Alexa. Two days after Lou Ann died of complications from coronavirus, her sister found recordings of Lou Ann's voice asking Alexa, "How do I get help?" Maybe you are reading this in your bed on your phone wherever you are this morning. I was having what I thought of as a weak stretch in my life, when I didn't have a regular job, and when just deciding what I would do to avoid writing, or having a single thought about my email, was enough to short-circuit me and I would find myself still in pajamas at 5 p.m., pacing and crying, Googling What's wrong with me and waiting until it was OK to go to bed again. In such weak stretches, among the many indulgences I permit myself is the minor suboptimal habit of actually sleeping with my phone. Under the other pillow next to me, where no one sleeps. In other, more robust stretches, my phone spends the night plugged in about a foot away on the nightstand, and I can still reach it if I wake up and want to look at it, but it's tethered. When I let it sleep freely with me, I can turn over while I look at it. I can look at it while I'm lying on my left side, and then I can turn over and look at it while I'm lying on my right side. I just charge it the next day, because it doesn't matter if either of us is ready to go in the morning. On this particular morning I opened my eyes and looked at my phone in the bed next to me, and as I put my hand on it, I said, "I belong to you."
A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness
Feng, Tiantian, Hebbar, Rajat, Mehlman, Nicholas, Shi, Xuan, Kommineni, Aditya, Narayanan, Shrikanth
Speech-centric machine learning systems have revolutionized a number of leading industries ranging from transportation and healthcare to education and defense, fundamentally reshaping how people live, work, and interact with each other. However, recent studies have demonstrated that many speech-centric ML systems may need to be considered more trustworthy for broader deployment. Specifically, concerns over privacy breaches, discriminating performance, and vulnerability to adversarial attacks have all been discovered in ML research fields. In order to address the above challenges and risks, a significant number of efforts have been made to ensure these ML systems are trustworthy, especially private, safe, and fair. In this paper, we conduct the first comprehensive survey on speech-centric trustworthy ML topics related to privacy, safety, and fairness. In addition to serving as a summary report for the research community, we highlight several promising future research directions to inspire researchers who wish to explore further in this area.
A Field Test of Bandit Algorithms for Recommendations: Understanding the Validity of Assumptions on Human Preferences in Multi-armed Bandits
Leqi, Liu, Zhou, Giulio, Kılınç-Karzan, Fatma, Lipton, Zachary C., Montgomery, Alan L.
Personalized recommender systems suffuse modern life, shaping what media we read and what products we consume. Algorithms powering such systems tend to consist of supervised learning-based heuristics, such as latent factor models with a variety of heuristically chosen prediction targets. Meanwhile, theoretical treatments of recommendation frequently address the decision-theoretic nature of the problem, including the need to balance exploration and exploitation, via the multi-armed bandit (MAB) framework. However, MAB-based approaches rely heavily on assumptions about human preferences. These preference assumptions are seldom tested using human subject studies, partly due to the lack of publicly available toolkits to conduct such studies. In this work, we conduct a study with crowdworkers in a comics recommendation MAB setting. Each arm represents a comic category, and users provide feedback after each recommendation. We check the validity of core MAB assumptions, namely that human preferences (reward distributions) are fixed over time, and find that they do not hold. This finding suggests that any MAB algorithm used for recommender systems should account for human preference dynamics. While answering these questions, we provide a flexible experimental framework for understanding human preference dynamics and testing MAB algorithms with human users. The code for our experimental framework and the collected data can be found at https://github.com/HumainLab/human-bandit-evaluation.
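As a small illustration of why fixed reward distributions matter, the sketch below tracks one arm's reward with both a full-history mean (which implicitly assumes stationarity) and a sliding-window mean (one common way to follow drifting preferences). It is not taken from the paper's experimental framework; the class name `ArmEstimate` and the window size are hypothetical choices for this example.

```python
from collections import deque


class ArmEstimate:
    """Tracks one arm's reward with a full-history and a windowed mean."""

    def __init__(self, window):
        self.recent = deque(maxlen=window)  # keeps only the last `window` rewards
        self.total = 0.0
        self.count = 0

    def update(self, reward):
        self.recent.append(reward)
        self.total += reward
        self.count += 1

    def full_mean(self):
        """Mean over all rewards seen so far (assumes a fixed distribution)."""
        return self.total / self.count if self.count else 0.0

    def window_mean(self):
        """Mean over the most recent rewards only (adapts to drift)."""
        return sum(self.recent) / len(self.recent) if self.recent else 0.0
```

If a user enjoys a comic category for ten rounds and then tires of it, the full-history mean still reports a moderate value while the windowed mean has already dropped, so an algorithm relying on the former would keep recommending the category for longer.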
Robot assistants in the operating room promise safer surgery
Advanced robotics can help surgeons carry out procedures where there is little margin for error. In a surgery in India, a robot scans a patient's knee to figure out how best to carry out a joint replacement. Meanwhile, in an operating room in the Netherlands, another robot is performing highly challenging microsurgery under the control of a doctor using joysticks. Such scenarios look set to become more common. At present, some manual operations are so difficult they can be performed by only a small number of surgeons worldwide, while others are invasive and depend on a surgeon's specific skill.