Personal Assistant Systems
Scalable Recommendation with Poisson Factorization
Gopalan, Prem, Hofman, Jake M., Blei, David M.
We develop a Bayesian Poisson matrix factorization model for forming recommendations from sparse user behavior data. These data are large user/item matrices where each user has provided feedback on only a small subset of items, either explicitly (e.g., through star ratings) or implicitly (e.g., through views or purchases). In contrast to traditional matrix factorization approaches, Poisson factorization implicitly models each user's limited attention to consume items. Moreover, because of the mathematical form of the Poisson likelihood, the model needs only to explicitly consider the observed entries in the matrix, leading to both scalable computation and good predictive performance. We develop a variational inference algorithm for approximate posterior inference that scales up to massive data sets. This is an efficient algorithm that iterates over the observed entries and adjusts an approximate posterior over the user/item representations. We apply our method to large real-world user data containing users rating movies, users listening to songs, and users reading scientific papers. In all these settings, Bayesian Poisson factorization outperforms state-of-the-art matrix factorization methods.
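The claim that the Poisson likelihood needs only the observed entries can be checked numerically. The sketch below assumes that "observed" means nonzero entries and that each rate is the inner product of a user vector and an item vector; the names `theta`, `beta`, and both function names are illustrative and do not come from the abstract. The zero entries contribute only the sum of all rates, which factorizes into a product of column sums, so no pass over the full matrix is required.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def poisson_loglik_dense(Y, theta, beta):
    """Log-likelihood computed by visiting every cell of the matrix."""
    total = 0.0
    for u, row in enumerate(Y):
        for i, y in enumerate(row):
            rate = dot(theta[u], beta[i])
            total += y * math.log(rate) - rate - math.lgamma(y + 1)
    return total

def poisson_loglik_sparse(observed, theta, beta):
    """Same quantity, visiting only the nonzero cells.

    observed maps (user, item) -> positive count.
    The sum of all rates equals (sum_u theta_u) . (sum_i beta_i).
    """
    k = len(theta[0])
    theta_sum = [sum(t[j] for t in theta) for j in range(k)]
    beta_sum = [sum(b[j] for b in beta) for j in range(k)]
    total = -dot(theta_sum, beta_sum)
    for (u, i), y in observed.items():
        total += y * math.log(dot(theta[u], beta[i])) - math.lgamma(y + 1)
    return total

# Toy data (made up for illustration): 3 users, 4 items, 2 latent components.
theta = [[0.5, 1.0], [1.5, 0.2], [0.3, 0.7]]
beta = [[1.0, 0.1], [0.4, 0.9], [0.2, 0.2], [0.8, 1.1]]
Y = [[2, 0, 0, 1], [0, 0, 3, 0], [0, 1, 0, 0]]
observed = {(u, i): y for u, row in enumerate(Y) for i, y in enumerate(row) if y > 0}

dense = poisson_loglik_dense(Y, theta, beta)
sparse = poisson_loglik_sparse(observed, theta, beta)
```

The two values agree up to floating-point error, while the sparse version's cost grows with the number of nonzero entries rather than with users times items.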
Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques
Nicol, Olivier, Mary, Jérémie, Preux, Philippe
In many recommendation applications such as news recommendation, the items that can be recommended come and go at a very fast pace. This setting is a challenge for recommender systems (RS). Online learning algorithms seem to be the most straightforward solution. The contextual bandit framework was introduced for that very purpose. In general the evaluation of a RS is a critical issue. Live evaluation is often avoided due to the potential loss of revenue, hence the need for offline evaluation methods. Two options are available. Model-based methods are biased by nature and are thus difficult to trust when used alone. Data-driven methods are therefore what we consider here. Evaluating online learning algorithms with past data is not simple but some methods exist in the literature. Nonetheless their accuracy is not satisfactory, mainly due to their mechanism of data rejection that only allows the exploitation of a small fraction of the data. We precisely address this issue in this paper. After highlighting the limitations of the previous methods, we present a new method, based on bootstrapping techniques. This new method comes with two important improvements: it is much more accurate and it provides a measure of quality of its estimation. The latter is a highly desirable property in order to minimize the risks entailed by putting online a RS for the first time. We provide both theoretical and experimental proofs of its superiority compared to state-of-the-art methods, as well as an analysis of the convergence of the measure of quality.
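The data-rejection mechanism mentioned above can be illustrated with a rejection-style offline estimator. This is a minimal sketch of that kind of baseline, not the bootstrapping method the paper proposes; the function name and the `(context, logged_action, reward)` event layout are assumptions made for illustration. An event is used only when the policy under evaluation picks the same action that was logged, so most of the log is discarded.

```python
def replay_evaluate(policy, logged_events):
    """Estimate a policy's mean reward from logged data by rejection.

    logged_events: iterable of (context, logged_action, reward) tuples.
    Events where the policy disagrees with the logged action are dropped.
    Returns (estimate, number_of_events_used).
    """
    total_reward = 0.0
    used = 0
    for context, logged_action, reward in logged_events:
        if policy(context) == logged_action:
            total_reward += reward
            used += 1
    estimate = total_reward / used if used else float("nan")
    return estimate, used

# Hand-made log with two actions (0 and 1); contexts are also 0 or 1.
log = [(0, 0, 1.0), (0, 1, 0.0), (1, 1, 1.0), (1, 0, 0.0)]
match_context = lambda context: context  # hypothetical policy under test

estimate, used = replay_evaluate(match_context, log)
```

In this toy log only half of the events survive. With K actions logged uniformly at random, roughly a 1/K fraction would be retained, which is the small-fraction problem the abstract describes.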
Assessing Impacts of a Power User Attack on a Matrix Factorization Collaborative Recommender System
Seminario, Carlos E. (University of North Carolina at Charlotte) | Wilson, David C. (University of North Carolina at Charlotte)
Collaborative Filtering (CF) Recommender Systems (RSs) help users deal with the information overload they face when browsing, searching, or shopping for products and services. Power users are those individuals that are able to exert substantial influence over the recommendations made to other users, and RS operators encourage the existence of power user communities and leverage them to help fellow users make informed purchase decisions, especially on new items. Attacks on RSs occur when malicious users attempt to bias recommendations by introducing fake reviews or ratings; these attacks remain a key problem area for system operators. Thus, the influence wielded by power users can be used for either positive (addressing the "new item" problem) or negative (attack) purposes. Our research is investigating the impact on RS predictions and top-N recommendation lists when attackers emulate power users to provide biased ratings for new items. Previously we showed that power user attacks are effective against user-based CF RSs and that item-based CF RSs are robust to this type of attack. This paper presents the next stage in our investigation: (1) an evaluation of heuristic approaches to power user selection, and (2) evaluation of power user attacks in the context of matrix-factorization (SVD) based recommenders. Results show that social measures of influence such as degree centrality are more effective for selection of power users, and that matrix-factorization approaches are susceptible to power user attacks.
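Selecting power users by degree centrality could look like the sketch below. The abstract does not say how the user graph is built, so the adjacency map here (each user mapped to the users they are linked with) is a hypothetical input, as are the function name and the alphabetical tie-breaking rule.

```python
def top_k_by_degree(neighbors, k):
    """Return the k users with the most distinct neighbors.

    neighbors: dict mapping user id -> iterable of linked user ids.
    Ties are broken by user id so the result is deterministic.
    """
    degree = {user: len(set(linked)) for user, linked in neighbors.items()}
    ranked = sorted(degree, key=lambda user: (-degree[user], user))
    return ranked[:k]

# Made-up user graph for illustration.
graph = {
    "u1": ["u2", "u3", "u4"],
    "u2": ["u1"],
    "u3": ["u1", "u4"],
    "u4": ["u1", "u3"],
}
power_users = top_k_by_degree(graph, 2)
```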
The Use of Paraphrase Identification in the Retrieval of Appropriate Responses for Script Based Conversational Agents
McClendon, Jerome L. (Clemson University) | Mack, Naja A. (Clemson University) | Hodges, Larry F. (Clemson University)
This paper presents an approach to creating intelligent conversational agents that are capable of returning appropriate responses to natural language input. Our approach consists of using a supervised learning algorithm in combination with different NLP algorithms in training the system to identify paraphrases of the user's question stored in a database. When tested on a data set consisting of questions and answers for a current conversational agent project, our approach returned an accuracy score of 79.15%, a precision score of 77.58% and a recall score of 78.01%.
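For reference, the three reported scores are standard functions of a binary confusion matrix. The sketch below uses made-up counts, not the paper's data, and the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical counts: 8 true positives, 2 false positives,
# 7 true negatives, 3 false negatives.
accuracy, precision, recall = classification_metrics(tp=8, fp=2, tn=7, fn=3)
```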
Differential Neighborhood Selection In Memory-Based Group Recommender Systems
Najjar, Nadia A (University of North Carolina at Charlotte) | Wilson, David C (University of North Carolina at Charlotte)
As recommender systems have become commonplace to support individual decision making, a need has also been recognized for systems that tailor and provide recommendations to a group of users together rather than individuals alone. Group recommender research to date has focused on evaluating strategies for aggregating profiles of group members to form a consolidated group profile or for aggregating recommendations to individual group members as a consolidated group recommendation list. This paper presents a novel neighborhood selection approach for group recommendation in the context of a neighborhood-based Collaborative Filtering system. We evaluate the performance of this approach with respect to group characteristics such as size and group member similarity. Results show that this approach can result in more accurate predictions for the group, particularly for groups that are more homogeneous.
A Constrained Matrix-Variate Gaussian Process for Transposable Data
Koyejo, Oluwasanmi, Lee, Cheng, Ghosh, Joydeep
Transposable data represent interactions among two sets of entities, and are typically represented as a matrix containing the known interaction values. Additional side information may consist of feature vectors specific to entities corresponding to the rows and/or columns of such a matrix. Further information may also be available in the form of interactions or hierarchies among entities along the same mode (axis). We propose a novel approach for modeling transposable data with missing interactions given additional side information. The interactions are modeled as noisy observations from a latent noise-free matrix generated from a matrix-variate Gaussian process. The construction of row and column covariances using side information provides a flexible mechanism for specifying a priori knowledge of the row and column correlations in the data. Further, the use of such a prior combined with the side information enables predictions for new rows and columns not observed in the training data. In this work, we combine the matrix-variate Gaussian process model with low-rank constraints. The constrained Gaussian process approach is applied to the prediction of hidden associations between genes and diseases using a small set of observed associations as well as prior covariances induced by gene-gene interaction networks and disease ontologies. The proposed approach is also applied to recommender systems data, which involves predicting the item ratings of users using known associations as well as prior covariances induced by social networks. We present experimental results that highlight the performance of the constrained matrix-variate Gaussian process as compared to state-of-the-art approaches in each domain.
Personalized Recommendation of Twitter Lists using Content and Network Information
Rakesh, Vineeth (Wayne State University) | Singh, Dilpreet (Wayne State University) | Vinzamuri, Bhanukiran (Wayne State University) | Reddy, Chandan K (Wayne State University)
Lists in social networks have become popular tools to organize content. This paper proposes a novel framework for recommending lists to users by combining several features that jointly capture their personal interests. Our contribution is two-fold. First, we develop a ListRec model that leverages the dynamically varying tweet content, the network of twitterers and the popularity of lists to collectively model the users' preference towards social lists. Second, we use the topical interests of users and the list network structure to develop a novel network-based model called the LIST-PAGERANK. We use this model to recommend auxiliary lists that are more popular than the lists that are currently subscribed to by the users. We evaluate our ListRec model using the Twitter dataset consisting of 2988 direct list subscriptions. Using an automatic evaluation technique, we compare the performance of the ListRec model with different baseline methods and other competing approaches and show that our model delivers better precision in terms of the prediction of the subscribed lists of the twitterers. Furthermore, we also demonstrate the importance of combining different weighting schemes and their effect on capturing users' interest towards Twitter lists. To evaluate the LIST-PAGERANK model, we employ a user-study based evaluation to show that the model is effective in recommending auxiliary lists that are more authoritative than the lists subscribed to by the users.
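The abstract does not specify how LIST-PAGERANK is constructed. As background only, the sketch below shows plain PageRank by power iteration on a small directed graph; it is not the authors' model, and the graph, node names, and parameter defaults are assumptions chosen for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Standard PageRank by power iteration.

    out_links: dict mapping node -> list of nodes it links to.
    Every linked node must also appear as a key.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical list network: lists "a" and "b" both point to list "c".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `ranking` orders the nodes from most to least authoritative under this plain formulation; a topic-aware variant would presumably alter the link weights or the teleport distribution.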
Predicting User Replying Behavior on a Large Online Dating Site
Xia, Peng (University of Massachusetts Lowell) | Jiang, Hua (Baihe.com) | Wang, Xiaodong (Baihe.com) | Chen, Cindy (University of Massachusetts Lowell) | Liu, Benyuan (University of Massachusetts Lowell)
Online dating sites have become popular platforms for people to look for potential romantic partners. Many online dating sites provide recommendations on compatible partners based on their proprietary matching algorithms. It is important that not only the recommended dates match the user's preference or criteria, but also the recommended users are interested in the user and likely to reciprocate when contacted. The goal of this paper is to predict whether an initial contact message from a user will be replied to by the receiver. The study is based on a large scale real-world dataset obtained from a major dating site in China with more than sixty million registered users. We formulate our reply prediction as a link prediction problem of social networks and approach it using a machine learning framework. The availability of a large amount of user profile information and the bipartite nature of the dating network present unique opportunities and challenges to the reply prediction problem. We extract user-based features from user profiles and graph-based features from the bipartite dating network, apply them in a variety of classification algorithms, and compare the utility of the features and performance of the classifiers. Our results show that the user-based and graph-based features result in similar performance, and can be used to effectively predict the reciprocal links. Only a small performance gain is achieved when both feature sets are used. Among the five classifiers we considered, the random forests method outperforms the other four algorithms (naive Bayes, logistic regression, KNN, and SVM). Our methods and results can provide valuable guidelines to the design and performance of recommendation engines for online dating sites.
Collaborative Filtering with Information-Rich and Information-Sparse Entities
Zhu, Kai, Wu, Rui, Ying, Lei, Srikant, R.
In this paper, we consider a popular model for collaborative filtering in recommender systems where some users of a website rate some items, such as movies, and the goal is to recover the ratings of some or all of the unrated items of each user. In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered, and further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users). When users (or items) are clustered, our algorithm can recover the rating matrix with $\omega(MK \log M)$ noisy entries while $MK$ entries are necessary, where $K$ is the number of clusters and $M$ is the number of items. In the case of co-clustering, we prove that $K^2$ entries are necessary for recovering the rating matrix, and our algorithm achieves this lower bound within a logarithmic factor when $K$ is sufficiently large. We compare our algorithms with a well-known algorithm called alternating minimization (AM), and a similarity score-based algorithm known as the popularity-among-friends (PAF) algorithm by applying all three to the MovieLens and Netflix data sets. Our co-clustering algorithm and AM have similar overall error rates when recovering the rating matrix, both of which are lower than the error rate under PAF. But more importantly, the error rate of our co-clustering algorithm is significantly lower than that of AM and PAF in the scenarios of interest in recommender systems: when recommending a few items to each user or when recommending items to users who only rated a few items (these users are the majority of the total user population). The performance difference increases even more when noise is added to the datasets.
Distributed Online Learning in Social Recommender Systems
Tekin, Cem, Zhang, Simpson, van der Schaar, Mihaela
In this paper, we consider decentralized sequential decision making in distributed online recommender systems, where items are recommended to users based on their search query as well as their specific background including history of bought items, gender and age, all of which comprise the context information of the user. In contrast to centralized recommender systems, in which there is a single centralized seller who has access to the complete inventory of items as well as the complete record of sales and user information, in decentralized recommender systems each seller/learner only has access to the inventory of items and user information for its own products and not the products and user information of other sellers, but can earn a commission if it sells an item of another seller. Therefore the sellers must decide in a distributed manner which items to recommend to an incoming user (from the set of their own items or the items of another seller), in order to maximize the revenue from their own sales and commissions. We formulate this problem as a cooperative contextual bandit problem, analytically bound the performance of the sellers compared to the best recommendation strategy given the complete realization of user arrivals and the inventory of items, as well as the context-dependent purchase probabilities of each item, and verify our results via numerical examples on a distributed data set adapted from Amazon data. We evaluate the dependence of the performance of a seller on the inventory of items the seller has, the number of connections it has with the other sellers, and the commissions which the seller gets by selling items of other sellers to its users.