Recommender systems are widely used to predict personalized preferences of goods or services using users' past activities, such as item ratings or purchase histories. If collections of such personal activities were made publicly available, they could be used to personalize a diverse range of services, including targeted advertisement or recommendations. However, there would be an accompanying risk of privacy violations. The pioneering work of Narayanan et al.\ demonstrated that even if the identifiers are eliminated, the public release of user ratings can allow for the identification of users by those who have only a small amount of data on the users' past ratings. In this paper, we assume the following setting. A collector collects user ratings, then anonymizes and distributes them. A recommender constructs a recommender system based on the anonymized ratings provided by the collector. Based on this setting, we exhaustively list the models of recommender systems that use anonymized ratings. For each model, we then present an item-based collaborative filtering algorithm for making recommendations based on anonymized ratings. Our experimental results show that an item-based collaborative filtering based on anonymized ratings can perform better than collaborative filterings based on 5--10 non-anonymized ratings. This surprising result indicates that, in some settings, privacy protection does not necessarily reduce the usefulness of recommendations. From the experimental analysis of this counterintuitive result, we observed that the sparsity of the ratings can be reduced by anonymization and the variance of the prediction can be reduced if $k$, the anonymization parameter, is appropriately tuned. In this way, the predictive performance of recommendations based on anonymized ratings can be improved in some settings.
Collaborative filtering, a widely-used user-centric recommendation technique, predicts an item’s rating by aggregating its ratings from similar users. User similarity is usually calculated by cosine similarity or Pearson correlation coefficient. However, both of them consider only the direction of rating vectors, and suffer from a range of drawbacks. To solve these issues, we propose a novel Bayesian similarity measure based on the Dirichlet distribution, taking into consideration both the direction and length of rating vectors. Further, our principled method reduces correlation due to chance. Experimental results on six real-world data sets show that our method achieves superior accuracy.
Online rating systems are now ubiquitous due to the success of recommender systems. In such systems, users are allowed to rate the items (movies, songs, commodities) in a predefined range of values. The ratings collected can be used to infer users' preferences as well as items' intrinsic features, which are then matched to perform personalized recommendation. Most previous work focuses on improving the prediction accuracy or ranking capability. Little attention has been paid to the problem of spammers or low-reputed users in such systems.
This is a technical deep dive of the collaborative filtering algorithm and how to use it in practice. From Amazon recommending products you may be interested in based on your recent purchases to Netflix recommending shows and movies you may want to watch, recommender systems have become popular across many applications of data science. Like many other problems in data science, there are several ways to approach recommendations. Two of the most popular are collaborative filtering and content-based recommendations. Content-based Recommendations: If companies have detailed metadata about each of your items, they can recommend items with similar metadata tags.