Personal Assistant Systems
Extracting Features from Ratings: The Role of Factor Models
Selke, Joachim, Balke, Wolf-Tilo
Performing effective preference-based data retrieval requires detailed, preferentially meaningful, structured information about the current user as well as the items under consideration. A common problem is that representations of items often consist only of technical attributes, which do not resemble human perception. This is particularly true for integral items such as movies or songs. It is often claimed that meaningful item features could be extracted from collaborative rating data, which is becoming available through social networking services. However, there is only anecdotal evidence supporting this claim; if it is true, the extracted information could be very valuable for preference-based data retrieval. In this paper, we propose a methodology to systematically check this common claim. We performed a preliminary investigation on a large collection of movie ratings and present initial evidence.
Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
Srebro, Nathan, Salakhutdinov, Ruslan R.
We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly, but that a properly weighted version of the trace-norm regularizer works well with non-uniform sampling. We show that the weighted trace-norm regularization indeed yields significant gains on the highly non-uniformly sampled Netflix dataset.
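For orientation, the weighted regularizer mentioned in the abstract above can be written as follows. This is a sketch under stated assumptions: the symbols $X$, $p$, and $q$ are chosen here for illustration, with $p(i)$ and $q(j)$ assumed to denote the marginal probabilities of sampling row $i$ and column $j$.

```latex
% Unweighted trace norm: the sum of the singular values of X.
\|X\|_{\mathrm{tr}} = \sum_{k} \sigma_k(X)

% Weighted trace norm: rescale rows and columns by the square roots
% of their (assumed) sampling marginals before taking the trace norm.
\|X\|_{\mathrm{tr}(p,q)}
  = \left\| \operatorname{diag}\!\left(\sqrt{p}\right) \, X \,
            \operatorname{diag}\!\left(\sqrt{q}\right) \right\|_{\mathrm{tr}}
```

Under uniform sampling of an $n \times m$ matrix, $p(i) = 1/n$ and $q(j) = 1/m$, so the weighted norm reduces to the ordinary trace norm scaled by $1/\sqrt{nm}$.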
Improving Human Judgments by Decontaminating Sequential Dependencies
Mozer, Michael C., Pashler, Harold, Wilder, Matthew, Lindsey, Robert V., Jones, Matt, Jones, Michael N.
For over half a century, psychologists have been struck by how poor people are at expressing their internal sensations, impressions, and evaluations via rating scales. When individuals make judgments, they are incapable of using an absolute rating scale, and instead rely on reference points from recent experience. This relativity of judgment limits the usefulness of responses provided by individuals to surveys, questionnaires, and evaluation forms. Fortunately, the cognitive processes that transform internal states to responses are not simply noisy, but rather are influenced by recent experience in a lawful manner. We explore techniques to remove sequential dependencies, and thereby decontaminate a series of ratings to obtain more meaningful human judgments. In our formulation, decontamination is fundamentally a problem of inferring latent states (internal sensations) which, because of the relativity of judgment, have temporal dependencies. We propose a decontamination solution using a conditional random field with constraints motivated by psychological theories of relative judgment. Our exploration of decontamination models is supported by two experiments we conducted to obtain ground-truth rating data on a simple length estimation task. Our decontamination techniques yield a reduction of over 20% in the error of human judgments.
A new Recommender system based on target tracking: a Kalman Filter approach
Nowakowski, Samuel, Bernier, Cédric, Boyer, Anne
In this paper, we propose a new approach for recommender systems based on target tracking by Kalman filtering. We assume that users and the resources they have seen are vectors in the multidimensional space of the categories of the resources. Knowing this space, we propose an algorithm based on a Kalman filter to track users and to predict their future position in the recommendation space.
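The tracking idea in the abstract above can be illustrated with a minimal sketch. It assumes a scalar Kalman filter with a random-walk state model applied independently to each category axis; the function names `kalman_step` and `track_user`, the noise variances `q` and `r`, and the initial values are all hypothetical choices for this example and are not taken from the paper.

```python
def kalman_step(x, p, z, q=0.01, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: previous state estimate and its variance
    z:    new observation (e.g. the category weight of the resource just seen)
    q, r: assumed process and measurement noise variances
    """
    # Predict: a random-walk model keeps the mean and inflates the variance.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and observation using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def track_user(observations, x0=0.0, p0=1.0, q=0.01, r=0.1):
    """Track a user along each category axis independently.

    observations: list of equal-length vectors, one per consumed resource.
    Returns the final per-category estimates; under the random-walk
    assumption these are also the predicted next position.
    """
    dims = len(observations[0])
    state = [(x0, p0)] * dims
    for obs in observations:
        state = [kalman_step(x, p, z, q, r) for (x, p), z in zip(state, obs)]
    return [x for x, _ in state]
```

A user who repeatedly consumes resources from the first of two categories, for instance `track_user([[1.0, 0.0]] * 50)`, ends up with an estimate close to 1 on the first axis and 0 on the second.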
Random Graph Generator for Bipartite Networks Modeling
Chojnacki, Szymon, Kłopotek, Mieczysław
The purpose of this article is to introduce a new iterative algorithm that generates graphs with properties resembling real-life bipartite graphs. The algorithm enables us to generate a wide range of random bigraphs, whose features are determined by a set of parameters. We adapt the advances of the last decade in unipartite complex network modeling to the bigraph setting. This data structure can be observed in several situations. However, only a few datasets are freely available to test the algorithms (e.g. community detection, influential node identification, information retrieval) that operate on such data. Therefore, artificial datasets are needed to enhance the development and testing of the algorithms. We are particularly interested in applying the generator to the analysis of recommender systems. Therefore, we focus on two characteristics that, besides simple statistics, are in our opinion responsible for the performance of neighborhood-based collaborative filtering algorithms. These features are the node degree distribution and the local clustering coefficient.
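Of the two characteristics named in the abstract above, the node degree distribution is the simpler one to compute. The sketch below shows one way to obtain it for both sides of a user-item bigraph; the function name `degree_distributions` and the edge-list representation (unique `(user, item)` pairs) are assumptions for this example.

```python
from collections import Counter


def degree_distributions(edges):
    """Compute degree distributions for both node sets of a bipartite graph.

    edges: iterable of unique (user, item) pairs.
    Returns (user_dist, item_dist), each mapping a degree to the number
    of nodes on that side having this degree.
    """
    edges = list(edges)
    user_degree = Counter(user for user, _ in edges)
    item_degree = Counter(item for _, item in edges)
    return Counter(user_degree.values()), Counter(item_degree.values())
```

Comparing these two histograms between a generated graph and a real rating dataset is one simple check of how closely the generator matches the data.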
Random Graphs for Performance Evaluation of Recommender Systems
Chojnacki, Szymon, Kłopotek, Mieczysław
The purpose of this article is to introduce a new analytical framework dedicated to measuring the performance of recommender systems. The standard approach is to assess the quality of a system by means of accuracy-related statistics. However, the environments in which recommender systems are deployed require close attention to the speed and memory requirements of the algorithms. Unfortunately, it is implausible to assess the complexity of various algorithms accurately with formal tools. This can be attributed to the fact that such analyses are usually based on the assumption of a dense representation of the underlying data structures, whereas in real life the algorithms operate on sparse data and are implemented with collections dedicated to them. Therefore, we propose to measure the complexity of recommender systems with artificial datasets that possess real-life properties. We utilize a recently developed bipartite graph generator to evaluate how the behavior of state-of-the-art recommender systems is determined and diversified by topological properties of the generated datasets.
Learning under Concept Drift: an Overview
Concept drift refers to a learning problem that is non-stationary over time. The training and the application data often mismatch in real-life problems. In this report we present the context of the concept drift problem. We focus on the issues relevant to adaptive training set formation. We present the framework and terminology, and formulate a global picture of the design of concept drift learners. We start with formalizing the framework for the concept drifting data in Section 1. In Section 2 we discuss the adaptivity mechanisms of the concept drift learners. In Section 3 we overview the principal mechanisms of concept drift learners. In this chapter we give a general picture of the available algorithms and categorize them based on their properties. Section 5 discusses the related research fields and Section 5 groups and presents major concept drift applications. This report is intended to give a bird's-eye view of the concept drift research field, provide a context for the research, and position it within a broad spectrum of research fields and applications.
UserRec: A User Recommendation Framework in Social Tagging Systems
Zhou, Tom Chao (The Chinese University of Hong Kong) | Ma, Hao (The Chinese University of Hong Kong) | Lyu, Michael R. (The Chinese University of Hong Kong) | King, Irwin (The Chinese University of Hong Kong)
Social tagging systems have emerged as an effective way for users to annotate and share objects on the Web. However, with the growth of social tagging systems, users are easily overwhelmed by the large amount of data, and it is very difficult for them to dig out the information they are interested in. Though tagging systems provide interest-based social network features that enable a user to keep track of other users' tagging activities, there is still no automatic and effective way for the user to discover other users with common interests. In this paper, we propose a User Recommendation (UserRec) framework for user interest modeling and interest-based user recommendation, aiming to boost information sharing among users with similar interests. Our work brings three major contributions to the research community: (1) we propose a tag-graph based community detection method to model the users' personal interests, which are further represented by discrete topic distributions; (2) the similarity values between users' topic distributions are measured by Kullback-Leibler divergence (KL-divergence), and the similarity values are further used to perform interest-based user recommendation; and (3) by analyzing users' roles in a tagging system, we find that users' roles in a tagging system are similar to those of Web pages on the Internet. Experiments on a tagging dataset of Web pages (Yahoo! Delicious) show that UserRec outperforms other state-of-the-art recommender system approaches.
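Contribution (2) in the abstract above compares users' discrete topic distributions with KL-divergence. The sketch below shows what such a comparison could look like; the function names `kl_divergence` and `recommend_users`, the smoothing constant `eps`, and the ranking by ascending divergence are assumptions for this example rather than details of the UserRec framework.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same topics.

    eps is an assumed smoothing constant that avoids log(0) when q
    assigns zero probability to a topic.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def recommend_users(target, candidates, k=2):
    """Return the k candidate users whose topic distributions diverge least
    from the target user's distribution.

    candidates: dict mapping a user name to a topic distribution.
    """
    ranked = sorted(
        candidates.items(),
        key=lambda item: kl_divergence(target, item[1]),
    )
    return [name for name, _ in ranked[:k]]
```

Note that KL-divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; a symmetric variant would average the two directions.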
Transfer Learning in Collaborative Filtering for Sparsity Reduction
Pan, Weike (Hong Kong University of Science and Technology) | Xiang, Evan Wei (Hong Kong University of Science and Technology) | Liu, Nathan Nan (Hong Kong University of Science and Technology) | Yang, Qiang (Hong Kong University of Science and Technology)
Data sparsity is a major problem for collaborative filtering (CF) techniques in recommender systems, especially for new users and items. We observe that, while our target data are sparse for CF systems, related and relatively dense auxiliary data may already exist in some other more mature application domains. In this paper, we address the data sparsity problem in a target domain by transferring knowledge about both users and items from auxiliary data sources. We observe that in different domains the user feedback is often heterogeneous, such as ratings vs. clicks. Our solution is to integrate both user and item knowledge in auxiliary data sources through a principled matrix-based transfer learning framework that takes into account the data heterogeneity. In particular, we discover the principal coordinates of both users and items in the auxiliary data matrices, and transfer them to the target domain in order to reduce the effect of data sparsity. We describe our method, which is known as coordinate system transfer or CST, and demonstrate its effectiveness in alleviating the data sparsity problem in collaborative filtering. We show that our proposed method can significantly outperform several state-of-the-art solutions for this problem.
Modeling Dynamic Multi-Topic Discussions in Online Forums
Wu, Hao (Zhejiang University) | Bu, Jiajun (Zhejiang University) | Chen, Chun (Zhejiang University) | Wang, Can (Zhejiang University) | Qiu, Guang (Zhejiang University) | Zhang, Lijun (Zhejiang University) | Shen, Jianfeng (Zhejiang Health Information Center)
In the form of topic discussions, users interact with each other to share knowledge and exchange information in online forums. Modeling the evolution of topic discussions reveals how information propagates on the Internet and can thus help understand sociological phenomena and improve the performance of applications such as recommendation systems. In this paper, we argue that a user's participation in topic discussions is motivated by either her friends or her own preferences. Inspired by the theory of information flow, we propose dynamic topic discussion models by mining influential relationships between users and individual preferences. Reply relations of users are exploited to construct the fundamental influential social network. The properties of the discussed topics and a time-lapse factor are also considered in our modeling. Furthermore, we propose a novel measure called ParticipationRank to rank users according to how important they are in the social network and to what extent they prefer to participate in the discussion of a certain topic. The experiments show that our model can simulate the evolution of topic discussions well and predict the tendency of users' participation accurately.