Country
Leveraging Unlabeled Data to Scale Blocking for Record Linkage
Cao, Yunbo (Microsoft Research Asia) | Chen, Zhiyuan (Dalian University of Technology) | Zhu, Jiamin (Shanghai Jiao Tong University) | Yue, Pei (Microsoft Corporation) | Lin, Chin-Yew (Microsoft Research Asia) | Yu, Yong (Shanghai Jiao Tong University)
Record linkage is the process of matching records between two (or multiple) data sets that represent the same real-world entity. An exhaustive record linkage process involves computing the similarities between all pairs of records, which can be very expensive for large data sets. Blocking techniques alleviate this problem by dividing the records into blocks and only comparing records within the same block. To be adaptive from domain to domain, one category of blocking technique formalizes 'construction of blocking scheme' as a machine learning problem. In the process of learning the best blocking scheme, previous learning-based techniques utilize only a set of labeled data. However, since the set of labeled data is usually not large enough to well characterize the unseen (unlabeled) data, the resultant blocking scheme may poorly perform on the unseen data by generating too many candidate matches. To address that, in this paper, we propose to utilize unlabeled data (in addition to labeled data) for learning blocking schemes. Our experimental results show that using unlabeled data in learning can remarkably reduce the number of candidate matches while keeping the same level of coverage for true matches.
CCR — A Content-Collaborative Reciprocal Recommender for Online Dating
Akehurst, Joshua (University of Sydney) | Koprinska, Irena (University of Sydney) | Yacef, Kalina (University of Sydney) | Pizzato, Luiz (University of Sydney) | Kay, Judy (University of Sydney) | Rej, Tomasz (University of Sydney)
We present a new recommender system for online dating. Using a large dataset from a major online dating website, we first show that similar people, as defined by a set of personal attributes, like and dislike similar people and are liked and disliked by similar people. This analysis provides the foundation for our Content-Collaborative Reciprocal (CCR) recommender approach. The content-based part uses selected user profile features and similarity measure to generate a set of similar users. The collaborative filtering part uses the interactions of the similar users, including the people they like/dislike and are liked/disliked by, to produce reciprocal recommendations. CCR addresses the cold start problem of new users joining the site by being able to provide recommendations immediately, based on their profiles. Evaluation results show that the success rate of the recommendations is 69.26% compared with a baseline of 35.19% for the top 10 ranked recommendations.
Bayesian Chain Classifiers for Multidimensional Classification
Zaragoza, Julio Cesar (INAOE) | Sucar, Enrique (INAOE) | Morales, Eduardo (INAOE) | Bielza, Concha (Universidad Politécnica Madrid) | Larrañaga, Pedro (Universidad Politécnica Madrid)
In multidimensional classification the goal is to assign an instance to a set of different classes. This task is normally addressed either by defining a compound class variable with all the possible combinations of classes (label power-set methods, LPMs) or by building independent classifiers for each class (binary-relevance methods, BRMs). However, LPMs do not scale well and BRMs ignore the dependency relations between classes. We introduce a method for chaining binary Bayesian classifiers that combines the strengths of classifier chains and Bayesian networks for multidimensional classification. The method consists of two phases. In the first phase, a Bayesian network (BN) that represents the dependency relations between the class variables is learned from data. In the second phase, several chain classifiers are built, such that the order of the class variables in the chain is consistent with the class BN. At the end we combine the results of the different generated orders. Our method considers the dependencies between class variables and takes advantage of the conditional independence relations to build simplified models. We perform experiments with a chain of naive Bayes classifiers on different benchmark multidimensional datasets and show that our approach outperforms other state-of-the-art methods.
Learning Optimal Bayesian Networks Using A* Search
Yuan, Changhe (Mississippi State University) | Malone, Brandon (Mississippi State University) | Wu, Xiaojian (University of Massachusetts)
This paper formulates learning optimal Bayesian network as a shortest path finding problem. An A* search algorithm is introduced to solve the problem. With the guidance of a consistent heuristic, the algorithm learns an optimal Bayesian networkby only searching the most promising parts of the solution space. Empirical results show that the A*search algorithm significantly improves the time and space efficiency of existing methods on a set of benchmark datasets.
Finding (α,ϑ)-Solutions Via Sampled SCSPs
Rossi, Roberto (Wageningen University) | Hnich, Brahim (Izmir University of Economics) | Tarim, S. Armagan ( Hacettepe University ) | Prestwich, Steven (University College Cork)
We discuss a novel approach for dealing with single-stage stochastic constraint satisfaction problems (SCSPs) that include random variables over a continuous or large discrete support. Our approach is based on two novel tools: sampled SCSPs and (α,ϑ)-solutions. Instead of explicitly enumerating a very large or infinite set of future scenarios, we employ statistical estimation to determine if a given assignment is consistent for a SCSP. As in statistical estimation, the quality of our estimate is determined via confidence interval analysis. In contrast to existing approaches based on sampling, we provide likelihood guarantees for the quality of the solutions found. Our approach can be used in concert with existing strategies for solving SCSPs.
Robust Online Optimization of Reward-Uncertain MDPs
Regan, Kevin (University of Toronto) | Boutilier, Craig (University of Toronto)
Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown, despite their theoretical intractability, how the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate regret computation. However, the number of nondominated policies is generally so large as to undermine this leverage. In this paper, we show how the quality of the approximation can be improved online by pruning/adding nondominated policies during reward elicitation, while maintaining computational tractability. Drawing insights from the POMDP literature, we also develop a new anytime algorithm for constructing the set of nondominated policies with provable (anytime) error bounds. These bounds can be exploited to great effect in our online approximation scheme.
Eliciting Additive Reward Functions for Markov Decision Processes
Regan, Kevin (University of Toronto) | Boutilier, Craig (University of Toronto)
Specifying the reward function of a Markov decision process (MDP) can be demanding, requiring human assessment of the precise quality of, and tradeoffs among, various states and actions. However, reward functions often possess considerable structure which can be leveraged to streamline their specification. We develop new, decision-theoretically sound heuristics for eliciting rewards for factored MDPs whose reward functions exhibit additive independence. Since we can often find good policies without complete reward specification, we also develop new (exact and approximate) algorithms for robust optimization ofimprecise-reward MDPs with such additive reward. Our methods are evaluated in two domains: autonomic computing and assistive technology.
A Trust Prediction Approach Capturing Agents' Dynamic Behavior
Liu, Xin (Nanyang Technological University) | Datta, Anwitaman (Nanyang Technological University)
Predicting trust among the agents is of great importance to various open distributed settings (e.g., e-market, peer-to-peer networks, etc.) in that dishonest agents can easily join the system and achieve their goals by circumventing agreed rules, or gaining unfair advantages, etc. Most existing trust mechanisms derive trust by statistically investigating the target agent's historical information. However, even if rich historical information is available, it is challenging to model an agent's behavior since an intelligent agent may strategically change its behavior to maximize its profits. We therefore propose a trust prediction approach to capture dynamic behavior of the target agent. Specifically, we first identify features which are capable of describing/representing context of a transaction. Then we use these features to measure similarity between context of the potential transaction and that of previous transactions to estimate trustworthiness of the potential transaction based on previous similar transactions' outcomes. Evaluation using real auction data and synthetic data demonstrates efficacy of our approach in comparison with an existing representative trust mechanism.
Scalable Multiagent Planning Using Probabilistic Inference
Kumar, Akshat (University of Massachusetts Amherst) | Zilberstein, Shlomo (University of Massachusetts Amherst) | Toussaint, Marc (FU Berlin)
Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models—NEXP-Complete even for two agents—has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability.
Randomized Sensing in Adversarial Environments
Krause, Andreas (Swiss Federal Institute of Technology, Zurich) | Roper, Alex (University of Michigan) | Golovin, Daniel (California Institute of Technology)
How should we manage a sensor network to optimally guard security-critical infrastructure? How should we coordinate search and rescue helicopters to best locate survivors after a major disaster? In both applications, we would like to control sensing resources in uncertain, adversarial environments. In this paper, we introduce RSense, an efficient algorithm which guarantees near-optimal randomized sensing strategies whenever the detection performance satisfies submodularity, a natural diminishing returns property, for any fixed adversarial scenario. Our approach combines techniques from game theory with submodular optimization. The RSense algorithm applies to settings where the goal is to manage a deployed sensor network or to coordinate mobile sensing resources (such as unmanned aerial vehicles). We evaluate our algorithms on two real-world sensing problems.