Uncertainty
Multi-Relational Learning with Gaussian Processes
Xu, Zhao (Fraunhofer IAIS) | Kersting, Kristian (Fraunhofer IAIS) | Tresp, Volker (Siemens Corporate Technology)
Due to their flexible nonparametric nature, Gaussian process models are very effective at solving hard machine learning problems. While existing Gaussian process models focus on modeling one single relation, we present a generalized GP model, named multi-relational Gaussian process model, that is able to deal with an arbitrary number of relations in a domain of interest. The proposed model is analyzed in the context of bipartite, directed, and undirected univariate relations. Experimental results on real-world datasets show that exploiting the correlations among different entity types and relations can indeed improve prediction performance.
Preference Learning with Extreme Examples
Wang, Fei (Florida International University) | Zhang, Bin (IBM CRL) | Li, Ta-Hsin (IBM T. J. Watson) | Yin, Wenjun (IBM CRL) | Dong, Jin (IBM CRL) | Li, Tao (Florida International University)
In this paper, we consider a general problem of semi-supervised preference learning, in which we assume that we have the information of the extreme cases and some ordered constraints, our goal is to learn the unknown preferences of the other places. Taking the potential housing place selection problem as an example, we have many candidate places together with their associated information (e.g., position, environment), and we know some extreme examples (i.e., several places are perfect for building a house, and several places are the worst that cannot build a house there), and we know some partially ordered constraints (i.e., for two places, which place is better), then how can we judge the preference of one potential place whose preference is unknown beforehand? We propose a Bayesian framework based on Gaussian process to tackle this problem, from which we not only solve for the unknown preferences, but also the hyperparameters contained in our model.
Semi-Supervised Classification using Sparse Gaussian Process Regression
Patel, Amrish (Indian Institute of Science) | Sundararajan, S. (Yahoo! Labs) | Shevade, Shirish (Indian Institute of Science)
Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving semi-supervised binary classification problem using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and maximum margin clustering. The proposed algorithm is simple and easy to implement. It gives a sparse solution directly unlike the SVR based algorithm. Also, the hyperparameters are estimated easily without resorting to expensive cross-validation technique. Use of sparse GPR model helps in making the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.
Semi-Supervised Learning of Visual Classifiers from Web Images and Text
Morsillo, Nicholas (University of Rochester) | Pal, Christopher (รcole Polytechnique de Montrรฉal) | Nelson, Randal (University of Rochester)
The web holds tremendous potential as a source of training data for visual classification. However, web images must be correctly indexed and labeled before this potential can be realized. Accordingly, there has been considerable recent interest in collecting imagery from the web using image search engines to build databases for object and scene recognition research. While search engines can provide rough sets of image data, results are noisy and this leads to problems when training classifiers. In this paper we propose a semi-supervised model for automatically collecting clean example imagery from the web. Our approach includes both visual and textual web data in a unified framework. Minimal supervision is enabled by the selective use of generative and discriminative elements in a probabilistic model and a novel learning algorithm. We show through experiments that our model discovers good training images from the web with minimal manual work. Classifiers trained using our method significantly outperform analogous baseline approaches on the Caltech-256 dataset.
Exponential Family Sparse Coding with Applications to Self-taught Learning
Lee, Honglak (Stanford University) | Raina, Rajat (Stanford University) | Teichman, Alex (Stanford University) | Ng, Andrew (Stanford University)
Sparse coding is an unsupervised learning algorithm for finding concise, slightly higher-level representations of inputs, and has been successfully applied to self-taught learning, where the goal is to use unlabeled data to help on a supervised learning task, even if the unlabeled data cannot be associated with the labels of the supervised task (Raina et al., 2007). However, sparse coding uses a Gaussian noise model and a quadratic loss function, and thus performs poorly if applied to binary valued, integer valued, or other non-Gaussian data, such as text. Drawing on ideas from generalized linear models (GLMs), we present a generalization of sparse coding to learning with data drawn from any exponential family distribution (such as Bernoulli, Poisson, etc). This gives a method that we argue is much better suited to model other data types than Gaussian. We present an algorithm for solving the L1-regularized optimization problem defined by this model, and show that it is especially efficient when the optimal solution is sparse. We also show that the new model results in significantly improved self-taught learning performance when applied to text classification and to a robotic perception task.
Bayesian Extreme Components Analysis
Chen, Yutian (University of California, Irvine) | Welling, Max (University of California, Irvine)
Extreme Components Analysis (XCA) is a statistical method based on a single eigenvalue decomposition to recover the optimal combination of principal and minor components in the data. Unfortunately, minor components are notoriously sensitive to overfitting when the number of data items is small relative to the number of attributes. We present a Bayesian extension of XCA by introducing a conjugate prior for the parameters of the XCA model. This Bayesian-XCA is shown to outperform plain vanilla XCA as well as Bayesian-PCA and XCA based on a frequentist correction to the sample spectrum. Moreover, we show that minor components are only picked when they represent genuine constraints in the data, even for very small sample sizes. An extension to mixtures of Bayesian XCA models is also explored.
Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning
Akiyama, Takayuki (Tokyo Institute of Technology) | Hachiya, Hirotaka (Tokyo Institute of Technology) | Sugiyama, Masashi (Tokyo Institute of Technology)
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. We demonstrate the usefulness of the proposed method, named active policy iteration (API), through simulations with a batting robot.
A New Bayesian Approach to Multiple Intermittent Fault Diagnosis
Abreu, Rui (Delft University of Technology) | Zoeteweij, Peter (Delft University of Technology) | Gemund, Arjan J.C. van (Delft University of Technology)
Logic reasoning approaches to fault diagnosis account for the fact that a component c j may fail intermittently by introducing a parameter g j that expresses the probability the component exhibits correct behavior. This component parameter g j , in conjunction with a priori fault probability, is usedin a Bayesian framework to compute the posterior fault candidate probabilities. Usually, information on g j is not known a priori. While proper estimation of g j can have a great impact on the diagnostic accuracy, at present, only approximations have been proposed. We present a novel framework, BARINEL, that computes exact estimations of g j as integral part of the posterior candidate probability computation. BARINELโs diagnostic performance is evaluated for both synthetic and real software systems. Our results show that our approach is superior to approaches based on classical persistent fault models as well as previously proposed intermittent fault models.
Preference Functions That Score Rankings and Maximum Likelihood Estimation
Conitzer, Vincent (Duke University) | Rognlie, Matthew (Duke University) | Xia, Lirong (Duke University)
In social choice, a preference function (PF) takes a set of votes (linear orders over a set of alternatives) as input, and produces one or more rankings (also linear orders over the alternatives) as output. Such functions have many applications, for example, aggregating the preferences of multiple agents, or merging rankings (of, say, webpages) into a single ranking. The key issue is choosing a PF to use. One natural and previously studied approach is to assume that there is an unobserved "correct" ranking, and the votes are noisy estimates of this. Then, we can use the PF that always chooses the maximum likelihood estimate (MLE) of the correct ranking. In this paper, we define simple ranking scoring functions (SRSFs) and show that the class of neutral SRSFs is exactly the class of neutral PFs that are MLEs for some noise model. We also define composite ranking scoring functions (CRSFs) and show a condition under which these coincide with SRSFs. We study key properties such as consistency and continuity, and consider some example PFs. In particular, we study Single Transferable Vote (STV), a commonly used PF, showing that it is a CRSF but not an SRSF, thereby clarifying the extent to which it is an MLE function. This also gives a new perspective on how ties should be broken under STV. We leave some open questions.
Activity Recognition: Linking Low-Level Sensors to High-Level Intelligence
Yang, Qiang (Hong Kong Hong Kong University of Science and Technology)
Sensors provide computer systems with a window to the outside world. Activity recognition "sees" what is in the window to predict the locations, trajectories, actions, goals and plans of humans and objects. Building an activity recognition system requires a full range of interaction from statistical inference on lower level sensor data to symbolic AI at higher levels, where prediction results and acquired knowledge are passed up each level to form a knowledge food chain. In this article, I will give an overview of some of the current activity recognition research works and explore a life-cycle of learning and inference that allows the lowest-level radio-frequency signals to be transformed into symbolic logical representations for AI planning, which in turn controls the robots or guides human users through a sensor network, thus completing a full life-cycle of knowledge.