Europe
Multiple Gaussian Process Models
Archambeau, Cedric, Bach, Francis
We consider a Gaussian process formulation of the multiple kernel learning problem. The goal is to select the convex combination of kernel matrices that best explains the data and by doing so improve the generalisation on unseen data. Sparsity in the kernel weights is obtained by adopting a hierarchical Bayesian approach: Gaussian process priors are imposed over the latent functions and generalised inverse Gaussians on their associated weights. This construction is equivalent to imposing a product of heavy-tailed process priors over function space. A variational inference algorithm is derived for regression and binary classification. 1 Introduction Kernel-based methods are well-established tools for supervised learning, allowing to perform various tasks, such as regression or binary classification, with linear and nonlinear predictors.
Multi-Instance Multi-Label Learning
Zhou, Zhi-Hua, Zhang, Min-Ling, Huang, Sheng-Jun, Li, Yu-Feng
Nanjing University, Nanjing 210046, China Abstract In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects and thus cannot capture more information from real objects by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms. InsDif works by transforming single-instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning the single-instances or single-label examples directly. Email: zhouzh@lamda.nju.edu.cn 1 Introduction In traditional supervised learning, an object is represented by an instance, i.e., a feature vector, and associated with a class label. Formally, let X denote the instance space (or feature space) andY the set of class labels. In particular, each object in this framework belongs to only one concept and therefore the corresponding instance is associated with a single class label. However, many real-world objects are complicated, which may belong to multiple concepts simultaneously. For example, an image can belong to several classes simultaneously, e.g., grasslands, lions, Africa, etc.; a text document can be classified to several categories if it is viewed from different aspects, e.g., scientific novel, Jules Verne's writing or even books on traveling;aweb page can be recognized as news page, sports page, soccer page, etc. In a specific real task, maybe only one of the multiple concepts is the right semantic meaning. For example, in image retrieval when a user is interested in an image with lions, s/he may be only interested in the concept lions instead of the other concepts grasslands and Africa associated with that image. The difficulty here is caused by those objects that involve multiple concepts. To choose the right semantic meaning for such objects for a specific scenario is the fundamental difficulty of many tasks.
Convergence rates of efficient global optimization algorithms
Efficient global optimization is the problem of minimizing an unknown function f, using as few evaluations f(x) as possible. It can be considered as a continuum-armed bandit problem, with noiseless data and simple regret. Expected improvement is perhaps the most popular method for solving this problem; the algorithm performs well in experiments, but little is known about its theoretical properties. Implementing expected improvement requires a choice of Gaussian process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in the RKHS. We begin by providing convergence rates for this procedure. The rates are optimal for functions of low smoothness, and we modify the algorithm to attain optimal rates for smoother functions. For practitioners, however, these results are somewhat misleading. Priors are typically not held fixed, but depend on parameters estimated from the data. For standard estimators, we show this procedure may never discover the minimum of f. We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior.
Kernel Topic Models
Hennig, Philipp, Stern, David, Herbrich, Ralf, Graepel, Thore
We study a variation of this concept, in which the documents' mixture weight beliefs are replaced with squashed Gaussian distributions. This allows documents to be associated with elements of a Hilbert space, admitting kernel topic models (KTM), modelling temporal, spatial, hierarchical, social and other structure between documents. The main challenge is efficient approximate inference on the latent Gaussian. We present an approximate algorithm cast around a Laplace approximation in a transformed basis. The KTM can also be interpreted as a type of Gaussian process latent variable model, or as a topic model conditional on document features, uncovering links between earlier work in these areas.
Markov Equivalences for Subclasses of Loopless Mixed Graphs
In this paper we discuss four problems regarding Markov equivalences for subclasses of loopless mixed graphs. We classify these four problems as finding conditions for internal Markov equivalence, which is Markov equivalence within a subclass, for external Markov equivalence, which is Markov equivalence between subclasses, for representational Markov equivalence, which is the possibility of a graph from a subclass being Markov equivalent to a graph from another subclass, and finding algorithms to generate a graph from a certain subclass that is Markov equivalent to a given graph. We particularly focus on the class of maximal ancestral graphs and its subclasses, namely regression graphs, bidirected graphs, undirected graphs, and directed acyclic graphs, and present novel results for representational Markov equivalence and algorithms.
Integrating Testing and Interactive Theorem Proving
Chamarthi, Harsh Raju, Dillinger, Peter C., Kaufmann, Matt, Manolios, Panagiotis
Using an interactive theorem prover to reason about programs involves a sequence of interactions where the user challenges the theorem prover with conjectures. Invariably, many of the conjectures posed are in fact false, and users often spend considerable effort examining the theorem prover's output before realizing this. We present a synergistic integration of testing with theorem proving, implemented in the ACL2 Sedan (ACL2s), for automatically generating concrete counterexamples. Our method uses the full power of the theorem prover and associated libraries to simplify conjectures; this simplification can transform conjectures for which finding counterexamples is hard into conjectures where finding counterexamples is trivial. In fact, our approach even leads to better theorem proving, e.g. if testing shows that a generalization step leads to a false conjecture, we force the theorem prover to backtrack, allowing it to pursue more fruitful options that may yield a proof. The focus of the paper is on the engineering of a synergistic integration of testing with interactive theorem proving; this includes extending ACL2 with new functionality that we expect to be of general interest. We also discuss our experience in using ACL2s to teach freshman students how to reason about their programs.
Gaussian Process Regression Networks
Wilson, Andrew Gordon, Knowles, David A., Ghahramani, Zoubin
We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian processes. This model accommodates input dependent signal and noise correlations between multiple response variables, input dependent length-scales and amplitudes, and heavy-tailed predictive distributions. We derive both efficient Markov chain Monte Carlo and variational Bayes inference procedures for this model. We apply GPRN as a multiple output regression and multivariate volatility model, demonstrating substantially improved performance over eight popular multiple output (multi-task) Gaussian process models and three multivariate volatility models on benchmark datasets, including a 1000 dimensional gene expression dataset.
First-Order Stable Model Semantics and First-Order Loop Formulas
Lin and Zhao's theorem on loop formulas states that in the propositional case the stable model semantics of a logic program can be completely characterized by propositional loop formulas, but this result does not fully carry over to the first-order case. We investigate the precise relationship between the first-order stable model semantics and first-order loop formulas, and study conditions under which the former can be represented by the latter. In order to facilitate the comparison, we extend the definition of a first-order loop formula which was limited to a nondisjunctive program, to a disjunctive program and to an arbitrary first-order theory. Based on the studied relationship we extend the syntax of a logic program with explicit quantifiers, which allows us to do reasoning involving non-Herbrand stable models using first-order reasoners. Such programs can be viewed as a special class of first-order theories under the stable model semantics, which yields more succinct loop formulas than the general language due to their restricted syntax.
Scheduling Bipartite Tournaments to Minimize Total Travel Distance
Hoshino, R., Kawarabayashi, K.
In many professional sports leagues, teams from opposing leagues/conferences compete against one another, playing inter-league games. This is an example of a bipartite tournament. In this paper, we consider the problem of reducing the total travel distance of bipartite tournaments, by analyzing inter-league scheduling from the perspective of discrete optimization. This research has natural applications to sports scheduling, especially for leagues such as the National Basketball Association (NBA) where teams must travel long distances across North America to play all their games, thus consuming much time, money, and greenhouse gas emissions. We introduce the Bipartite Traveling Tournament Problem (BTTP), the inter-league variant of the well-studied Traveling Tournament Problem. We prove that the 2n-team BTTP is NP-complete, but for small values of n, a distance-optimal inter-league schedule can be generated from an algorithm based on minimum-weight 4-cycle-covers. We apply our theoretical results to the 12-team Nippon Professional Baseball (NPB) league in Japan, producing a provably-optimal schedule requiring 42950 kilometres of total team travel, a 16% reduction compared to the actual distance traveled by these teams during the 2010 NPB season. We also develop a nearly-optimal inter-league tournament for the 30-team NBA league, just 3.8% higher than the trivial theoretical lower bound.
Projective Limit Random Probabilities on Polish Spaces
A pivotal problem in Bayesian nonparametrics is the construction of prior distributions on the space M(V) of probability measures on a given domain V. In principle, such distributions on the infinite-dimensional space M(V) can be constructed from their finite-dimensional marginals---the most prominent example being the construction of the Dirichlet process from finite-dimensional Dirichlet distributions. This approach is both intuitive and applicable to the construction of arbitrary distributions on M(V), but also hamstrung by a number of technical difficulties. We show how these difficulties can be resolved if the domain V is a Polish topological space, and give a representation theorem directly applicable to the construction of any probability distribution on M(V) whose first moment measure is well-defined. The proof draws on a projective limit theorem of Bochner, and on properties of set functions on Polish spaces to establish countable additivity of the resulting random probabilities.