Technology
On the Cover-Hart Inequality: What's a Sample of Size One Worth?
Bob predicts a future observation based on a sample of size one. Alice can draw a sample of any size before issuing her prediction. How much better can she do than Bob? Perhaps surprisingly, under a large class of loss functions, which we refer to as the Cover-Hart family, the best Alice can do is to halve Bob's risk. In this sense, half the information in an infinite sample is contained in a sample of size one. The Cover-Hart family is a convex cone that includes metrics and negative definite functions, subject to slight regularity conditions. These results may help explain the small relative differences in empirical performance measures in applied classification and forecasting problems, as well as the success of reasoning and learning by analogy in general, and nearest neighbor techniques in particular.
The third open Answer Set Programming competition
Calimeri, Francesco, Ianni, Giovambattista, Ricca, Francesco
Answer Set Programming (ASP) is a well-established paradigm of declarative programming in close relationship with other declarative formalisms such as SAT Modulo Theories, Constraint Handling Rules, FO(.), PDDL and many others. Since its first informal editions, ASP systems have been compared in the now well-established ASP Competition. The Third (Open) ASP Competition, as the sequel to the ASP Competitions Series held at the University of Potsdam in Germany (2006-2007) and at the University of Leuven in Belgium in 2009, took place at the University of Calabria (Italy) in the first half of 2011. Participants competed on a pre-selected collection of benchmark problems, taken from a variety of domains as well as real world applications. The Competition ran on two tracks: the Model and Solve (M&S) Track, based on an open problem encoding, and open language, and open to any kind of system based on a declarative specification paradigm; and the System Track, run on the basis of fixed, public problem encodings, written in a standard ASP language. This paper discusses the format of the Competition and the rationale behind it, then reports the results for both tracks. Comparison with the second ASP competition and state-of-the-art solutions for some of the benchmark domains is eventually discussed. To appear in Theory and Practice of Logic Programming (TPLP).
- Europe > Italy > Calabria (0.24)
- Europe > Germany > Brandenburg > Potsdam (0.24)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.24)
- (3 more...)
Statistical Consistency of Finite-dimensional Unregularized Linear Classification
Binary linear classification operates as follows: obtain a new instance, determine a set of real-valued features, form their weighted combination, and output a label which is positive iff this combination is nonnegative. The interpretability, empirical performance, and theoretical depth of this scheme have all contributed to its continued popularity (Freund and Schapire, 1997, Friedman et al., 2000, Caruana and Niculescu-Mizil, 2006). In order to obtain the coefficients in the above weighting, convex optimization is typically employed. Specifically, rather than just trying to pick the weighting which makes the fewest mistakes over a finite sample -- which is computationally intractable -- consider instead paying attention to the amount by which these combinations clear the zero threshold, a quantity called the margin. Applying a convex penalty to these margins yields a convex optimization procedure, specifically one which can be specialized into both logistic regression and AdaBoost.
- Asia > Middle East > Lebanon (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
Multiple Operator-valued Kernel Learning
Kadri, Hachem, Rakotomamonjy, Alain, Bach, Francis, Preux, Philippe
Positive definite operator-valued kernels generalize the well-known notion of reproducing kernels, and are naturally adapted to multi-output learning situations. This paper addresses the problem of learning a finite linear combination of infinite-dimensional operator-valued kernels which are suitable for extending functional data analysis methods to nonlinear contexts. We study this problem in the case of kernel ridge regression for functional responses with an lr-norm constraint on the combination coefficients. The resulting optimization problem is more involved than those of multiple scalar-valued kernel learning since operator-valued kernels pose more technical and theoretical issues. We propose a multiple operator-valued kernel learning algorithm based on solving a system of linear operator equations by using a block coordinatedescent procedure. We experimentally validate our approach on a functional regression task in the context of finger movement prediction in brain-computer interfaces.
- Europe > France > Normandy > Seine-Maritime > Rouen (0.04)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)
Identifiability and Unmixing of Latent Parse Trees
Hsu, Daniel, Kakade, Sham M., Liang, Percy
This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models.
Learning Inclusion-Optimal Chordal Graphs
Auvray, Vincent, Wehenkel, Louis
Chordal graphs can be used to encode dependency models that are representable by both directed acyclic and undirected graphs. This paper discusses a very simple and efficient algorithm to learn the chordal structure of a probabilistic model from data. The algorithm is a greedy hill-climbing search algorithm that uses the inclusion boundary neighborhood over chordal graphs. In the limit of a large sample size and under appropriate hypotheses on the scoring criterion, we prove that the algorithm will find a structure that is inclusion-optimal when the dependency model of the data-generating distribution can be represented exactly by an undirected graph. The algorithm is evaluated on simulated datasets.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Belgium (0.04)
Estimation and Clustering with Infinite Rankings
This paper presents a natural extension of stagewise ranking to the the case of infinitely many items. We introduce the infinite generalized Mallows model (IGM), describe its properties and give procedures to estimate it from data. For estimation of multimodal distributions we introduce the Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of the new model and demonstrate that infinite models can be simple, elegant and practical.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Reading (0.04)
- (4 more...)
AND/OR Importance Sampling
The paper introduces AND/OR importance sampling for probabilistic graphical models. In contrast to importance sampling, AND/OR importance sampling caches samples in the AND/OR space and then extracts a new sample mean from the stored samples. We prove that AND/OR importance sampling may have lower variance than importance sampling; thereby providing a theoretical justification for preferring it over importance sampling. Our empirical evaluation demonstrates that AND/OR importance sampling is far more accurate than importance sampling in many cases.
- North America > United States > California > Orange County > Irvine (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstractions
Isaza, Alejandro, Szepesvari, Csaba, Bulitko, Vadim, Greiner, Russell
In this paper, we consider planning in stochastic shortest path (SSP) problems, a subclass of Markov Decision Problems (MDP). We focus on medium-size problems whose state space can be fully enumerated. This problem has numerous important applications, such as navigation and planning under uncertainty. We propose a new approach for constructing a multi-level hierarchy of progressively simpler abstractions of the original problem. Once computed, the hierarchy can be used to speed up planning by first finding a policy for the most abstract level and then recursively refining it into a solution to the original problem. This approach is fully automated and delivers a speed-up of two orders of magnitude over a state-of-the-art MDP solver on sample problems while returning near-optimal solutions. We also prove theoretical bounds on the loss of solution optimality resulting from the use of abstractions.
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Identifying Optimal Sequential Decisions
Dawid, Philip, Didelez, Vanessa
We consider conditions that allow us to find an optimal strategy for sequential decisions from a given data situation. For the case where all interventions are unconditional (atomic), identifiability has been discussed by Pearl & Robins (1995). We argue here that an optimal strategy must be conditional, i.e. take the information available at each decision point into account. We show that the identification of an optimal sequential decision strategy is more restrictive, in the sense that conditional interventions might not always be identified when atomic interventions are. We further demonstrate that a simple graphical criterion for the identifiability of an optimal strategy can be given.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > England > Bristol (0.04)