Computational Learning Theory
Higher-order asymptotics for the parametric complexity
The minimum description length (MDL) principle provides a general information-theoretic approach to model selection and other forms of statistical inference [5, 17]. The MDL criterion for model selection is consistent, meaning that it will select the data-generating model from a countable set of competing parametric models with probability approaching 1 as the sample size n goes to infinity [4]. For example, if each of the parametric models is a logistic regression model with predictor variables taken from a fixed set of potential predictors, then the MDL model-selection criterion will choose the correct combination of predictors with probability approaching 1 as n . The MDL model-selection criterion also has a number of strong optimality properties, which greatly extend Shannon's noiseless coding theorem [5, ยงIII.E]. In its simplest form, the MDL principle advocates choosing the model for which the observed data has the shortest message length under a particular prefix code defined by a minimax condition [11, ยง2.4.3]. Shtarkov [19] showed that this is equivalent to choosing the model with the largest normalized maximum likelihood (NML) for the observed data.
On the Robustness of Regularized Pairwise Learning Methods Based on Kernels
Christmann, Andreas, Zhou, Ding-Xuan
Regularized empirical risk minimization including support vector machines plays an important role in machine learning theory. In this paper regularized pairwise learning (RPL) methods based on kernels will be investigated. One example is regularized minimization of the error entropy loss which has recently attracted quite some interest from the viewpoint of consistency and learning rates. This paper shows that such RPL methods have additionally good statistical robustness properties, if the loss function and the kernel are chosen appropriately. We treat two cases of particular interest: (i) a bounded and non-convex loss function and (ii) an unbounded convex loss function satisfying a certain Lipschitz type condition.
Learning Regular Languages over Large Ordered Alphabets
Mens, Irini-Eleftheria, Maler, Oded
This work is concerned with regular languages defined over large alphabets, either infinite or just too large to be expressed enumeratively. We define a generic model where transitions are labeled by elements of a finite partition of the alphabet. We then extend Angluin's L* algorithm for learning regular languages from examples for such automata. We have implemented this algorithm and we demonstrate its behavior where the alphabet is a subset of the natural or real numbers. We sketch the extension of the algorithm to a class of languages over partially ordered alphabets.
SAT-based Analysis of Large Real-world Feature Models is Easy
Liang, Jia Hui, Ganesh, Vijay, Raman, Venkatesh, Czarnecki, Krzysztof
Modern conflict-driven clause-learning (CDCL) Boolean SAT solvers provide efficient automatic analysis of real-world feature models (FM) of systems ranging from cars to operating systems. It is well-known that solver-based analysis of real-world FMs scale very well even though SAT instances obtained from such FMs are large, and the corresponding analysis problems are known to be NP-complete. To better understand why SAT solvers are so effective, we systematically studied many syntactic and semantic characteristics of a representative set of large real-world FMs. We discovered that a key reason why large real-world FMs are easy-to-analyze is that the vast majority of the variables in these models are unrestricted, i.e., the models are satisfiable for both true and false assignments to such variables under the current partial assignment. Given this discovery and our understanding of CDCL SAT solvers, we show that solvers can easily find satisfying assignments for such models without too many backtracks relative to the model size, explaining why solvers scale so well. Further analysis showed that the presence of unrestricted variables in these real-world models can be attributed to their high-degree of variability. Additionally, we experimented with a series of well-known non-backtracking simplifications that are particularly effective in solving FMs. The remaining variables/clauses after simplifications, called the core, are so few that they are easily solved even with backtracking, further strengthening our conclusions.
A Modularity-Based Random SAT Instances Generator
Girรกldez-Cru, Jesรบs (IIIA-CSIC) | Levy, Jordi (IIIA-CSIC)
Nowadays, many industrial SAT instances can be solved efficiently by modern SAT solvers. However, the number of real-world instances is finite. Therefore, the process of development and test of SAT solving techniques can benefit of new models of random formulas that capture more realistically the features of real-world problems. In many works, the structure of industrial instances has been analyzed representing them as graphs and studying some of their properties, like modularity. In this paper, we use modularity, or community structure, to define a new model of pseudo-industrial random SAT instances, called Community Attachment. We prove that the phase transition point, if exists, is independent on the modularity. We evaluate the adequacy of this model to real industrial problems in terms of SAT solvers performance, and show that modern solvers do actually exploit this community structure.
On Conceptual Labeling of a Bag of Words
Sun, Xiangyan (Fudan University) | Xiao, Yanghua (Fudan University) | Wang, Haixun (Google Research) | Wang, Wei (Fudan University)
In natural language processing and information retrieval, the bag of words representation is used to implicitly represent the meaning of the text. Implicit semantics, however, are insufficient in supporting text or natural language based interfaces, which are adopted by an increasing number of applications. Indeed, in applications ranging from automatic ontology construction to question answering, explicit representation of semantics is starting to play a more prominent role. In this paper, we introduce the task of conceptual labeling (CL), which aims at generating a minimum set of conceptual labels that best summarize a bag of words. We draw the labels from a data driven semantic network that contains millions of highly connected concepts. The semantic network provides meaning to the concepts, and in turn, it provides meaning to the bag of words through the conceptual labels we generate. To achieve our goal, we use an information theoretic approach to trade-off the semantic coverage of a bag of words against the minimality of the output labels. Specifically, we use Minimum Description Length (MDL) as the criteria in selecting the best concepts. Our extensive experimental results demonstrate the effectiveness of our approach in representing the explicit semantics of a bag of words.
Learning Cooperative Games
Balcan, Maria Florina (Carnegie-Mellon University) | Procaccia, Ariel D. (Carnegie-Mellon University) | Zick, Yair (Carnegie-Mellon University)
This paper explores a PAC (probably approximately correct) learning model in cooperative games. Specifically, we are given m random samples of coalitions and their values, taken from some unknown cooperative game; can we predict the values of unseen coalitions? We study the PAC learnability of several well-known classes of cooperative games, such as network flow games, threshold task games, and induced subgraph games. We also establish a novel connection between PAC learnability and core stability: for games that are efficiently learnable, it is possible to find payoff divisions that are likely to be stable using a polynomial number of samples.
Efficient Algorithms with Performance Guarantees for the Stochastic Multiple-Choice Knapsack Problem
Tran-Thanh, Long (University of Southampton) | Xia, Yingce (University of Science and Technology of China) | Qin, Tao (Microsoft Research) | Jennings, Nicholas R (University of Southampton)
We study the stochastic multiple-choice knapsack problem, where a set of Kitems, whose value and weight are random variables, arrive to the system at each time step, and a decision maker has to choose at most one item to put into the knapsack without exceeding its capacity. The goal is the decision-maker is to maximise the total expected value of chosen items with respect to the knapsack capacity and a finite time horizon.We provide the first comprehensive theoretical analysis of the problem. In particular, we propose OPT-S-MCKP, the first algorithm that achieves optimality when the value-weight distributions are known. This algorithm also enjoys O(sqrt{T}) performance loss, where T is the finite time horizon, in the unknown value-weight distributions scenario.We also further develop two novel approximation methods, FR-S-MCKP and G-S-MCKP, and we prove that FR-S-MCKP achieves O(sqrt{T}) performance loss in both known and unknown value-weight distributions cases, while enjoying polynomial computational complexity per time step.On the other hand, G-S-MCKP does not have theoretical guarantees, but it still provides good performance in practice with linear running time.
Learning with Square Loss: Localization through Offset Rademacher Complexity
Liang, Tengyuan, Rakhlin, Alexander, Sridharan, Karthik
Determining the finite-sample behavior of risk in the problem of regression is arguably one of the most basic problems of Learning Theory and Statistics. This behavior can be studied in substantial generality with the tools of empirical process theory. When functions in a given convex class are uniformly bounded, one may verify the socalled "Bernstein condition." The condition--which relates the variance of the increments of the empirical process to their expectation--implies a certain localization phenomenon around the optimum and forms the basis of the analysis via local Rademacher complexities. The technique has been developed in [9, 8, 5, 2, 4], among others, based on Talagrand's celebrated concentration inequality for the supremum of an empirical process. In a recent pathbreaking paper, [14] showed that a large part of this heavy machinery is not necessary for obtaining tight upper bounds on excess loss, even--and especially--if functions are unbounded. Mendelson observed that only one-sided control of the tail is required in the deviation inequality, and, thankfully, it is the tail that can be controlled under very mild assumptions. In a parallel line of work, the search within the online learning setting for an analogue of "localization" has led to a notion of an "offset" Rademacher process [17], yielding--in a rather clean manner--optimal rates for minimax regret in online supervised learning. It was also shown that the supremum of the offset process is a lower bound on the minimax value, thus establishing its intrinsic nature.
A Neurodynamical System for finding a Minimal VC Dimension Classifier
Jayadeva, null, Soman, Sumit, Bhaya, Amit
The recently proposed Minimal Complexity Machine (MCM) finds a hyperplane classifier by minimizing an exact bound on the Vapnik-Chervonenkis (VC) dimension. The VC dimension measures the capacity of a learning machine, and a smaller VC dimension leads to improved generalization. On many benchmark datasets, the MCM generalizes better than SVMs and uses far fewer support vectors than the number used by SVMs. In this paper, we describe a neural network based on a linear dynamical system, that converges to the MCM solution. The proposed MCM dynamical system is conducive to an analogue circuit implementation on a chip or simulation using Ordinary Differential Equation (ODE) solvers. Numerical experiments on benchmark datasets from the UCI repository show that the proposed approach is scalable and accurate, as we obtain improved accuracies and fewer number of support vectors (upto 74.3% reduction) with the MCM dynamical system.Keywords.