Computational Learning Theory
Learning Theory and Algorithms for Forecasting Non-stationary Time Series
Kuznetsov, Vitaly, Mohri, Mehryar
We present data-dependent learning bounds for the general scenario of non-stationary non-mixing stochastic processes. Our learning guarantees are expressed in terms of a data-dependent measure of sequential complexity and a discrepancy measure that can be estimated from data under some mild assumptions. We use our learning bounds to devise new algorithms for non-stationary time series forecasting for which we report some preliminary experimental results.
Computational Intractability of Dictionary Learning for Sparse Representation
Razaviyayn, Meisam, Tseng, Hung-Wei, Luo, Zhi-Quan
In this paper we consider the dictionary learning problem for sparse representation. We first show that this problem is NP-hard via a polynomial-time reduction from the densest cut problem. Then, using successive convex approximation strategies, we propose efficient dictionary learning schemes that solve several practical formulations of this problem to stationary points. Unlike many existing algorithms in the literature, such as K-SVD, our proposed dictionary learning scheme is theoretically guaranteed to converge to the set of stationary points under certain mild assumptions. In the image denoising application, the performance and efficiency of the proposed dictionary learning scheme are comparable to those of the K-SVD algorithm in simulation.
Higher-order asymptotics for the parametric complexity
The parametric complexity is the key quantity in the minimum description length (MDL) approach to statistical model selection. Rissanen and others have shown that the parametric complexity of a statistical model approaches a simple function of the Fisher information volume of the model as the sample size $n$ goes to infinity. This paper derives higher-order asymptotic expansions for the parametric complexity, in the case of exponential families and independent and identically distributed data. These higher-order approximations are calculated for some examples and are shown to have better finite-sample behaviour than Rissanen's approximation. The higher-order terms are given as expressions involving cumulants (or, more naturally, the Amari-Chentsov tensors), and these terms are likely to be interesting in themselves since they arise naturally from the general information-theoretic principles underpinning MDL. The derivation given here specializes to an alternative and arguably simpler proof of Rissanen's result (for the case considered here), proving for the first time that his approximation is $O(n^{-1})$.
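For reference, the classical approximation that this paper refines can be written out as follows (a standard form; the notation here is generic rather than the paper's):

```latex
\log C_n(\mathcal{M}) \;=\; \frac{k}{2}\log\frac{n}{2\pi}
  \;+\; \log \int_{\Theta} \sqrt{\det I(\theta)}\, \mathrm{d}\theta \;+\; O(n^{-1}),
```

where $C_n(\mathcal{M})$ is the parametric complexity of a $k$-parameter model $\mathcal{M}$, $I(\theta)$ is the Fisher information matrix, and the $O(n^{-1})$ error rate is precisely what the paper proves for the case it considers.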
On the Robustness of Regularized Pairwise Learning Methods Based on Kernels
Christmann, Andreas, Zhou, Ding-Xuan
Regularized empirical risk minimization, including support vector machines, plays an important role in machine learning theory. In this paper, regularized pairwise learning (RPL) methods based on kernels are investigated. One example is regularized minimization of the error entropy loss, which has recently attracted considerable interest from the viewpoint of consistency and learning rates. This paper shows that such RPL methods additionally have good statistical robustness properties if the loss function and the kernel are chosen appropriately. We treat two cases of particular interest: (i) a bounded and non-convex loss function and (ii) an unbounded convex loss function satisfying a certain Lipschitz-type condition.
Learning Regular Languages over Large Ordered Alphabets
Mens, Irini-Eleftheria, Maler, Oded
This work is concerned with regular languages defined over large alphabets, either infinite or simply too large to be expressed enumeratively. We define a generic model in which transitions are labeled by elements of a finite partition of the alphabet. We then extend Angluin's L* algorithm for learning regular languages from examples to such automata. We have implemented this algorithm, and we demonstrate its behavior when the alphabet is a subset of the natural or real numbers. We also sketch the extension of the algorithm to a class of languages over partially ordered alphabets.
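The idea of transitions labeled by blocks of a partition of an ordered alphabet can be sketched with interval boundaries and binary search (a toy data structure for illustration, not the authors' implementation):

```python
import bisect

class IntervalDFA:
    """DFA over an ordered alphabet (here, real numbers) where each state's
    outgoing transitions are labeled by intervals: sorted boundaries split
    the line into pieces, each mapped to a successor state."""
    def __init__(self, transitions, accepting, start=0):
        self.transitions = transitions  # state -> (boundaries, successors)
        self.accepting = accepting
        self.start = start

    def step(self, state, x):
        boundaries, succ = self.transitions[state]
        # index of the interval containing x among len(boundaries)+1 pieces
        return succ[bisect.bisect_right(boundaries, x)]

    def accepts(self, word):
        s = self.start
        for x in word:
            s = self.step(s, x)
        return s in self.accepting

# Toy language: all letters are >= 0 (state 0 accepting, state 1 a sink)
dfa = IntervalDFA(
    transitions={0: ([0.0], [1, 0]), 1: ([], [1])},
    accepting={0},
)
print(dfa.accepts([1.5, 3.0]), dfa.accepts([1.5, -2.0]))  # True False
```

The point of the partition-based labeling is that membership queries over an infinite alphabet reduce to finitely many interval lookups per state.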
SAT-based Analysis of Large Real-world Feature Models is Easy
Liang, Jia Hui, Ganesh, Vijay, Raman, Venkatesh, Czarnecki, Krzysztof
Modern conflict-driven clause-learning (CDCL) Boolean SAT solvers provide efficient automatic analysis of real-world feature models (FMs) of systems ranging from cars to operating systems. It is well known that solver-based analysis of real-world FMs scales very well, even though the SAT instances obtained from such FMs are large and the corresponding analysis problems are NP-complete. To better understand why SAT solvers are so effective, we systematically studied many syntactic and semantic characteristics of a representative set of large real-world FMs. We discovered that a key reason why large real-world FMs are easy to analyze is that the vast majority of the variables in these models are unrestricted, i.e., the models are satisfiable for both true and false assignments to such variables under the current partial assignment. Given this discovery and our understanding of CDCL SAT solvers, we show that solvers can easily find satisfying assignments for such models without too many backtracks relative to the model size, explaining why they scale so well. Further analysis showed that the presence of unrestricted variables in these real-world models can be attributed to their high degree of variability. Additionally, we experimented with a series of well-known non-backtracking simplifications that are particularly effective in solving FMs. The variables and clauses remaining after simplification, called the core, are so few that they are easily solved even with backtracking, further strengthening our conclusions.
A Modularity-Based Random SAT Instances Generator
Giráldez-Cru, Jesús (IIIA-CSIC) | Levy, Jordi (IIIA-CSIC)
Nowadays, many industrial SAT instances can be solved efficiently by modern SAT solvers. However, the number of real-world instances is finite. Therefore, the development and testing of SAT solving techniques can benefit from new models of random formulas that capture the features of real-world problems more realistically. In many works, the structure of industrial instances has been analyzed by representing them as graphs and studying some of their properties, such as modularity. In this paper, we use modularity, or community structure, to define a new model of pseudo-industrial random SAT instances, called Community Attachment. We prove that the phase transition point, if it exists, is independent of the modularity. We evaluate the adequacy of this model to real industrial problems in terms of SAT solver performance, and show that modern solvers do indeed exploit this community structure.
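The community-based generation idea can be sketched schematically (parameter names and details here are illustrative, not the paper's exact Community Attachment model):

```python
import random

def community_cnf(n_vars, n_clauses, k, n_communities, q, seed=0):
    """Schematic community-structured random k-CNF: with probability q a
    clause draws all its variables from a single community, otherwise
    from the whole variable set. Higher q means higher modularity."""
    rng = random.Random(seed)
    size = n_vars // n_communities
    communities = [list(range(i * size + 1, (i + 1) * size + 1))
                   for i in range(n_communities)]
    cnf = []
    for _ in range(n_clauses):
        pool = (rng.choice(communities) if rng.random() < q
                else list(range(1, n_vars + 1)))
        chosen = rng.sample(pool, k)
        # negate each literal with probability 1/2
        cnf.append([v if rng.random() < 0.5 else -v for v in chosen])
    return cnf

f = community_cnf(30, 60, 3, 5, 0.9)
print(len(f), all(len(c) == 3 for c in f))  # 60 True
```

The single knob `q` plays the role of a modularity control: at `q = 0` this degenerates to the uniform random k-SAT model, while `q` near 1 concentrates clauses inside communities.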
On Conceptual Labeling of a Bag of Words
Sun, Xiangyan (Fudan University) | Xiao, Yanghua (Fudan University) | Wang, Haixun (Google Research) | Wang, Wei (Fudan University)
In natural language processing and information retrieval, the bag-of-words representation is used to implicitly represent the meaning of the text. Implicit semantics, however, are insufficient for supporting text or natural language based interfaces, which are adopted by an increasing number of applications. Indeed, in applications ranging from automatic ontology construction to question answering, explicit representation of semantics is starting to play a more prominent role. In this paper, we introduce the task of conceptual labeling (CL), which aims at generating a minimum set of conceptual labels that best summarize a bag of words. We draw the labels from a data-driven semantic network that contains millions of highly connected concepts. The semantic network provides meaning to the concepts, and in turn, it provides meaning to the bag of words through the conceptual labels we generate. To achieve our goal, we use an information-theoretic approach to trade off the semantic coverage of a bag of words against the minimality of the output labels. Specifically, we use the Minimum Description Length (MDL) principle as the criterion for selecting the best concepts. Our extensive experimental results demonstrate the effectiveness of our approach in representing the explicit semantics of a bag of words.
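The coverage-versus-minimality trade-off can be illustrated with a toy greedy selector (this only conveys the MDL intuition; the paper's coding scheme and semantic network are far richer):

```python
def mdl_labels(words, concept_words, label_cost=1.0):
    """Greedy MDL-style selection: each chosen concept pays a fixed
    `label_cost`; each word left uncovered would pay a unit 'exception'
    cost, so a concept is worth adding while it saves more than it costs."""
    words = set(words)
    chosen, covered = [], set()
    while True:
        best, best_gain = None, 0.0
        for concept, ws in concept_words.items():
            gain = len((ws & words) - covered) - label_cost
            if gain > best_gain:
                best, best_gain = concept, gain
        if best is None:
            break
        chosen.append(best)
        covered |= concept_words[best] & words
    return chosen

concepts = {"fruit": {"apple", "pear", "grape"},
            "company": {"apple", "google"}}
print(mdl_labels(["apple", "pear", "grape"], concepts))  # ['fruit']
```

Here "fruit" covers all three words for the cost of one label, while adding "company" would cost a label without covering anything new, so it is rejected.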
Efficient Algorithms with Performance Guarantees for the Stochastic Multiple-Choice Knapsack Problem
Tran-Thanh, Long (University of Southampton) | Xia, Yingce (University of Science and Technology of China) | Qin, Tao (Microsoft Research) | Jennings, Nicholas R (University of Southampton)
We study the stochastic multiple-choice knapsack problem, where a set of K items, whose values and weights are random variables, arrives at each time step, and a decision maker has to choose at most one item to put into the knapsack without exceeding its capacity. The goal of the decision maker is to maximise the total expected value of the chosen items with respect to the knapsack capacity and a finite time horizon. We provide the first comprehensive theoretical analysis of the problem. In particular, we propose OPT-S-MCKP, the first algorithm that achieves optimality when the value-weight distributions are known. This algorithm also enjoys $O(\sqrt{T})$ performance loss, where $T$ is the finite time horizon, in the unknown value-weight distributions scenario. We further develop two novel approximation methods, FR-S-MCKP and G-S-MCKP, and we prove that FR-S-MCKP achieves $O(\sqrt{T})$ performance loss in both the known and unknown value-weight distributions cases, while enjoying polynomial computational complexity per time step. G-S-MCKP, on the other hand, does not have theoretical guarantees, but it still provides good performance in practice with linear running time.
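As a rough illustration of the setting, a greedy value-per-weight policy in the spirit of G-S-MCKP might look as follows (the distributions, parameters, and policy details here are assumptions for illustration, not the paper's algorithm):

```python
import random

def greedy_choice(items, remaining_capacity):
    """Pick the feasible item with the best value/weight ratio, or None.
    A plausible greedy baseline; the paper's G-S-MCKP may differ."""
    feasible = [(v, w) for v, w in items if 0 < w <= remaining_capacity]
    return max(feasible, key=lambda it: it[0] / it[1], default=None)

def run(T=100, capacity=50.0, K=5, seed=1):
    rng = random.Random(seed)
    total, cap = 0.0, capacity
    for _ in range(T):
        # K items arrive with random (value, weight); choose at most one
        items = [(rng.random(), rng.random()) for _ in range(K)]
        pick = greedy_choice(items, cap)
        if pick is not None:
            total += pick[0]
            cap -= pick[1]
    return total, cap

value, cap_left = run()
print(round(value, 3), cap_left >= 0)
```

Such a policy is cheap (linear in K per step) but, as the abstract notes for G-S-MCKP, it comes with no regret guarantee against the optimal policy.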
A Neurodynamical System for finding a Minimal VC Dimension Classifier
Jayadeva, Soman, Sumit, Bhaya, Amit
The recently proposed Minimal Complexity Machine (MCM) finds a hyperplane classifier by minimizing an exact bound on the Vapnik-Chervonenkis (VC) dimension. The VC dimension measures the capacity of a learning machine, and a smaller VC dimension leads to improved generalization. On many benchmark datasets, the MCM generalizes better than SVMs and uses far fewer support vectors than SVMs do. In this paper, we describe a neural network, based on a linear dynamical system, that converges to the MCM solution. The proposed MCM dynamical system is conducive to analogue circuit implementation on a chip or to simulation using Ordinary Differential Equation (ODE) solvers. Numerical experiments on benchmark datasets from the UCI repository show that the proposed approach is scalable and accurate, as we obtain improved accuracies and fewer support vectors (up to a 74.3% reduction) with the MCM dynamical system.
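The general idea of training a classifier by integrating a dynamical system can be sketched with a generic gradient flow solved by Euler steps (this toy minimizes a squared-hinge loss, not the MCM's VC-dimension bound):

```python
def gradient_flow(X, y, steps=2000, dt=0.01):
    """Euler integration of dw/dt = -grad L(w) for a squared-hinge linear
    classifier with bias on 2-D data. A generic neurodynamical-style
    sketch; the MCM system integrates a different objective."""
    w = [0.0, 0.0, 0.0]  # [w1, w2, b]
    for _ in range(steps):
        g = [0.0, 0.0, 0.0]
        for (x1, x2), yi in zip(X, y):
            m = yi * (w[0] * x1 + w[1] * x2 + w[2])
            if m < 1:  # only margin violations contribute to the gradient
                g[0] -= 2 * (1 - m) * yi * x1
                g[1] -= 2 * (1 - m) * yi * x2
                g[2] -= 2 * (1 - m) * yi
        w = [wi - dt * gi for wi, gi in zip(w, g)]  # one Euler step
    return w

X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]
w = gradient_flow(X, y)
print(all((w[0] * x1 + w[1] * x2 + w[2]) * yi > 0
          for (x1, x2), yi in zip(X, y)))  # True: all points classified
```

An ODE solver (or, as the paper suggests, an analogue circuit) replaces the explicit Euler loop; the classifier is read off as the system's equilibrium point.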