Goto

Collaborating Authors

 Computational Learning Theory


Symphony of high-dimensional brain

arXiv.org Artificial Intelligence

This paper is the final part of the scientific discussion organised by the Journal "Physics of Life Rviews" about the simplicity revolution in neuroscience and AI. This discussion was initiated by the review paper "The unreasonable effectiveness of small neural ensembles in high-dimensional brain". Phys Life Rev 2019, doi 10.1016/j.plrev.2018.09.005, arXiv:1809.07656. The topics of the discussion varied from the necessity to take into account the difference between the theoretical random distributions and "extremely non-random" real distributions and revise the common machine learning theory, to different forms of the curse of dimensionality and high-dimensional pitfalls in neuroscience. V. K{\r{u}}rkov{\'a}, A. Tozzi and J.F. Peters, R. Quian Quiroga, P. Varona, R. Barrio, G. Kreiman, L. Fortuna, C. van Leeuwen, R. Quian Quiroga, and V. Kreinovich, A.N. Gorban, V.A. Makarov, and I.Y. Tyukin participated in the discussion. In this paper we analyse the symphony of opinions and the possible outcomes of the simplicity revolution for machine learning and neuroscience.


Distribution-Independent PAC Learning of Halfspaces with Massart Noise

arXiv.org Machine Learning

We study the problem of {\em distribution-independent} PAC learning of halfspaces in the presence of Massart noise. Specifically, we are given a set of labeled examples $(\mathbf{x}, y)$ drawn from a distribution $\mathcal{D}$ on $\mathbb{R}^{d+1}$ such that the marginal distribution on the unlabeled points $\mathbf{x}$ is arbitrary and the labels $y$ are generated by an unknown halfspace corrupted with Massart noise at noise rate $\eta<1/2$. The goal is to find a hypothesis $h$ that minimizes the misclassification error $\mathbf{Pr}_{(\mathbf{x}, y) \sim \mathcal{D}} \left[ h(\mathbf{x}) \neq y \right]$. We give a $\mathrm{poly}\left(d, 1/\epsilon \right)$ time algorithm for this problem with misclassification error $\eta+\epsilon$. We also provide evidence that improving on the error guarantee of our algorithm might be computationally hard. Prior to our work, no efficient weak (distribution-independent) learner was known in this model, even for the class of disjunctions. The existence of such an algorithm for halfspaces (or even disjunctions) has been posed as an open question in various works, starting with Sloan (1988), Cohen (1997), and was most recently highlighted in Avrim Blum's FOCS 2003 tutorial.


Inferring Latent dimension of Linear Dynamical System with Minimum Description Length

arXiv.org Machine Learning

Time-invariant linear dynamical system arises in many real-world applications,and its usefulness is widely acknowledged. A practical limitation with this model is that its latent dimension that has a large impact on the model capability needs to be manually specified. It can be demonstrated that a lower-order model class could be totally nested into a higher-order class, and the corresponding likelihood is nondecreasing. Hence, criterion built on the likelihood is not appropriate for model selection. This paper addresses the issue and proposes a criterion for linear dynamical system based on the principle of minimum description length. The latent structure, which is omitted in previous work, is explicitly considered in this newly proposed criterion. Our work extends the principle of minimum description length and demonstrates its effectiveness in the tasks of model training. The experiments on both univariate and multivariate sequences confirm the good performance of our newly proposed method.


Learning from weakly dependent data under Dobrushin's condition

arXiv.org Machine Learning

Statistical learning theory has largely focused on learning and generalization given independent and identically distributed (i.i.d.) samples. Motivated by applications involving time-series data, there has been a growing literature on learning and generalization in settings where data is sampled from an ergodic process. This work has also developed complexity measures, which appropriately extend the notion of Rademacher complexity to bound the generalization error and learning rates of hypothesis classes in this setting. Rather than time-series data, our work is motivated by settings where data is sampled on a network or a spatial domain, and thus do not fit well within the framework of prior work. We provide learning and generalization bounds for data that are complexly dependent, yet their distribution satisfies the standard Dobrushin's condition. Indeed, we show that the standard complexity measures of Gaussian and Rademacher complexities and VC dimension are sufficient measures of complexity for the purposes of bounding the generalization error and learning rates of hypothesis classes in our setting. Moreover, our generalization bounds only degrade by constant factors compared to their i.i.d. analogs, and our learnability bounds degrade by log factors in the size of the training set.


Dependency Learning for QBF

Journal of Artificial Intelligence Research

Quantified Boolean Formulas (QBFs) can be used to succinctly encode problems from domains such as formal verification, planning, and synthesis. One of the main approaches to QBF solving is Quantified Conflict Driven Clause Learning (QCDCL). By default, QCDCL assigns variables in the order of their appearance in the quantifier prefix so as to account for dependencies among variables. Dependency schemes can be used to relax this restriction and exploit independence among variables in certain cases, but only at the cost of nontrivial interferences with the proof system underlying QCDCL. We introduce dependency learning, a new technique for exploiting variable independence within QCDCL that allows solvers to learn variable dependencies on the fly. The resulting version of QCDCL enjoys improved propagation and increased flexibility in choosing variables for branching while retaining ordinary (long-distance) Q-resolution as its underlying proof system. We show that dependency learning can achieve exponential speedups over ordinary QCDCL. Experiments on standard benchmark sets demonstrate the effectiveness of this technique.


Subsumption-driven clause learning with DPLL+restarts

arXiv.org Artificial Intelligence

Complete SAT solvers make deductions until they find a model or produce the empty clause. In DPLL and CDCL solvers, these deductions are produced using assumptions generally called decisions. In DPLL solvers [DLL62], the knowledge accumulated since the beginning of the search is represented by the phases of decision literals. Each new conflict induced by decisions increases the amount of information being accumulated. This amount of information can be interpreted as a proportion of search space already explored that is known not to contain a model.


Online Matrix Completion with Side Information

arXiv.org Machine Learning

We give an online algorithm and prove novel mistake and regret bounds for online binary matrix completion with side information. The bounds we prove are of the form $\tilde{\mathcal{O}}({\mathcal{D}}/{\gamma^2})$. The term ${1}/{\gamma^2}$ is analogous to the usual margin term in SVM (perceptron) bounds. More specifically, if we assume that there is some factorization of the underlying $m\times n$ matrix into $\mathbf{P} \mathbf{Q}^{\top}$ where the rows of $\mathbf{P}$ are interpreted as ``classifiers'' in $\Re^d$ and the rows of $\mathbf{Q}$ as ``instances'' in $\Re^d$, then $\gamma$ is is the maximum (normalized) margin over all factorizations $\mathbf{P} \mathbf{Q}^{\top}$ consistent with the observed matrix. The quasi-dimension term $\mathcal{D}$ measures the quality of side information. In the presence of no side information, $\mathcal{D} = m+n$. However, if the side information is predictive of the underlying factorization of the matrix, then in the best case, $\mathcal{D} \in \mathcal{O}(k + \ell)$ where $k$ is the number of distinct row factors and $\ell$ is the number of distinct column factors. We additionally provide a generalization of our algorithm to the inductive setting. In this setting, the side information is not specified in advance. The results are similar to the transductive setting but in the best case, the quasi-dimension $\mathcal{D}$ is now bounded by $\mathcal{O}(k^2 + \ell^2)$.


Effective problem solving using SAT solvers

arXiv.org Artificial Intelligence

In this article we demonstrate how to solve a variety of problems and puzzles using the built-in SAT solver of the computer algebra system Maple. Once the problems have been encoded into Boolean logic, solutions can be found (or shown to not exist) automatically, without the need to implement any search algorithm. In particular, we describe how to solve the $n$-queens problem, how to generate and solve Sudoku puzzles, how to solve logic puzzles like the Einstein riddle, how to solve the 15-puzzle, how to solve the maximum clique problem, and finding Graeco-Latin squares.


Lower Bounds for Adversarially Robust PAC Learning

arXiv.org Machine Learning

In this work, we initiate a formal study of probably approximately correct (PAC) learning under evasion attacks, where the adversary's goal is to \emph{misclassify} the adversarially perturbed sample point $\widetilde{x}$, i.e., $h(\widetilde{x})\neq c(\widetilde{x})$, where $c$ is the ground truth concept and $h$ is the learned hypothesis. Previous work on PAC learning of adversarial examples have all modeled adversarial examples as corrupted inputs in which the goal of the adversary is to achieve $h(\widetilde{x}) \neq c(x)$, where $x$ is the original untampered instance. These two definitions of adversarial risk coincide for many natural distributions, such as images, but are incomparable in general. We first prove that for many theoretically natural input spaces of high dimension $n$ (e.g., isotropic Gaussian in dimension $n$ under $\ell_2$ perturbations), if the adversary is allowed to apply up to a sublinear $o(||x||)$ amount of perturbations on the test instances, PAC learning requires sample complexity that is exponential in $n$. This is in contrast with results proved using the corrupted-input framework, in which the sample complexity of robust learning is only polynomially more. We then formalize hybrid attacks in which the evasion attack is preceded by a poisoning attack. This is perhaps reminiscent of "trapdoor attacks" in which a poisoning phase is involved as well, but the evasion phase here uses the error-region definition of risk that aims at misclassifying the perturbed instances. In this case, we show PAC learning is sometimes impossible all together, even when it is possible without the attack (e.g., due to the bounded VC dimension).


A Novel Hyperparameter-free Approach to Decision Tree Construction that Avoids Overfitting by Design

arXiv.org Artificial Intelligence

Decision trees are an extremely popular machine learning technique. Unfortunately, overfitting in decision trees still remains an open issue that sometimes prevents achieving good performance. In this work, we present a novel approach for the construction of decision trees that avoids the overfitting by design, without losing accuracy. A distinctive feature of our algorithm is that it requires neither the optimization of any hyperparameters, nor the use of regularization techniques, thus significantly reducing the decision tree training time. Moreover, our algorithm produces much smaller and shallower trees than traditional algorithms, facilitating the interpretability of the resulting models.