Computational Learning Theory
Minimum Description Length and Generalization Guarantees for Representation Learning
Sefidgaran, Milad, Zaidi, Abdellatif, Krasnowski, Piotr
A major challenge in designing efficient statistical supervised learning algorithms is finding representations that perform well not only on available training samples but also on unseen data. While the study of representation learning has spurred much interest, most existing approaches are heuristic, and very little is known about theoretical generalization guarantees. In this paper, we establish a compressibility framework that allows us to derive upper bounds on the generalization error of a representation learning algorithm in terms of the "Minimum Description Length" (MDL) of the labels or the latent variables (representations). Rather than the mutual information between the encoder's input and the representation, which is often believed in the related literature to reflect the algorithm's generalization capability but in fact falls short of doing so, our new bounds involve the "multi-letter" relative entropy between the distribution of the representations (or labels) of the training and test sets and a fixed prior. In particular, these new bounds reflect the structure of the encoder and are not vacuous for deterministic algorithms. Our compressibility approach, which is information-theoretic in nature, builds upon that of Blum-Langford for PAC-MDL bounds and introduces two essential ingredients: block-coding and lossy-compression. The latter allows our approach to subsume the so-called geometrical compressibility as a special case. To the best knowledge of the authors, the established generalization bounds are the first of their kind for Information Bottleneck (IB) type encoders and representation learning. Finally, we partly exploit the theoretical results by introducing a new data-dependent prior. Numerical simulations illustrate the advantages of such well-chosen priors over classical priors used in IB.
Universal Imitation Games
Alan Turing proposed in 1950 a framework called an imitation game to decide if a machine could think. Using mathematics developed largely after Turing -- category theory -- we analyze a broader class of universal imitation games (UIGs), which includes static, dynamic, and evolutionary games. In static games, the participants are in a steady state. In dynamic UIGs, "learner" participants are trying to imitate "teacher" participants over the long run. In evolutionary UIGs, the participants are competing against each other in an evolutionary game, and participants can go extinct and be replaced by others with higher fitness. We use the framework of category theory -- in particular, two influential results by Yoneda -- to characterize each type of imitation game. Universal properties in categories are defined by initial and final objects. We characterize dynamic UIGs where participants are learning by inductive inference as initial algebras over well-founded sets, and contrast them with participants learning by coinductive inference over the final coalgebra of non-well-founded sets. We briefly discuss the extension of our categorical framework for UIGs to imitation games on quantum computers.
Credal Learning Theory
Caprio, Michele, Sultana, Maryam, Elia, Eleni, Cuzzolin, Fabio
Statistical learning theory is the foundation of machine learning, providing theoretical bounds for the risk of models learnt from a (single) training set, assumed to issue from an unknown probability distribution. In actual deployment, however, the data distribution may (and often does) vary, causing domain adaptation/generalization issues. In this paper we lay the foundations for a 'credal' theory of learning, using convex sets of probabilities (credal sets) to model the variability in the data-generating distribution. Such credal sets, we argue, may be inferred from a finite sample of training sets. Bounds are derived for the case of finite hypothesis spaces (with or without assuming realizability) as well as infinite model spaces, which directly generalize classical results.
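For orientation, the classical single-distribution bounds for finite hypothesis spaces can be stated as below. This is a sketch of the standard textbook statements, not a formula taken from the paper, and the abstract does not say which classical results it generalizes. The notation is assumed for illustration: $\mathcal{H}$ is a finite hypothesis space, $n$ the number of i.i.d. training samples, $\delta$ the confidence parameter, $L(h)$ the true risk, $\hat{L}_n(h)$ the empirical risk, and the loss takes values in $[0,1]$.

```latex
% Realizable case: with probability at least 1 - \delta, every h in H
% that is consistent with the training set (\hat{L}_n(h) = 0) satisfies
\[
  L(h) \;\le\; \frac{\ln|\mathcal{H}| + \ln(1/\delta)}{n}.
\]

% Agnostic case (Hoeffding's inequality plus a union bound over H):
% with probability at least 1 - \delta, simultaneously for all h in H,
\[
  \bigl| L(h) - \hat{L}_n(h) \bigr|
  \;\le\; \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2n}}.
\]
```

Both statements assume one fixed data-generating distribution, which is the assumption the credal setting relaxes.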
AlphaMapleSAT: An MCTS-based Cube-and-Conquer SAT Solver for Hard Combinatorial Problems
Jha, Piyush, Li, Zhengyu, Lu, Zhengyang, Bright, Curtis, Ganesh, Vijay
This paper introduces AlphaMapleSAT, a novel Monte Carlo Tree Search (MCTS) based Cube-and-Conquer (CnC) SAT solving method aimed at efficiently solving challenging combinatorial problems. Despite the tremendous success of CnC solvers in solving a variety of hard combinatorial problems, the lookahead cubing techniques at the heart of CnC have not evolved much for many years. Part of the reason is the sheer difficulty of coming up with new cubing techniques that are both low-cost and effective in partitioning input formulas into sub-formulas, such that the overall runtime is minimized. Lookahead cubing techniques used by current state-of-the-art CnC solvers, such as March, keep their cubing costs low by constraining the search for the optimal splitting variables. By contrast, our key innovation is a deductively-driven MCTS-based lookahead cubing technique that performs a deeper heuristic search to find effective cubes, while keeping the cubing cost low. We perform an extensive comparison of AlphaMapleSAT against the March CnC solver on challenging combinatorial problems such as the minimum Kochen-Specker and Ramsey problems. We also perform ablation studies to verify the efficacy of the MCTS heuristic search for the cubing problem. Results show up to 2.3x speedup in parallel (and up to 27x in sequential) elapsed real time.
Knapsack: Connectedness, Path, and Shortest-Path
Dey, Palash, Kolay, Sudeshna, Singh, Sipra
We study the knapsack problem with graph theoretic constraints. That is, we assume that there exists a graph structure on the set of items of the knapsack and the solution also needs to satisfy certain graph theoretic properties on top of the knapsack constraints. In particular, in the connected knapsack problem we need to compute a connected subset of items which has maximum value subject to the knapsack size constraint. We show that this problem is strongly NP-complete even for graphs of maximum degree four and NP-complete even for star graphs. On the other hand, we develop an algorithm running in time $O\left(2^{tw\log tw}\cdot\text{poly}(\min\{s^2,d^2\})\right)$ where $tw,s,d$ are respectively the treewidth of the graph, the size, and the target value of the knapsack. We further exhibit a $(1-\epsilon)$ factor approximation algorithm running in time $O\left(2^{tw\log tw}\cdot\text{poly}(n,1/\epsilon)\right)$ for every $\epsilon>0$. We show similar results for several other graph theoretic properties, namely path and shortest-path under the problem names path-knapsack and shortestpath-knapsack. Our results seem to indicate that connected-knapsack is computationally hardest followed by path-knapsack and shortestpath-knapsack.
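To make the connected knapsack problem concrete, here is a minimal brute-force sketch in Python. It is not the treewidth-based algorithm from the abstract: it enumerates all subsets and therefore runs in exponential time, so it is only suitable for tiny instances. The function names `is_connected` and `connected_knapsack` and the example instances are hypothetical and chosen for illustration.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if `nodes` induces a connected subgraph of the item graph."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:  # depth-first search restricted to `nodes`
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def connected_knapsack(weights, values, edges, capacity):
    """Brute force: best-value connected item set with total weight <= capacity.

    Items are indexed 0..n-1 and `edges` lists undirected edges between items.
    Returns (best_value, best_subset); ties keep the first subset found.
    """
    n = len(weights)
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    best_value, best_subset = 0, ()
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if sum(weights[i] for i in subset) > capacity:
                continue  # violates the knapsack size constraint
            if not is_connected(subset, adj):
                continue  # violates the graph constraint
            value = sum(values[i] for i in subset)
            if value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset


# Star graph: heavy, low-value center 0 with three light, valuable leaves.
# Without the graph constraint the three leaves (value 15) would be optimal,
# but any connected set with two or more items must include the center.
star = connected_knapsack(
    weights=[3, 1, 1, 1],
    values=[1, 5, 5, 5],
    edges=[(0, 1), (0, 2), (0, 3)],
    capacity=3,
)

# Path graph 0 - 1 - 2: the two endpoints are valuable but not adjacent.
path = connected_knapsack(
    weights=[1, 1, 1],
    values=[4, 1, 4],
    edges=[(0, 1), (1, 2)],
    capacity=2,
)
```

In the star instance the connectivity requirement reduces the achievable value from 15 to 5, which illustrates how the graph constraint changes the problem even on very simple graphs.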
Learning shallow quantum circuits
Huang, Hsin-Yuan, Liu, Yunchao, Broughton, Michael, Kim, Isaac, Anshu, Anurag, Landau, Zeph, McClean, Jarrod R.
Despite fundamental interest in learning quantum circuits, the existence of a computationally efficient algorithm for learning shallow quantum circuits remains an open question. Because shallow quantum circuits can generate distributions that are classically hard to sample from, existing learning algorithms do not apply. In this work, we present a polynomial-time classical algorithm for learning the description of any unknown $n$-qubit shallow quantum circuit $U$ (with arbitrary unknown architecture) within a small diamond distance using single-qubit measurement data on the output states of $U$. We also provide a polynomial-time classical algorithm for learning the description of any unknown $n$-qubit state $\lvert \psi \rangle = U \lvert 0^n \rangle$ prepared by a shallow quantum circuit $U$ (on a 2D lattice) within a small trace distance using single-qubit measurements on copies of $\lvert \psi \rangle$. Our approach uses a quantum circuit representation based on local inversions and a technique to combine these inversions. This circuit representation yields an optimization landscape that can be efficiently navigated and enables efficient learning of quantum circuits that are classically hard to simulate.
A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
Perez, Mateo, Somenzi, Fabio, Trivedi, Ashutosh
Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
Private Truly-Everlasting Robust-Prediction
Private Everlasting Prediction (PEP), recently introduced by Naor et al. [2023], is a model for differentially private learning in which the learner never publicly releases a hypothesis. Instead, it provides black-box access to a "prediction oracle" that can predict the labels of an endless stream of unlabeled examples drawn from the underlying distribution. Importantly, PEP provides privacy both for the initial training set and for the endless stream of classification queries. We present two conceptual modifications to the definition of PEP, as well as new constructions exhibiting significant improvements over prior work. Specifically, (1) Robustness: PEP only guarantees accuracy provided that all the classification queries are drawn from the correct underlying distribution. A few out-of-distribution queries might break the validity of the prediction oracle for future queries, even for future queries which are sampled from the correct distribution. We incorporate robustness against such poisoning attacks into the definition of PEP, and show how to obtain it. (2) Dependence of the privacy parameter $\delta$ on the time horizon: We present a relaxed privacy definition, suitable for PEP, that allows us to disconnect the privacy parameter $\delta$ from the number of total time steps $T$. This allows us to obtain algorithms for PEP whose sample complexity is independent of $T$, thereby making them "truly everlasting". This is in contrast to prior work where the sample complexity grows with $\mathrm{polylog}(T)$. (3) New constructions: Prior constructions for PEP exhibit sample complexity that is quadratic in the VC dimension of the target class. We present new constructions of PEP for axis-aligned rectangles and for decision-stumps that exhibit sample complexity linear in the dimension (instead of quadratic). We show that our constructions satisfy very strong robustness properties.
Sharper Bounds for $\ell_p$ Sensitivity Sampling
Woodruff, David P., Yasuda, Taisuke
In large-scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. In particular, sensitivity sampling is an intensely studied technique which provides provable guarantees on the quality of approximation, while reducing the number of examples to the product of the VC dimension $d$ and the total sensitivity $\mathfrak S$ in remarkably general settings. However, guarantees going beyond this general bound of $\mathfrak S d$ are known in perhaps only one setting, for $\ell_2$ subspace embeddings, despite intense study of sensitivity sampling in prior work. In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p > 2$ that improve over the general $\mathfrak S d$ bound, achieving a bound of roughly $\mathfrak S^{2-2/p}$ for $2
On Learning for Ambiguous Chance Constrained Problems
Madhusudanarao, A Ch, Singh, Rahul
We study chance-constrained optimization problems $\min_x f(x)$ s.t. $P(\left\{ \theta: g(x,\theta)\le 0 \right\})\ge 1-\epsilon$ where $\epsilon\in (0,1)$ is the violation probability, when the distribution $P$ is not known to the decision maker (DM). When the DM has access to a set of distributions $\mathcal{U}$ such that $P$ is contained in $\mathcal{U}$, the problem is known as the ambiguous chance-constrained problem \cite{erdougan2006ambiguous}. We study the ambiguous chance-constrained problem for the case when $\mathcal{U}$ is of the form $\left\{\mu:\frac{\mu (y)}{\nu(y)}\leq C, \forall y\in\Theta, \mu(y)\ge 0\right\}$, where $\nu$ is a "reference distribution." We show that in this case the original problem can be "well-approximated" by a sampled problem in which $N$ i.i.d. samples of $\theta$ are drawn from $\nu$, and the original constraint is replaced with $g(x,\theta_i)\le 0,~i=1,2,\ldots,N$. We also derive the sample complexity associated with this approximation, i.e., for $\epsilon,\delta>0$ the number of samples which must be drawn from $\nu$ so that with a probability greater than $1-\delta$ (over the randomness of $\nu$), the solution obtained by solving the sampled program yields an $\epsilon$-feasible solution for the original chance-constrained problem.
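The sampled problem described above can be illustrated on a one-dimensional toy instance. The sketch below is an assumption-laden example, not the paper's construction or its sample-complexity result: it takes $f(x)=x$, $g(x,\theta)=\theta-x$, a reference distribution $\nu=\mathrm{Uniform}(0,1)$, and hypothetical values $N=1000$ and $C=2$. It relies only on the elementary fact that if $\mu(y)/\nu(y)\le C$ for all $y$, then $\mu(A)\le C\,\nu(A)$ for every event $A$, so a violation probability under $\nu$ translates into a bound for every $\mu\in\mathcal{U}$.

```python
import random


def solve_sampled_problem(samples):
    """Sampled program for: min x  s.t.  theta_i - x <= 0 for every sample.

    For this toy instance the optimum is simply the largest sample.
    """
    return max(samples)


def violation_under_reference(x):
    """Exact probability that theta > x when theta ~ Uniform(0, 1)."""
    return min(1.0, max(0.0, 1.0 - x))


# All numeric choices below are illustrative assumptions.
rng = random.Random(0)   # fixed seed so the run is reproducible
N = 1000                 # number of i.i.d. samples drawn from nu
C = 2.0                  # density-ratio bound defining the ambiguity set U

samples = [rng.random() for _ in range(N)]
x_hat = solve_sampled_problem(samples)

# Violation probability of x_hat under the reference distribution nu.
nu_violation = violation_under_reference(x_hat)

# Any mu with mu(y)/nu(y) <= C satisfies mu(A) <= C * nu(A), so this
# bounds the violation probability under every distribution in U.
worst_case_violation = min(1.0, C * nu_violation)

print(f"x_hat = {x_hat:.4f}")
print(f"violation under nu <= {nu_violation:.4f}")
print(f"violation under any mu in U <= {worst_case_violation:.4f}")
```

The example shows why sampling from $\nu$ alone can suffice: the ratio bound $C$ converts a guarantee under the reference distribution into one that holds uniformly over the ambiguity set, at the cost of a factor $C$ in the violation probability.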