Computational Learning Theory
AdaBoost is not an Optimal Weak to Strong Learner
Høgsgaard, Mikael Møller, Larsen, Kasper Green, Ritzert, Martin
AdaBoost is a classic boosting algorithm for combining multiple inaccurate classifiers produced by a weak learner into a strong learner with arbitrarily high accuracy when given enough training data. Determining the optimal number of samples necessary to obtain a given accuracy of the strong learner is a basic learning-theoretic question. Larsen and Ritzert (NeurIPS'22) recently presented the first provably optimal weak-to-strong learner. However, their algorithm is somewhat complicated, and it remains an intriguing question whether the prototypical boosting algorithm AdaBoost also makes optimal use of training samples. In this work, we answer this question in the negative. Concretely, we show that the sample complexity of AdaBoost, and of other classic variations thereof, is sub-optimal by at least one logarithmic factor in the desired accuracy of the strong learner.
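For readers unfamiliar with the algorithm under discussion, the following is a minimal sketch of textbook AdaBoost, assuming 1-D inputs, labels in {-1, +1}, and threshold stumps as the weak learner. The function names (`train_stump`, `adaboost`, `predict`) are illustrative and not taken from the paper; the sketch shows only the reweighting scheme and says nothing about the sample-complexity bound the abstract proves.

```python
import math

def stump_predict(x, threshold, sign):
    # A threshold stump: predict `sign` when x >= threshold, otherwise `-sign`.
    return sign if x >= threshold else -sign

def train_stump(xs, ys, weights):
    # Hypothetical weak learner: exhaustively pick the stump with the
    # smallest weighted training error. Returns (error, threshold, sign).
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, rounds):
    # Textbook AdaBoost: start from uniform weights, then up-weight the
    # examples the current weak hypothesis gets wrong.
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, sign))
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # The strong learner is the sign of the alpha-weighted vote.
    score = sum(alpha * stump_predict(x, threshold, sign)
                for alpha, threshold, sign in ensemble)
    return 1 if score >= 0 else -1
```

On a small toy set such as `xs = list(range(10))` with labels `[1, 1, 1, -1, -1, -1, 1, 1, 1, -1]`, no single stump fits the data, but a few boosting rounds can.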
New Methods to Solve NP-Hard Problems, Part 1 (Computational Complexity)
Abstract: NP-hard problems are not believed to be exactly solvable through general polynomial time algorithms. Hybrid quantum-classical algorithms to address such combinatorial problems have been of great interest in the past few years. Such algorithms are heuristic in nature and aim to obtain an approximate solution. Significant improvements in computational time and/or the ability to treat large problems are some of the principal promises of quantum computing in this regard. The hardware, however, is still in its infancy and the current Noisy Intermediate Scale Quantum (NISQ) computers are not able to optimize industrially relevant problems.
A learning theory for quantum photonic processors and beyond
Recent years have seen an incredible advancement in hardware solutions for quantum technologies. In particular, the recent demonstration of a quantum computational advantage via photonic circuits [1, 2] finally paves the way for the realization of full-fledged quantum information processing with light, a solution that bears intrinsic advantages with respect to other platforms, in terms of scalability, robustness and deployability [3, 4, 5]. At the same time, the increased control of infinite-dimensional quantum states in several other platforms, such as cavity [6, 7] or mechanical resonators [8], is pushing the boundaries of continuous-variable (CV) quantum information processing beyond photonics. Finally, the increased interplay between qubit and CV platforms [9, 10] spurs interest in the development of quantum error correction codes [11, 12, 13] and provides an alternative to more standard approaches for quantum technologies. Together, these developments mark a renewed surge of interest in CV information processing. From a theoretical perspective, the characterization of the information-processing capabilities of quantum devices has recently undergone a paradigm shift, thanks to the introduction of statistical learning techniques [14, 15, 16, 17, 18], which underlie the success of classical machine learning [19, 20, 21]. In this approach, one recognizes that a successful use of quantum devices often requires two ingredients: (i) the estimation of quantities of interest about the quantum states or processes running in the device; (ii) the optimization of the device's parameter setup based on the estimated data, in order to maximize the device's performance in a specific task.
Making Decisions under Outcome Performativity
Kim, Michael P., Perdomo, Juan C.
Decision-makers often act in response to data-driven predictions, with the goal of achieving favorable outcomes. In such settings, predictions don't passively forecast the future; instead, predictions actively shape the distribution of outcomes they are meant to predict. This performative prediction setting raises new challenges for learning "optimal" decision rules. In particular, existing solution concepts do not address the apparent tension between the goals of forecasting outcomes accurately and steering individuals to achieve desirable outcomes. To contend with this concern, we introduce a new optimality concept -- performative omniprediction -- adapted from the supervised (non-performative) learning setting. A performative omnipredictor is a single predictor that simultaneously encodes the optimal decision rule with respect to many possibly-competing objectives. Our main result demonstrates that efficient performative omnipredictors exist, under a natural restriction of performative prediction, which we call outcome performativity. On a technical level, our results follow by carefully generalizing the notion of outcome indistinguishability to the outcome performative setting. From an appropriate notion of Performative OI, we recover many consequences known to hold in the supervised setting, such as omniprediction and universal adaptability.
A Labelled Sample Compression Scheme of Size at Most Quadratic in the VC Dimension
Mansouri, Farnam, Zilles, Sandra
This paper presents a construction of a proper and stable labelled sample compression scheme of size $O(\VCD^2)$ for any finite concept class, where $\VCD$ denotes the Vapnik-Chervonenkis Dimension. The construction is based on a well-known model of machine teaching, referred to as recursive teaching dimension. This substantially improves on the currently best known bound on the size of sample compression schemes (due to Moran and Yehudayoff), which is exponential in $\VCD$. The long-standing open question whether the smallest size of a sample compression scheme is in $O(\VCD)$ remains unresolved, but our results show that research on machine teaching is a promising avenue for the study of this open problem. As further evidence of the strong connections between machine teaching and sample compression, we prove that the model of no-clash teaching, introduced by Kirkpatrick et al., can be used to define a non-trivial lower bound on the size of stable sample compression schemes.
Fast and fully-automated histograms for large-scale data sets
Mendizábal, Valentina Zelaya, Boullé, Marc, Rossi, Fabrice
G-Enum histograms are a new fast and fully automated method for irregular histogram construction. By framing histogram construction as a density estimation problem and its automation as a model selection task, these histograms leverage the Minimum Description Length (MDL) principle to derive two different model selection criteria. Several proven theoretical results about these criteria give insights about their asymptotic behavior and are used to speed up their optimisation. These insights, combined with a greedy search heuristic, are used to construct histograms in linearithmic time rather than the polynomial time incurred by previous works. The capabilities of the proposed MDL density estimation method are illustrated with reference to other fully automated methods in the literature, both on synthetic and large real-world data sets.
MDL-based Compressing Sequential Rules
Chen, Xinhong, Gan, Wensheng, Wan, Shicheng, Gu, Tianlong
With the rapid development of the Internet, the era of big data has arrived. The Internet generates huge amounts of data every day. However, extracting meaningful information from massive data is like looking for a needle in a haystack. Data mining techniques can provide various feasible methods to solve this problem. At present, many sequential rule mining (SRM) algorithms have been proposed to find sequential rules in databases with sequential characteristics. These rules help people extract a lot of meaningful information from massive amounts of data. How can we compress the mined results and reduce data size to save storage space and transmission time? Until now, there has been little research on the compression of SRM results. In this paper, using the Minimum Description Length (MDL) principle and two metrics (support and confidence), we introduce the problem of compressing SRM results and propose a solution named ComSR for MDL-based compression of sequential rules, built on a purpose-designed sequential rule coding scheme. To our knowledge, we are the first to use sequential rules to encode an entire database. A heuristic method is proposed to find a set of sequential rules that is as compact and meaningful as possible. ComSR has two trade-off algorithms, ComSR_non and ComSR_ful, based on whether the database can be completely compressed. Experiments on a real dataset with different thresholds show that a set of compact and meaningful sequential rules can be found, which indicates that the proposed method works.
The One-Inclusion Graph Algorithm is not Always Optimal
Aden-Ali, Ishaq, Cherapanamjeri, Yeshwanth, Shetty, Abhishek, Zhivotovskiy, Nikita
The one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth achieves an optimal in-expectation risk bound in the standard PAC classification setup. In one of the first COLT open problems, Warmuth conjectured that this prediction strategy always implies an optimal high probability bound on the risk, and hence is also an optimal PAC algorithm. We refute this conjecture in the strongest sense: for any practically interesting Vapnik-Chervonenkis class, we provide an in-expectation optimal one-inclusion graph algorithm whose high probability risk bound cannot go beyond that implied by Markov's inequality. Our construction of these poorly performing one-inclusion graph algorithms uses Varshamov-Tenengolts error correcting codes. Our negative result has several implications. First, it shows that the same poor high-probability performance is inherited by several recent prediction strategies based on generalizations of the one-inclusion graph algorithm. Second, our analysis shows yet another statistical problem that enjoys an estimator that is provably optimal in expectation via a leave-one-out argument, but fails in the high-probability regime. This discrepancy occurs despite the boundedness of the binary loss for which arguments based on concentration inequalities often provide sharp high probability risk bounds.
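To make the reference to Markov's inequality concrete, the following worked bound is a sketch under stated assumptions: $d$ denotes the VC dimension, $n$ the sample size, $\delta$ the failure probability, and the in-expectation guarantee is taken to be the classical $d/(n+1)$ bound of Haussler, Littlestone, and Warmuth. The symbols are not fixed by the abstract itself.

```latex
\mathbb{E}[\mathrm{risk}] \le \frac{d}{n+1}
\quad\Longrightarrow\quad
\Pr\!\left[\mathrm{risk} \ge \frac{d}{\delta\,(n+1)}\right] \le \delta ,
\qquad\text{versus the optimal PAC rate}\qquad
\mathrm{risk} = O\!\left(\frac{d + \log(1/\delta)}{n}\right).
```

The Markov-type bound scales as $1/\delta$, whereas the optimal PAC rate depends on $\delta$ only through $\log(1/\delta)$; the negative result says that some in-expectation optimal one-inclusion graph algorithms cannot improve on the former.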
Towards Efficient and Domain-Agnostic Evasion Attack with High-dimensional Categorical Inputs
Bao, Hongyan, Han, Yufei, Zhou, Yujun, Gao, Xin, Zhang, Xiangliang
Our work targets the search for feasible adversarial perturbations to attack a classifier with high-dimensional categorical inputs in a domain-agnostic setting. This is intrinsically an NP-hard knapsack problem where the exploration space grows explosively as the feature dimension increases. Without the help of domain knowledge, solving this problem via heuristic methods, such as Branch-and-Bound, suffers from exponential complexity, yet can produce arbitrarily bad attack results. We address the challenge through the lens of multi-armed bandit based combinatorial search. Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in multi-armed bandit programming. Our objective is to achieve a highly efficient and effective attack using an Orthogonal Matching Pursuit (OMP)-enhanced Upper Confidence Bound (UCB) exploration strategy. Our theoretical analysis bounding the regret gap of FEAT guarantees its practical attack performance. In empirical analysis, we compare FEAT with other state-of-the-art domain-agnostic attack methods over various real-world categorical data sets of different applications. Substantial experimental observations confirm the expected efficiency and attack effectiveness of FEAT applied in different application scenarios. Our work further hints at the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
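As background for the bandit framing, here is a minimal sketch of the standard UCB1 rule, with each arm standing in for "modify one categorical feature". This is plain UCB1, not FEAT: the OMP enhancement described in the abstract is not reproduced, and the names `ucb1` and `pull` are hypothetical. The reward returned by `pull` is assumed to lie in [0, 1].

```python
import math

def ucb1(pull, n_arms, horizon):
    # counts[a]: number of times arm a was pulled; sums[a]: its total reward.
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once so each mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest optimistic estimate:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)  # assumed to return a value in [0, 1]
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

In an attack setting, `pull(arm)` could, for instance, return the drop in the classifier's confidence after modifying the corresponding feature; that mapping is an assumption made here for illustration only.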
Proofs and Certificates for Max-SAT
Py, Matthieu (Aix-Marseille University, LIS) | Cherif, Mohamed Sami | Habet, Djamal
Current Max-SAT solvers are able to efficiently compute the optimal value of an input instance but they do not provide any certificate of its validity. In this paper, we present a tool, called MS-Builder, which generates certificates for the Max-SAT problem in the particular form of a sequence of equivalence-preserving transformations. To generate a certificate, MS-Builder iteratively calls a SAT oracle to get a SAT resolution refutation which is handled and adapted into a sound refutation for Max-SAT. In particular, we prove that the size of the computed Max-SAT refutation is linear with respect to the size of the initial refutation if it is semi-read-once, tree-like regular, tree-like or semi-tree-like. Additionally, we propose an extendable tool, called MS-Checker, able to verify the validity of any Max-SAT certificate using Max-SAT inference rules. Both tools are evaluated on the unweighted and weighted benchmark instances of the 2020 Max-SAT Evaluation.