Goto

Collaborating Authors

 Computational Learning Theory


A Strongly Polynomial Algorithm for Approximate Forster Transforms and its Application to Halfspace Learning

arXiv.org Artificial Intelligence

The Forster transform is a method of regularizing a dataset X (in particular, by placing it in radial isotropic position) while maintaining some of its essential properties. Forster transforms have been an essential tool in a diverse range of settings, including functional analysis [Bar98, GGdOW17], communication complexity [For02], coding theory [DSW17], mixed determinant/volume approximation [GS02], learning theory [HM13, HKLM20, DKT21, DPT21] and the Paulsen problem in frame theory [KLLR18, HM19]. The reader is referred to [AKS20] for a more detailed discussion. Known algorithms for computing (approximate) Forster transforms [HM13, AKS20, DKT21] rely on black-box convex optimization (e.g., the ellipsoid algorithm) and consequently have weakly polynomial runtimes. Here we study the question of whether Forster transforms can be computed in strongly polynomial time. We then leverage Forster transforms for the problem of PAC learning halfspaces (both in the realizable setting and in the presence of semi-random label noise). Intuitively speaking, a Forster transform is a mapping that turns a dataset into one with good anti-concentration properties.


Efficient PAC Learning from the Crowd with Pairwise Comparisons

arXiv.org Artificial Intelligence

We study crowdsourced PAC learning of threshold functions, where the labels are gathered from a pool of annotators some of whom may behave adversarially. This is yet a challenging problem and until recently has computationally and query efficient PAC learning algorithm been established by Awasthi et al. (2017). In this paper, we show that by leveraging the more easily acquired pairwise comparison queries, it is possible to exponentially reduce the label complexity while retaining the overall query complexity and runtime. Our main algorithmic contributions are a comparison-equipped labeling scheme that can faithfully recover the true labels of a small set of instances, and a label-efficient filtering process that in conjunction with the small labeled set can reliably infer the true labels of a large instance set.


List-Decodable Sparse Mean Estimation

arXiv.org Artificial Intelligence

Robust mean estimation is one of the most important problems in statistics: given a set of samples in $\mathbb{R}^d$ where an $\alpha$ fraction are drawn from some distribution $D$ and the rest are adversarially corrupted, we aim to estimate the mean of $D$. A surge of recent research interest has been focusing on the list-decodable setting where $\alpha \in (0, \frac12]$, and the goal is to output a finite number of estimates among which at least one approximates the target mean. In this paper, we consider that the underlying distribution $D$ is Gaussian with $k$-sparse mean. Our main contribution is the first polynomial-time algorithm that enjoys sample complexity $O\big(\mathrm{poly}(k, \log d)\big)$, i.e. poly-logarithmic in the dimension. One of our core algorithmic ingredients is using low-degree sparse polynomials to filter outliers, which may find more applications.


Drones-aided Asset Maintenance in Hospitals

arXiv.org Artificial Intelligence

The rapid outbreak of COVID-19 pandemic invoked scientists and researchers to prepare the world for future disasters. During the pandemic, global authorities on healthcare urged the importance of disinfection of objects and surfaces. To implement efficient and safe disinfection services during the pandemic, robots have been utilized for indoor assets. In this paper, we envision the use of drones for disinfection of outdoor assets in hospitals and other facilities. Such heterogeneous assets may have different service demands (e.g., service time, quantity of the disinfectant material etc.), whereas drones have typically limited capacity (i.e., travel time, disinfectant carrying capacity). To serve all the facility assets in an efficient manner, the drone to assets allocation and drone travel routes must be optimized. In this paper, we formulate the capacitated vehicle routing problem (CVRP) to find optimal route for each drone such that the total service time is minimized, while simultaneously the drones meet the demands of each asset allocated to it. The problem is solved using mixed integer programming (MIP). As CVRP is an NP-hard problem, we propose a lightweight heuristic to achieve sub-optimal performance while reducing the time complexity in solving the problem involving a large number of assets.


Privacy Induces Robustness: Information-Computation Gaps and Sparse Mean Estimation

arXiv.org Artificial Intelligence

We establish a simple connection between robust and differentially-private algorithms: private mechanisms which perform well with very high probability are automatically robust in the sense that they retain accuracy even if a constant fraction of the samples they receive are adversarially corrupted. Since optimal mechanisms typically achieve these high success probabilities, our results imply that optimal private mechanisms for many basic statistics problems are robust. We investigate the consequences of this observation for both algorithms and computational complexity across different statistical problems. Assuming the Brennan-Bresler secret-leakage planted clique conjecture, we demonstrate a fundamental tradeoff between computational efficiency, privacy leakage, and success probability for sparse mean estimation. Private algorithms which match this tradeoff are not yet known -- we achieve that (up to polylogarithmic factors) in a polynomially-large range of parameters via the Sum-of-Squares method. To establish an information-computation gap for private sparse mean estimation, we also design new (exponential-time) mechanisms using fewer samples than efficient algorithms must use. Finally, we give evidence for privacy-induced information-computation gaps for several other statistics and learning problems, including PAC learning parity functions and estimation of the mean of a multivariate Gaussian.


Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions

arXiv.org Artificial Intelligence

We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $\mu$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $\mu$ with high probability. Prior work had obtained efficient algorithms for robust sparse mean estimation of light-tailed distributions. In this work, we give the first sample-efficient and polynomial-time robust sparse mean estimator for heavy-tailed distributions under mild moment assumptions. Our algorithm achieves the optimal asymptotic error using a number of samples scaling logarithmically with the ambient dimension. Importantly, the sample complexity of our method is optimal as a function of the failure probability $\tau$, having an additive $\log(1/\tau)$ dependence. Our algorithm leverages the stability-based approach from the algorithmic robust statistics literature, with crucial (and necessary) adaptations required in our setting. Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.


Optimal Weak to Strong Learning

arXiv.org Artificial Intelligence

The field of boosting has been started from a classic question in learning theory asking whether classifiers that are just slightly better than random guessing can be used to create a classifier with arbitrarily high accuracy when given enough training data. This question was initially asked by Kearns and Valiant [15, 16] and ignited the line of research that eventually lead to the development of AdaBoost [7], the prototype boosting algorithm to date. AdaBoost carefully combines the predictions of several inaccurate classifiers trained with a focus on different parts of the training data to come up with a voting classifier that performs well everywhere. We quantify the performance of an inaccurate learner by its advantage over random guessing. Said loosely, a -weak learner will correctly classify new data points with probability at least 1/2+. In contrast, given 0 <, < 1 and enough training data a strong learner outputs with probability 1 over the choice of the training data and possible random choices of the algorithm a hypothesis that correctly classifies new data points with probability at least 1 .


Machine Learning For Data Science Using MATLAB

#artificialintelligence

MATLAB is a widely used programming language for statistical computing. This course is for you if you want to have a real feel of the Machine Learning techniques without having to learn all the complicated maths. Additionally, this course is also for you if you have had previous hours and hours of machine learning theory but could never got a change or figure out how to implement and solve data science problems with it. The approach in this course is very practical and we will start everything from very scratch. We will immediately start coding after a couple of introductory tutorials and we try to keep the theory to bare minimal.


AlphaSnake: Policy Iteration on a Nondeterministic NP-hard Markov Decision Process

arXiv.org Artificial Intelligence

Reinforcement learning has recently been used to approach well-known NP-hard combinatorial problems in graph theory. Among these problems, Hamiltonian cycle problems are exceptionally difficult to analyze, even when restricted to individual instances of structurally complex graphs. In this paper, we use Monte Carlo Tree Search (MCTS), the search algorithm behind many state-of-the-art reinforcement learning algorithms such as AlphaZero, to create autonomous agents that learn to play the game of Snake, a game centered on properties of Hamiltonian cycles on grid graphs. The game of Snake can be formulated as a single-player discounted Markov Decision Process (MDP) where the agent must behave optimally in a stochastic environment. Determining the optimal policy for Snake, defined as the policy that maximizes the probability of winning - or win rate - with higher priority and minimizes the expected number of time steps to win with lower priority, is conjectured to be NP-hard. Performance-wise, compared to prior work in the Snake game, our algorithm is the first to achieve a win rate over $0.5$ (a uniform random policy achieves a win rate $< 2.57 \times 10^{-15}$), demonstrating the versatility of AlphaZero in approaching NP-hard environments.


Comparative Learning: A Sample Complexity Theory for Two Hypothesis Classes

arXiv.org Artificial Intelligence

In many learning theory problems, a central role is played by a hypothesis class: we might assume that the data is labeled according to a hypothesis in the class (usually referred to as the realizable setting), or we might evaluate the learned model by comparing it with the best hypothesis in the class (the agnostic setting). Taking a step beyond these classic setups that involve only a single hypothesis class, we introduce comparative learning as a combination of the realizable and agnostic settings in PAC learning: given two binary hypothesis classes $S$ and $B$, we assume that the data is labeled according to a hypothesis in the source class $S$ and require the learned model to achieve an accuracy comparable to the best hypothesis in the benchmark class $B$. Even when both $S$ and $B$ have infinite VC dimensions, comparative learning can still have a small sample complexity. We show that the sample complexity of comparative learning is characterized by the mutual VC dimension $\mathsf{VC}(S,B)$ which we define to be the maximum size of a subset shattered by both $S$ and $B$. We also show a similar result in the online setting, where we give a regret characterization in terms of the mutual Littlestone dimension $\mathsf{Ldim}(S,B)$. These results also hold for partial hypotheses. We additionally show that the insights necessary to characterize the sample complexity of comparative learning can be applied to characterize the sample complexity of realizable multiaccuracy and multicalibration using the mutual fat-shattering dimension, an analogue of the mutual VC dimension for real-valued hypotheses. This not only solves an open problem proposed by Hu, Peale, Reingold (2022), but also leads to independently interesting results extending classic ones about regression, boosting, and covering number to our two-hypothesis-class setting.