Computational Learning Theory
Detecting Hierarchical Changes in Latent Variable Models
Fukushima, Shintaro, Yamanishi, Kenji
This paper addresses the issue of detecting hierarchical changes in latent variable models (HCDL) from data streams. There are three different levels of changes for latent variable models: 1) the first level is the change in data distribution for fixed latent variables, 2) the second one is that in the distribution over latent variables, and 3) the third one is that in the number of latent variables. It is important to detect these changes because we can analyze the causes of changes by identifying which level a change comes from (change interpretability). This paper proposes an information-theoretic framework for detecting changes of the three levels in a hierarchical way. The key idea to realize it is to employ the MDL (minimum description length) change statistics for measuring the degree of change, in combination with DNML (decomposed normalized maximum likelihood) code-length calculation. We give a theoretical basis for making reliable alarms for changes. Focusing on stochastic block models, we employ synthetic and benchmark datasets to empirically demonstrate the effectiveness of our framework in terms of change interpretability as well as change detection.
You can't eliminate bias from machine learning, but you can pick your bias
Bias is a major topic of concern in mainstream society, which has embraced the concept that certain characteristics -- race, gender, age, or zip code, for example -- should not matter when making decisions about things such as credit or insurance. But while an absence of bias makes sense on a human level, in the world of machine learning, it's a bit different. In machine learning theory, if you can mathematically prove you don't have any bias and if you find the optimal model, the value of the model actually diminishes because you will not be able to make generalizations. What this tells us is that, as unfortunate as it may sound, without any bias built into the model, you cannot learn. Modern businesses want to use machine learning and data mining to make decisions based on what their data tells them, but the very nature of that inquiry is discriminatory.
Noise in Classification
Balcan, Maria-Florina, Haghtalab, Nika
Machine learning studies automatic methods for making accurate predictions and useful decisions based on previous observations and experience. From the application point of view, machine learning has become a successful discipline for operating in complex domains such as natural language processing, speech recognition, and computer vision. Moreover, the theoretical foundations of machine learning have led to the development of powerful and versatile techniques, which are routinely used in a wide range of commercial systems in today's world. However, a major challenge of increasing importance in the theory and practice of machine learning is to provide algorithms that are robust to adversarial noise. In this chapter, we focus on classification where the goal is to learn a classification rule from labeled examples only.
Stable Sample Compression Schemes: New Applications and an Optimal SVM Margin Bound
Hanneke, Steve, Kontorovich, Aryeh
We analyze a family of supervised learning algorithms based on sample compression schemes that are stable, in the sense that removing points from the training set which were not selected for the compression set does not alter the resulting classifier. We use this technique to derive a variety of novel or improved data-dependent generalization bounds for several learning algorithms. In particular, we prove a new margin bound for SVM, removing a log factor. The new bound is provably optimal.
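The stability property described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' construction: it assumes one-dimensional threshold classifiers on realizable data, and the names `compress`, `reconstruct`, and `is_stable` are hypothetical helpers introduced only for this example. The compression set is the smallest positively labeled point, so deleting any point that was not selected leaves the learned classifier unchanged.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (feature value, label in {0, 1})


def compress(sample: List[Example]) -> List[Example]:
    """Keep only the smallest positively labeled point (or nothing)."""
    positives = [x for x, y in sample if y == 1]
    return [(min(positives), 1)] if positives else []


def reconstruct(compression_set: List[Example]) -> Callable[[float], int]:
    """Build the threshold classifier encoded by the compression set."""
    if not compression_set:
        return lambda x: 0  # no positive point seen: predict 0 everywhere
    threshold = compression_set[0][0]
    return lambda x: 1 if x >= threshold else 0


def is_stable(sample: List[Example]) -> bool:
    """Check that removing non-selected points never changes the compression set."""
    kept = compress(sample)
    # Remove each non-selected point individually.
    for i, example in enumerate(sample):
        if example in kept:
            continue
        reduced = sample[:i] + sample[i + 1:]
        if compress(reduced) != kept:
            return False
    # Remove all non-selected points at once.
    only_kept = [example for example in sample if example in kept]
    return compress(only_kept) == kept


# Toy realizable sample: every point at or above 0.7 is labeled 1.
sample = [(0.2, 0), (0.9, 1), (0.5, 0), (0.7, 1), (1.3, 1)]
kept = compress(sample)
classifier = reconstruct(kept)
```

Because the classifier depends only on `kept`, it is identical whether it is learned from the full sample or from any subsample that still contains the selected point. That is the sense of stability used in the abstract.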
A Theory of Universal Learning
Bousquet, Olivier, Hanneke, Steve, Moran, Shay, van Handel, Ramon, Yehudayoff, Amir
How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, does not explain the behavior of learning curves: the distribution-free PAC model of learning can only bound the upper envelope of the learning curves over all possible data distributions. This does not match the practice of machine learning, where the data source is typically fixed in any given scenario, while the learner may choose the number of training examples on the basis of factors such as computational resources and desired accuracy. In this paper, we study an alternative learning model that better captures such practical aspects of machine learning, but still gives rise to a complete theory of the learnable in the spirit of the PAC model. More precisely, we consider the problem of universal learning, which aims to understand the performance of learning algorithms on every data distribution, but without requiring uniformity over the distribution. The main result of this paper is a remarkable trichotomy: there are only three possible rates of universal learning. More precisely, we show that the learning curves of any given concept class decay at either an exponential, linear, or arbitrarily slow rate. Moreover, each of these cases is completely characterized by appropriate combinatorial parameters, and we exhibit optimal learning algorithms that achieve the best possible rate in each case. For concreteness, we consider in this paper only the realizable case, though analogous results are expected to extend to more general learning scenarios.
Maximizing Welfare with Incentive-Aware Evaluation Mechanisms
Haghtalab, Nika, Immorlica, Nicole, Lucier, Brendan, Wang, Jack Z.
Motivated by applications such as college admission and insurance rate determination, we propose an evaluation problem where the inputs are controlled by strategic individuals who can modify their features at a cost. A learner can only partially observe the features, and aims to classify individuals with respect to a quality score. The goal is to design an evaluation mechanism that maximizes the overall quality score, i.e., welfare, in the population, taking any strategic updating into account. We further study the algorithmic aspect of finding the welfare maximizing evaluation mechanism under two specific settings in our model. When scores are linear and mechanisms use linear scoring rules on the observable features, we show that the optimal evaluation mechanism is an appropriate projection of the quality score. When mechanisms must use linear thresholds, we design a polynomial time algorithm with a (1/4)-approximation guarantee when the underlying feature distribution is sufficiently smooth and admits an oracle for finding dense regions. We extend our results to settings where the prior distribution is unknown and must be learned from samples.
Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity
Kaplan, Haim, Mansour, Yishay, Stemmer, Uri, Tsfadia, Eliad
We present a differentially private learner for halfspaces over a finite grid $G$ in $\mathbb{R}^d$ with sample complexity $\approx d^{2.5}\cdot 2^{\log^*|G|}$, which improves the state-of-the-art result of [Beimel et al., COLT 2019] by a $d^2$ factor. The building block for our learner is a new differentially private algorithm for approximately solving the linear feasibility problem: Given a feasible collection of $m$ linear constraints of the form $Ax\geq b$, the task is to privately identify a solution $x$ that satisfies most of the constraints. Our algorithm is iterative, where each iteration determines the next coordinate of the constructed solution $x$.
A Learning Theoretic Perspective on Local Explainability
Li, Jeffrey, Nagarajan, Vaishnavh, Plumb, Gregory, Talwalkar, Ameet
In this paper, we explore connections between interpretable machine learning and learning theory through the lens of local approximation explanations. First, we tackle the traditional problem of performance generalization and bound the test-time accuracy of a model using a notion of how locally explainable it is. Second, we explore the novel problem of explanation generalization, which is an important concern for a growing class of finite sample-based local approximation explanations. Finally, we validate our theoretical results empirically and show that they reflect what can be seen in practice. There has been a growing interest in interpretable machine learning, which seeks to help people understand their models. While interpretable machine learning encompasses a wide range of problems, it is a fairly uncontroversial hypothesis that there exists a tradeoff between a model's complexity and general notions of interpretability. This hypothesis suggests a seemingly natural connection to the field of learning theory, which has thoroughly explored relationships between a function class's complexity and generalization. However, formal connections between interpretability and learning theory remain relatively unstudied.
From cloud to device: The future of AI and machine learning on the Edge
For a more comprehensive survey, read our full paper on this topic. We are surrounded by smart devices: from mobile phones and watches to glasses, jewelry, and even clothes. But while these devices are small and powerful, they are merely the tip of a computing iceberg that starts at your fingertips and ends in giant data and compute centers across the world. Data is transmitted from devices to the cloud, where it is used to train models that are then sent back and deployed on the device. Unless it is used for learning simple concepts like wake words or recognizing your face to unlock your phone, machine learning is computationally expensive, and data has no choice but to travel these thousands of miles before it can be turned into useful information.
Measure Theoretic Approach to Nonuniform Learnability
An earlier characterization of nonuniform learnability, which allows the sample size to depend on the hypothesis to which the learner is compared, is redefined using a measure-theoretic approach. Nonuniform learnability is a strict relaxation of the Probably Approximately Correct (PAC) framework. A new algorithm, the Generalize Measure Learnability framework (GML), is introduced to implement this approach, and its sample and computational complexity bounds are studied. Like the Minimum Description Length (MDL) principle, this approach can be regarded as an explication of Occam's razor. Furthermore, many situations are presented (countable hypothesis classes, to which the GML framework applies) that can be learned with the GML scheme while achieving statistical consistency.
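For context on the MDL comparison, the standard textbook bound for a countable hypothesis class can be written as follows. This is a sketch of the classical result, not the GML bound from the abstract. The symbols are assumptions introduced here: $w$ is any weighting over the class $\mathcal{H}$ with $\sum_{h} w(h) \le 1$, $L_S$ and $L_D$ denote empirical and true risk for a loss bounded in $[0,1]$, $m$ is the sample size, and $\delta$ is the confidence parameter.

```latex
% Hoeffding's inequality for a fixed h, with failure probability w(h)\delta:
\Pr_{S \sim D^m}\!\left[\, \lvert L_S(h) - L_D(h) \rvert > \epsilon_h \,\right]
  \le 2 e^{-2 m \epsilon_h^2} = w(h)\,\delta
\quad\Longrightarrow\quad
\epsilon_h = \sqrt{\frac{\ln\frac{1}{w(h)} + \ln\frac{2}{\delta}}{2m}} .

% Union bound over the countable class, using \sum_h w(h)\,\delta \le \delta:
\text{with probability at least } 1-\delta, \quad
\forall h \in \mathcal{H}: \quad
L_D(h) \le L_S(h) + \sqrt{\frac{\ln\frac{1}{w(h)} + \ln\frac{2}{\delta}}{2m}} .
```

The sample size needed for a given accuracy grows with $\ln(1/w(h))$, so it depends on the hypothesis being compared against. That dependence is what makes the guarantee nonuniform, and choosing $w(h) = 2^{-|h|}$ for a prefix-free description length $|h|$ recovers the Occam's razor reading.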