AITopics

2410.02835

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.34)

arXiv.org Artificial IntelligenceFeb-13-2024

Distribution Estimation under the Infinity Norm

Kontorovich, Aryeh, Painsky, Amichai

We present novel bounds for estimating discrete probability distributions under the $\ell_\infty$ norm. These are nearly optimal in various precise senses, including a kind of instance-optimality. Our data-dependent convergence guarantees for the maximum likelihood estimator significantly improve upon the currently known results. A variety of techniques are utilized and innovated upon, including Chernoff-type inequalities and empirical Bernstein bounds. We illustrate our results in synthetic and real-world experiments. Finally, we apply our proposed framework to a basic selective inference problem, where we estimate the most frequent probabilities in a sample.

artificial intelligence, inequality, machine learning, (19 more...)

2402.08422

Country:

North America > United States (0.46)
Europe (0.46)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.84)

Industry: Government > Regional Government > North America Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

arXiv.org Machine LearningOct-30-2023

Near-optimal learning with average H\"older smoothness

Hanneke, Steve, Kontorovich, Aryeh, Kornowski, Guy

We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to H\"older smoothness. This measure of the "effective smoothness" of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic "worst-case" H\"older constant. We consider both the realizable and the agnostic (noisy) regression settings, proving upper and lower risk bounds in terms of the average H\"older smoothness; these rates improve upon both previously known rates even in the special case of average Lipschitz smoothness. Moreover, our lower bound is tight in the realizable setting up to log factors, thus we establish the minimax rate. From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown underlying distribution, the learner does not have an explicit representation of the function class, hence is unable to execute ERM. Nevertheless, we provide distinct learning algorithms that achieve both (nearly) optimal learning rates. Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry. Overall, our results show that the classic worst-case notion of H\"older smoothness can be essentially replaced by its average, yielding considerably sharper guarantees.

artificial intelligence, machine learning, smoothness, (17 more...)

2302.06005

Country:

North America > United States (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.74)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

arXiv.org Artificial IntelligenceOct-3-2023

Splitting the Difference on Adversarial Training

Levi, Matan, Kontorovich, Aryeh

The existence of adversarial examples points to a basic weakness of deep neural networks. One of the most effective defenses against such examples, adversarial training, entails training models with some degree of robustness, usually at the expense of a degraded natural accuracy. Most adversarial training methods aim to learn a model that finds, for each class, a common decision boundary encompassing both the clean and perturbed examples. In this work, we take a fundamentally different approach by treating the perturbed examples of each class as a separate class to be learned, effectively splitting each class into two classes: "clean" and "adversarial." This split doubles the number of classes to be learned, but at the same time considerably simplifies the decision boundaries. We provide a theoretical plausibility argument that sheds some light on the conditions under which our approach can be expected to be beneficial. Likewise, we empirically demonstrate that our method learns robust models while attaining optimal or near-optimal natural accuracy, e.g., on CIFAR-10 we obtain near-optimal natural accuracy of $95.01\%$ alongside significant robustness across multiple tasks. The ability to achieve such near-optimal natural accuracy, while maintaining a significant level of robustness, makes our method applicable to real-world applications where natural accuracy is at a premium. As a whole, our main contribution is a general method that confers a significant level of robustness upon classifiers with only minor or negligible degradation of their natural accuracy.

adversarial training, artificial intelligence, machine learning, (2 more...)

2310.0248

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

arXiv.org Machine LearningSep-29-2023

Efficient Agnostic Learning with Average Smoothness

Hanneke, Steve, Kontorovich, Aryeh, Kornowski, Guy

We study distribution-free nonparametric regression following a notion of average smoothness initiated by Ashlagi et al. (2021), which measures the "effective" smoothness of a function with respect to an arbitrary unknown underlying distribution. While the recent work of Hanneke et al. (2023) established tight uniform convergence bounds for average-smooth functions in the realizable case and provided a computationally efficient realizable learning algorithm, both of these results currently lack analogs in the general agnostic (i.e. noisy) case. In this work, we fully close these gaps. First, we provide a distribution-free uniform convergence bound for average-smoothness classes in the agnostic setting. Second, we match the derived sample complexity with a computationally efficient agnostic learning algorithm. Our results, which are stated in terms of the intrinsic geometry of the data and hold over any totally bounded metric space, show that the guarantees recently obtained for realizable learning of average-smooth functions transfer to the agnostic setting. At the heart of our proof, we establish the uniform convergence rate of a function class in terms of its bracketing entropy, which may be of independent interest.

algorithm, artificial intelligence, machine learning, (17 more...)

2309.17016

Country: Europe > France (0.14)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceDec-8-2022

Differentially-Private Bayes Consistency

Bousquet, Olivier, Kaplan, Haim, Kontorovich, Aryeh, Mansour, Yishay, Moran, Shay, Sadigurschi, Menachem, Stemmer, Uri

We construct a universally Bayes consistent learning rule that satisfies differential privacy (DP). We first handle the setting of binary classification and then extend our rule to the more general setting of density estimation (with respect to the total variation metric). The existence of a universally consistent DP learner reveals a stark difference with the distribution-free PAC model. Indeed, in the latter DP learning is extremely limited: even one-dimensional linear classifiers are not privately learnable in this stringent model. Our result thus demonstrates that by allowing the learning rate to depend on the target distribution, one can circumvent the above-mentioned impossibility result and in fact, learn \emph{arbitrary} distributions by a single DP algorithm. As an application, we prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal \emph{labeled} sample complexity of $\tilde{O}(d/\varepsilon)$ labeled examples (and with an unlabeled sample complexity that can depend on the target distribution).

algorithm, artificial intelligence, machine learning, (12 more...)

2212.04216

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report (0.70)

Industry:

Information Technology > Security & Privacy (0.68)
Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine LearningFeb-7-2022

Metric-valued regression

Cohen, Dan Tsir, Kontorovich, Aryeh

We propose an efficient algorithm for learning mappings between two metric spaces, $\X$ and $\Y$. Our procedure is strongly Bayes-consistent whenever $\X$ and $\Y$ are topologically separable and $\Y$ is "bounded in expectation" (our term; the separability assumption can be somewhat weakened). At this level of generality, ours is the first such learnability result for unbounded loss in the agnostic setting. Our technique is based on metric medoids (a variant of Fr\'echet means) and presents a significant departure from existing methods, which, as we demonstrate, fail to achieve Bayes-consistency on general instance- and label-space metrics. Our proofs introduce the technique of {\em semi-stable compression}, which may be of independent interest.

artificial intelligence, machine learning, metric space, (16 more...)

2202.03045

Country:

North America > United States > Colorado (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

arXiv.org Machine LearningDec-17-2021

Tree density estimation

Györfi, László, Kontorovich, Aryeh, Weiss, Roi

A natural strategy for mitigating the curse of dimensionality in estimating probability distributions is to employ lowcomplexity family of approximation distributions. For discrete distributions, Chow and Liu [5] suggested a family of tree-based approximations and gave an efficient maximum-likelihood estimator based on Kruskal's optimal spanning tree algorithm [14]. We stress that this approach makes no structural assumptions about the sampling distribution, but rather constitutes a modeling choice. Consequently, in this paradigm, the goal is to approximate the optimal-tree distribution from the data, without any guarantees on how well the latter approximates the true sampling distribution. Extensions of the Chow-Liu approach to continuous distributions were studied by Bach and Jordan [1] and by Liu et al. [16] under various assumptions.

artificial intelligence, machine learning, mutual information, (17 more...)

2111.11971

Country:

Europe (0.46)
Asia > Middle East > Israel (0.28)
Asia > Middle East > Jordan (0.24)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.48)

arXiv.org Machine LearningOct-10-2021

Fat-shattering dimension of $k$-fold maxima

Kontorovich, Aryeh, Attias, Idan

We provide improved estimates on the fat-shattering dimension of the $k$-fold maximum of real-valued function classes. The latter consists of all ways of choosing $k$ functions, one from each of the $k$ classes, and computing their pointwise maximum. The bound is stated in terms of the fat-shattering dimensions of the component classes. For linear and affine function classes, we provide a considerably sharper upper bound and a matching lower bound, achieving, in particular, an optimal dependence on $k$. Along the way, we point out and correct a number of erroneous claims in the literature.

artificial intelligence, dimension, machine learning, (16 more...)

2110.04763

Country:

Asia > Middle East > Israel (0.14)
North America > United States > Illinois (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine LearningNov-9-2020

Stable Sample Compression Schemes: New Applications and an Optimal SVM Margin Bound

Hanneke, Steve, Kontorovich, Aryeh

We analyze a family of supervised learning algorithms based on sample compression schemes that are stable, in the sense that removing points from the training set which were not selected for the compression set does not alter the resulting classifier. We use this technique to derive a variety of novel or improved data-dependent generalization bounds for several learning algorithms. In particular, we prove a new margin bound for SVM, removing a log factor. The new bound is provably optimal.

artificial intelligence, compression scheme, machine learning, (16 more...)

2011.04586

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)