 Jalalzai, Hamid


Membership Inference Attacks via Adversarial Examples

arXiv.org Artificial Intelligence

The rise of machine learning and deep learning has led to significant improvements in several domains. This change is supported by both the dramatic increase in computational power and the collection of large datasets. Such massive datasets often include personal data, which can represent a threat to privacy. Membership inference attacks are a novel direction of research which aims at recovering the training data used by a learning algorithm. In this paper, we develop a means to measure the leakage of training data, leveraging a quantity that appears as a proxy for the total variation of a trained model near its training samples. We extend our work by providing a novel defense mechanism. Our contributions are supported by convincing numerical experiments.
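The abstract above describes scoring how strongly a trained model "remembers" its training points. As a minimal illustration of the general attack template (not the paper's actual method), the sketch below trains a small logistic regression and uses the distance of each point to the decision boundary as a crude stand-in for the adversarial-perturbation-based leakage proxy: points the model fits with an unusually large margin are guessed to be training members. The synthetic data, the margin score, and the median threshold are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data: two Gaussian blobs in 5 dimensions.
def make_data(n):
    X0 = rng.normal(-1.0, 1.0, size=(n, 5))
    X1 = rng.normal(+1.0, 1.0, size=(n, 5))
    return np.vstack([X0, X1]), np.r_[np.zeros(n), np.ones(n)]

X_train, y_train = make_data(30)   # members (small set to encourage overfitting)
X_out, _ = make_data(30)           # non-members drawn from the same distribution

# Fit logistic regression by plain gradient descent.
w, b = np.zeros(X_train.shape[1]), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
    g = p - y_train
    w -= 0.1 * (X_train.T @ g) / len(y_train)
    b -= 0.1 * g.mean()

def margin(X):
    # Distance-like score to the decision boundary: a toy stand-in for the
    # adversarial-example-based proxy of model variation near a sample.
    return np.abs(X @ w + b) / np.linalg.norm(w)

scores = np.r_[margin(X_train), margin(X_out)]
is_member = np.r_[np.ones(len(X_train)), np.zeros(len(X_out))]

# Threshold attack: points far from the boundary are guessed to be members.
tau = np.median(scores)
attack_acc = ((scores > tau) == is_member).mean()
print(f"membership attack accuracy: {attack_acc:.2f}")
```

An accuracy clearly above 0.5 would indicate measurable leakage; a defense mechanism, in this template, would aim to push the score distributions of members and non-members back together.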


Concentration bounds for the empirical angular measure with statistical learning applications

arXiv.org Machine Learning

The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation when the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing data in order to build an empirical version of the angular measure based on the most extreme observations. However, the study of the sampling distribution of the resulting empirical angular measure is challenging. It is the purpose of the paper to establish finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and scale essentially as the square root of the effective sample size, up to a logarithmic factor. Discarding the most extreme observations yields a truncated version of the empirical angular measure for which the logarithmic factor in the concentration bound is replaced by a factor depending on the truncation level. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization and unsupervised anomaly detection through minimum-volume sets of the sphere.
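The construction described in the abstract (rank-standardize the margins, keep the most extreme observations, project onto the sphere) can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: synthetic heavy-tailed data with a shared Pareto factor, the L1 norm and unit simplex in place of a general choice of norm and sphere, and an arbitrary example Borel set; the truncation refinements discussed in the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 2000, 2, 100   # sample size, dimension, number of extremes kept

# Heavy-tailed bivariate sample with dependent components (shared Pareto factor).
Z = rng.pareto(2.0, size=n)
X = np.c_[Z * rng.uniform(0.5, 1.5, n), Z * rng.uniform(0.5, 1.5, n)]

# Rank transformation: standardize each margin to the unit-Pareto scale,
# V_ij = 1 / (1 - R_ij / (n + 1)), where R_ij is the rank of X_ij in column j.
ranks = X.argsort(axis=0).argsort(axis=0) + 1          # ranks in 1..n per column
V = 1.0 / (1.0 - ranks / (n + 1.0))

# Keep the k observations with the largest L1 norm and project onto the simplex.
norms = V.sum(axis=1)
idx = np.argsort(norms)[-k:]
W = V[idx] / norms[idx, None]                          # angles of the extremes

# Empirical angular measure of a Borel set A = {w : w_1 > 0.5}, normalized to
# a probability: the fraction of the k most extreme angles falling in A.
mass_A = (W[:, 0] > 0.5).mean()
print(f"empirical angular mass of {{w1 > 0.5}}: {mass_A:.2f}")
```

The concentration bounds in the paper control how far such empirical masses can deviate from the true angular measure, uniformly over classes of sets A, at the rate of roughly the square root of the effective sample size k.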


Informative Clusters for Multivariate Extremes

arXiv.org Machine Learning

Clustering is essential for exploratory data mining and data structure analysis, and is a common technique for statistical data analysis. It is widely used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. Many clustering approaches exist, with different intrinsic notions of what a cluster is. In the standard setup, the goal is to group objects into subsets, known as clusters, such that objects within a given cluster are more related to one another than to those from a different cluster. Clustering is already well studied (see [4, 27] and references therein), in contrast to Extreme Value Theory (EVT), a newer field in the machine learning community that has been used for anomaly detection [14, 28, 45, 51], classification [31, 32, 54], or clustering [10, 12, 13, 33] when dedicated to the most extreme regions of the sample space.