Collaborating Authors

 Györfi, László


On rate-optimal classification from non-private and from private data

arXiv.org Machine Learning

In this paper we revisit the classical problem of classification, but impose privacy constraints. Under such constraints, the raw data $(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be directly observed, and all classifiers are functions of the randomised outcome of a suitable local differential privacy mechanism. The statistician is free to choose the form of this privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of each feature vector $X_i$ and to its label $Y_i$. The classification rule is the privatized version of the well-studied partitioning classification rule. In addition to the standard Lipschitz and margin conditions, a novel characteristic is introduced, by which the exact rate of convergence of the classification error probability is calculated, both for non-private and private data.
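As a rough illustration of what such a mechanism could look like, the sketch below (a minimal toy, not the paper's exact construction) privatises a one-dimensional partitioning rule: each user releases Laplace-perturbed one-hot cell indicators and label-weighted indicators, and the classifier predicts the sign of the noisy per-cell vote. The cell width, noise scale, and data-generating model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatise(X, Y, edges, eps):
    """Each user releases a Laplace-perturbed one-hot cell indicator and a
    perturbed label-weighted indicator (a toy local-DP sketch)."""
    n, m = len(X), len(edges) - 1
    cells = np.clip(np.digitize(X, edges) - 1, 0, m - 1)
    onehot = np.zeros((n, m))
    onehot[np.arange(n), cells] = 1.0
    scale = 2.0 / eps   # illustrative scale; a careful analysis would split the budget
    Z = onehot + rng.laplace(scale=scale, size=(n, m))
    W = onehot * Y[:, None] + rng.laplace(scale=scale, size=(n, m))
    return Z, W

def private_partitioning_classifier(W):
    """Majority vote per cell based on the noisy label-weighted indicators."""
    votes = W.sum(axis=0)
    return lambda x_cells: np.sign(votes[x_cells])

# toy 1-d example: Y = sign(X), with 10% label noise
n, eps = 5000, 2.0
X = rng.uniform(-1, 1, n)
Y = np.sign(X) * np.where(rng.uniform(size=n) < 0.9, 1, -1)
edges = np.linspace(-1, 1, 21)            # 20 cells of equal width
Z, W = privatise(X, Y, edges, eps)
clf = private_partitioning_classifier(W)

X_test = rng.uniform(-1, 1, 2000)
cells_test = np.clip(np.digitize(X_test, edges) - 1, 0, 19)
err = np.mean(clf(cells_test) != np.sign(X_test))   # error against the Bayes labels
print(f"test error of the privatised partitioning rule: {err:.3f}")
```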


Lossless Transformations and Excess Risk Bounds in Statistical Inference

arXiv.org Machine Learning

We study the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss in estimating a random variable from an observed feature vector and the minimum expected loss in estimating the same random variable from a transformation (statistic) of the feature vector. After characterizing lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, we construct a partitioning test statistic for the hypothesis that a given transformation is lossless and show that for i.i.d. data the test is strongly consistent. More generally, we develop information-theoretic upper bounds on the excess risk that uniformly hold over fairly general classes of loss functions. Based on these bounds, we introduce the notion of a delta-lossless transformation and give sufficient conditions for a given transformation to be universally delta-lossless. Applications to classification, nonparametric regression, portfolio strategies, information bottleneck, and deep learning are also surveyed.
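The excess-risk definition can be illustrated numerically under squared loss, where the optimal predictor given a discrete observation is the within-group conditional mean. The toy example below (not taken from the paper) uses Y = X1 XOR X2 and the lossy transformation T(X) = X1, for which the excess minimum risk equals 0.25.

```python
import numpy as np

rng = np.random.default_rng(1)

def min_risk_squared_loss(keys, Y):
    """Empirical minimum expected squared loss when predicting Y from the
    discrete observation encoded by `keys`: the best predictor is the
    conditional mean within each group."""
    risk, n = 0.0, len(Y)
    for k in np.unique(keys):
        grp = Y[keys == k]
        risk += ((grp - grp.mean()) ** 2).sum() / n
    return risk

# toy example: Y = X1 XOR X2, and T(X) = X1 discards the relevant coordinate X2
n = 100_000
X = rng.integers(0, 2, size=(n, 2))
Y = (X[:, 0] ^ X[:, 1]).astype(float)

keys_X = X[:, 0] * 2 + X[:, 1]      # encodes the full observation
keys_T = X[:, 0]                    # the transformation keeps only X1

excess = min_risk_squared_loss(keys_T, Y) - min_risk_squared_loss(keys_X, Y)
print(f"estimated excess minimum risk: {excess:.3f}  (theoretical value 0.25)")
```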


Repeated Observations for Classification

arXiv.org Artificial Intelligence

We study the problem of nonparametric classification with repeated observations. Let $\mathbf{X}$ be the $d$-dimensional feature vector and let $Y$ denote the label taking values in $\{1,\dots ,M\}$. In contrast to the usual setup with a large sample size $n$ and relatively low dimension $d$, this paper deals with the situation when, instead of observing a single feature vector $\mathbf{X}$, we are given $t$ repeated feature vectors $\mathbf{V}_1,\dots ,\mathbf{V}_t$. Some simple classification rules are presented such that the conditional error probabilities converge to zero at an exponential rate as $t\to\infty$. In the analysis, we investigate particular models such as robust detection by nominal densities, prototype classification, linear transformation, linear classification, and scaling.
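One way to see the improvement with growing $t$ is a toy prototype rule: average the $t$ repeated observations and pick the nearest class prototype. The prototypes, the Gaussian noise model, and the Monte-Carlo evaluation below are illustrative assumptions rather than the paper's exact rules.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical prototypes for M = 2 classes in d = 2 dimensions
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])

def classify_repeated(V):
    """Average the t repeated observations and pick the nearest prototype."""
    v_bar = V.mean(axis=0)
    return np.argmin(((prototypes - v_bar) ** 2).sum(axis=1))

def error_probability(t, sigma=1.0, trials=20_000):
    """Monte-Carlo error of the rule when V_j = prototype[Y] + N(0, sigma^2 I)."""
    errors = 0
    for _ in range(trials):
        y = rng.integers(0, 2)
        V = prototypes[y] + sigma * rng.normal(size=(t, 2))
        errors += classify_repeated(V) != y
    return errors / trials

for t in (1, 5, 20):
    print(f"t = {t:2d}  estimated error = {error_probability(t):.4f}")
```

The observed error drops rapidly as $t$ grows, consistent with an exponential rate of convergence.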


On the strong stability of ergodic iterations

arXiv.org Machine Learning

We revisit processes generated by iterated random functions driven by a stationary and ergodic sequence. Such a process is called strongly stable if a random initialization exists for which the process is stationary and ergodic, and for any other initialization the difference of the two processes converges to zero almost surely. Under some mild conditions on the corresponding recursive map, and without any condition on the driving sequence, we show the strong stability of such iterations. Several applications are surveyed, such as stochastic approximation and queueing. Furthermore, new results are deduced for Langevin-type iterations with dependent noise and for multitype branching processes.
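The stability statement is easy to visualise for a toy contractive recursion driven by a dependent AR(1) noise sequence: two copies started from different points but fed the same driving sequence coalesce. The particular map and noise below are illustrative assumptions, not the paper's general setting.

```python
import numpy as np

rng = np.random.default_rng(3)

def driving_sequence(n, rho=0.8):
    """A stationary, dependent AR(1) noise sequence; any stationary ergodic
    driving sequence would do, this choice is only illustrative."""
    xi = np.zeros(n)
    for k in range(1, n):
        xi[k] = rho * xi[k - 1] + rng.normal()
    return xi

def iterate(x0, xi, a=0.5):
    """Iterated random function x_{k+1} = F(x_k, xi_k) with the contractive
    recursive map F(x, z) = a*x + z (a toy Langevin-type step)."""
    x = np.empty(len(xi) + 1)
    x[0] = x0
    for k, z in enumerate(xi):
        x[k + 1] = a * x[k] + z
    return x

n = 50
xi = driving_sequence(n)                 # the same driving noise for both runs
path_a = iterate(10.0, xi)
path_b = iterate(-10.0, xi)
# strong stability: the difference of the two coupled paths vanishes
print("|difference| at k = 0, 10, 25, 50:",
      np.abs(path_a - path_b)[[0, 10, 25, 50]])
```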


Tree density estimation

arXiv.org Machine Learning

A natural strategy for mitigating the curse of dimensionality in estimating probability distributions is to employ a low-complexity family of approximating distributions. For discrete distributions, Chow and Liu [5] suggested a family of tree-based approximations and gave an efficient maximum-likelihood estimator based on Kruskal's optimal spanning tree algorithm [14]. We stress that this approach makes no structural assumptions about the sampling distribution, but rather constitutes a modeling choice. Consequently, in this paradigm, the goal is to approximate the optimal-tree distribution from the data, without any guarantees on how well the latter approximates the true sampling distribution. Extensions of the Chow-Liu approach to continuous distributions were studied by Bach and Jordan [1] and by Liu et al. [16] under various assumptions.
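A minimal sketch of the Chow-Liu construction for discrete data: weight every pair of variables by its empirical mutual information and run Kruskal's algorithm to extract a maximum-weight spanning tree. The plug-in mutual-information estimator and the chain-structured toy data are illustrative choices.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)

def empirical_mi(x, y):
    """Plug-in estimate of the mutual information (in nats) of two discrete samples."""
    n, mi = len(x), 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu_tree(data):
    """Maximum-likelihood tree structure: Kruskal's algorithm on the complete
    graph weighted by pairwise empirical mutual information."""
    d = data.shape[1]
    edges = sorted(((empirical_mi(data[:, i], data[:, j]), i, j)
                    for i, j in combinations(range(d), 2)), reverse=True)
    parent = list(range(d))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                         # keep the edge set acyclic
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

# toy chain-structured binary data X0 -> X1 -> X2 (illustrative only)
n = 20_000
x0 = rng.integers(0, 2, n)
x1 = (x0 + (rng.random(n) < 0.1)) % 2        # noisy copy of x0
x2 = (x1 + (rng.random(n) < 0.1)) % 2        # noisy copy of x1
data = np.column_stack([x0, x1, x2])
print(chow_liu_tree(data))                   # expected edges: (0, 1) and (1, 2)
```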


Strongly universally consistent nonparametric regression and classification with privatised data

arXiv.org Machine Learning

In recent years there has been a surge of interest in data analysis methodology that is able to achieve strong statistical performance without compromising the privacy and security of individual data holders. This has often been driven by applications in modern technology, for example by Google (Erlingsson et al., 2014), Apple (Tang et al., 2017), and Microsoft (Ding et al., 2017), but the study goes at least as far back as Warner (1965) and is often used in the more traditional fields of clinical trials (Vu and Slavkovic, 2009, Dankar and El Emam, 2013) and census data (Machanavajjhala et al., 2008, Dwork, 2019). While there has long been an awareness that sensitive data must be anonymised, it has become apparent only relatively recently that simply removing names and addresses is insufficient in many cases (e.g.


Universal consistency and rates of convergence of multiclass prototype algorithms in metric spaces

arXiv.org Machine Learning

We study universal consistency and convergence rates of simple nearest-neighbor prototype rules for the problem of multiclass classification in metric spaces. We first show that a novel data-dependent partitioning rule, named Proto-NN, is universally consistent in any metric space that admits a universally consistent rule. Proto-NN is a significant simplification of OptiNet, a recently proposed compression-based algorithm that, to date, was the only algorithm known to be universally consistent in such a general setting. Practically, Proto-NN is simpler to implement and enjoys reduced computational complexity. We then proceed to study convergence rates of the excess error probability. We first obtain rates for the standard $k$-NN rule under a margin condition and a new generalized-Lipschitz condition. The latter is an extension of a recently proposed modified-Lipschitz condition from $\mathbb R^d$ to metric spaces. Similarly to the modified-Lipschitz condition, the new condition avoids any boundedness assumptions on the data distribution. While obtaining rates for Proto-NN is left open, we show that a second prototype rule that hybridizes between $k$-NN and Proto-NN achieves the same rates as $k$-NN while enjoying similar computational advantages as Proto-NN. We conjecture, however, that, like $k$-NN, this hybrid rule is not consistent in general.
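A hedged sketch of a generic nearest-prototype rule in an arbitrary metric space (an illustrative simplification, not the paper's Proto-NN): sample a small set of prototypes, label each by a majority vote over the training points it attracts, and classify a query by the label of its nearest prototype.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(5)

def fit_prototype_rule(X, y, n_protos, metric):
    """Nearest-prototype rule for an arbitrary metric: sample prototypes and
    label each one by a majority vote over the training points it attracts."""
    idx = rng.choice(len(X), size=n_protos, replace=False)
    protos = [X[i] for i in idx]
    assigned = [[] for _ in range(n_protos)]
    for xi, yi in zip(X, y):
        j = min(range(n_protos), key=lambda k: metric(xi, protos[k]))
        assigned[j].append(yi)
    labels = [Counter(a).most_common(1)[0][0] if a else 0 for a in assigned]
    def predict(x):
        j = min(range(n_protos), key=lambda k: metric(x, protos[k]))
        return labels[j]
    return predict

# toy 3-class problem in (R^2, Euclidean metric); any metric space would do
euclid = lambda a, b: float(np.linalg.norm(a - b))
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y_train = rng.integers(0, 3, 3000)
X_train = centers[y_train] + rng.normal(size=(3000, 2))
predict = fit_prototype_rule(X_train, y_train, n_protos=30, metric=euclid)

y_test = rng.integers(0, 3, 500)
X_test = centers[y_test] + rng.normal(size=(500, 2))
err = np.mean([predict(x) != t for x, t in zip(X_test, y_test)])
print(f"test error of the nearest-prototype rule: {err:.3f}")
```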