AITopics | halfspace depth

Collaborating Authors

halfspace depth

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Data Depth as a Risk

Castellanos, Arturo, Mozharovskyi, Pavlo

arXiv.org Machine LearningJul-14-2025

Data depths are score functions that quantify in an unsupervised fashion how central is a point inside a distribution, with numerous applications such as anomaly detection, multivariate or functional data analysis, arising across various fields. The halfspace depth was the first depth to aim at generalising the notion of quantile beyond the univariate case. Among the existing variety of depth definitions, it remains one of the most used notions of data depth. Taking a different angle from the quantile point of view, we show that the halfspace depth can also be regarded as the minimum loss of a set of classifiers for a specific labelling of the points. By changing the loss or the set of classifiers considered, this new angle naturally leads to a family of "loss depths", extending to well-studied classifiers such as, e.g., SVM or logistic regression, among others. This framework directly inherits computational efficiency of existing machine learning algorithms as well as their fast statistical convergence rates, and opens the data depth realm to the high-dimensional setting. Furthermore, the new loss depths highlight a connection between the dataset and the right amount of complexity or simplicity of the classifiers. The simplicity of classifiers as well as the interpretation as a risk makes our new kind of data depth easy to explain, yet efficient for anomaly detection, as is shown by experiments.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2507.08518

Country:

North America > United States > North Carolina > Watauga County > Boone (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback

Concentration of the exponential mechanism and differentially private multivariate medians

Ramsay, Kelly, Jagannath, Aukosh, Chenouri, Shoja'eddin

arXiv.org Artificial IntelligenceOct-12-2022

We prove concentration inequalities for the output of the exponential mechanism about the maximizer of the population objective function. This bound applies to objective functions that satisfy a mild regularity condition. To illustrate our result, we study the problem of differentially private multivariate median estimation. We present novel finite-sample performance guarantees for differentially private multivariate depth-based medians which are essentially sharp. Our results cover commonly used depth functions, such as the halfspace (or Tukey) depth, spatial depth, and the integrated dual depth. We show that under Cauchy marginals, the cost of heavy-tailed location estimation outweighs the cost of privacy. We demonstrate our results numerically using a Gaussian contamination model in dimensions up to $d = 100$, and compare them to a state-of-the-art private mean estimation algorithm.

artificial intelligence, depth function, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2210.06459

Country:

North America > Canada (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Security & Privacy (0.68)

Add feedback

Differentially Private Estimation via Statistical Depth

Cumings-Menon, Ryan

arXiv.org Artificial IntelligenceJul-25-2022

Constructing a differentially private (DP) estimator requires deriving the maximum influence of an observation, which can be difficult in the absence of exogenous bounds on the input data or the estimator, especially in high dimensional settings. This paper shows that standard notions of statistical depth, i.e., halfspace depth and regression depth, are particularly advantageous in this regard, both in the sense that the maximum influence of a single observation is easy to analyze and that this value is typically low. This is used to motivate new approximate DP location and regression estimators using the maximizers of these two notions of statistical depth. A more computationally efficient variant of the approximate DP regression estimator is also provided. Also, to avoid requiring that users specify a priori bounds on the estimates and/or the observations, variants of these DP mechanisms are described that satisfy random differential privacy (RDP), which is a relaxation of differential privacy provided by Hall, Wasserman, and Rinaldo (2013). We also provide simulations of the two DP regression methods proposed here. The proposed estimators appear to perform favorably relative to the existing DP regression methods we consider in these simulations when either the sample size is at least 100-200 or the privacy-loss budget is sufficiently high.

estimator, mechanism, sensitivity, (16 more...)

arXiv.org Artificial Intelligence

2207.12602

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (0.92)

Industry:

Government > Regional Government > North America Government > United States Government (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis

Staerman, Guillaume, Mozharovskyi, Pavlo, Clémençon, Stéphan

arXiv.org Machine LearningJun-21-2021

Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth permits to define quantiles and ranks for multivariate data and use them for various statistical tasks (\textit{e.g.} inference, hypothesis testing). Whereas many depth functions have been proposed \textit{ad-hoc} in the literature since the seminal contribution of \cite{Tukey75}, not all of them possess the properties desirable to emulate the notion of quantile function for univariate probability distributions. In this paper, we propose an extension of the \textit{integrated rank-weighted} statistical depth (IRW depth in abbreviated form) originally introduced in \cite{IRW}, modified in order to satisfy the property of \textit{affine-invariance}, fulfilling thus all the four key axioms listed in the nomenclature elaborated by \cite{ZuoS00a}. The variant we propose, referred to as the Affine-Invariant IRW depth (AI-IRW in short), involves the covariance/precision matrices of the (supposedly square integrable) $d$-dimensional random vector $X$ under study, in order to take into account the directions along which $X$ is most variable to assign a depth value to any point $x\in \mathbb{R}^d$. The accuracy of the sampling version of the AI-IRW depth is investigated from a nonasymptotic perspective. Namely, a concentration result for the statistical counterpart of the AI-IRW depth is proved. Beyond the theoretical analysis carried out, applications to anomaly detection are considered and numerical results are displayed, providing strong empirical evidence of the relevance of the depth function we propose here.

ai-irw, halfspace depth, matrix, (14 more...)

arXiv.org Machine Learning

2106.11068

Country: Europe > France (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)

Add feedback

Depth-based pseudo-metrics between probability distributions

Staerman, Guillaume, Mozharovskyi, Pavlo, Clémençon, Stéphan, d'Alché-Buc, Florence

arXiv.org Machine LearningMar-23-2021

Data depth is a non parametric statistical tool that measures centrality of any element $x\in\mathbb{R}^d$ with respect to (w.r.t.) a probability distribution or a data set. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Consequently, its upper level sets -- the depth-trimmed regions -- give rise to a definition of multivariate quantiles. In this work, we propose two new pseudo-metrics between continuous probability measures based on data depth and its associated central regions. The first one is constructed as the Lp-distance between data depth w.r.t. each distribution while the second one relies on the Hausdorff distance between their quantile regions. It can further be seen as an original way to extend the one-dimensional formulae of the Wasserstein distance, which involves quantiles and cdfs, to the multivariate space. After discussing the properties of these pseudo-metrics and providing conditions under which they define a distance, we highlight similarities with the Wasserstein distance. Interestingly, the derived non-asymptotic bounds show that in contrast to the Wasserstein distance, the proposed pseudo-metrics do not suffer from the curse of dimensionality. Moreover, based on the support function of a convex body, we propose an efficient approximation possessing linear time complexity w.r.t. the size of the data set and its dimension. The quality of this approximation as well as the performance of the proposed approach are illustrated in experiments. Furthermore, by construction the regions-based pseudo-metric appears to be robust w.r.t. both outliers and heavy tails, a behavior witnessed in the numerical experiments.

data depth, probability distribution, submission and formatting instruction, (14 more...)

arXiv.org Machine Learning

2103.12711

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Depth and depth-based classification with R-package ddalpha

Pokotylo, Oleksii, Mozharovskyi, Pavlo, Dyckerhoff, Rainer

arXiv.org Machine LearningAug-14-2016

Following the seminal idea of Tukey, data depth is a function that measures how close an arbitrary point of the space is located to an implicitly defined center of a data cloud. Having undergone theoretical and computational developments, it is now employed in numerous applications with classification being the most popular one. The R-package ddalpha is a software directed to fuse experience of the applicant with recent achievements in the area of data depth and depth-based classification. ddalpha provides an implementation for exact and approximate computation of most reasonable and widely applied notions of data depth. These can be further used in the depth-based multivariate and functional classifiers implemented in the package, where the $DD\alpha$-procedure is in the main focus. The package is expandable with user-defined custom depth methods and separators. The implemented functions for depth visualization and the built-in benchmark procedures may also serve to provide insights into the geometry of the data and the quality of pattern recognition.

classification, classifier, ddalpha, (16 more...)

arXiv.org Machine Learning

1608.04109

Country:

North America > United States > New York (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback