mvset
Approximating Hierarchical MV-sets for Hierarchical Clustering
Glazer, Assaf, Weissbrod, Omer, Lindenbaum, Michael, Markovitch, Shaul
The goal of hierarchical clustering is to construct a cluster tree, which can be viewed as the modal structure of a density. For this purpose, we use a convex optimization program that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. We further extend existing graph-based methods to approximate the cluster tree of a distribution. By avoiding direct density estimation, our method is able to handle high-dimensional data more efficiently than existing density-based approaches. We present empirical results that demonstrate the superiority of our method over existing ones.
- North America > United States > Utah (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > Scotland (0.04)
- (7 more...)
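A minimal sketch of the graph-based idea in this abstract, not the paper's exact algorithm: we stand in for the joint convex program with a single one-class SVM score, threshold the scores at several quantiles to get nested dense sets, and count connected components of a k-NN graph restricted to each set. The number of components per level is a crude read-out of the cluster tree. All parameter values (`gamma`, `n_neighbors`, the alpha levels) are illustrative assumptions.

```python
# Hedged sketch: approximate a cluster tree from nested high-density
# sets without direct density estimation. Scores come from a one-class
# SVM (a stand-in for the paper's convex program); cluster structure at
# each level comes from connected components of a k-NN graph.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs: the tree should show two modes.
X = np.vstack([rng.normal(-4, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])

scores = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X).decision_function(X)
knn = kneighbors_graph(X, n_neighbors=5, mode="connectivity")

tree_levels = []
for alpha in (0.9, 0.5, 0.2):  # nested sets covering ~alpha of the mass
    idx = np.where(scores >= np.quantile(scores, 1 - alpha))[0]
    sub = knn[idx][:, idx]  # k-NN graph restricted to the dense set
    n_comp, _ = connected_components(sub, directed=False)
    tree_levels.append(n_comp)

print(tree_levels)  # number of clusters found at each density level
```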
q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
Glazer, Assaf, Lindenbaum, Michael, Markovitch, Shaul
In this paper we introduce a novel method that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. Our method can be regarded as a natural extension of the one-class SVM (OCSVM) algorithm that finds multiple parallel separating hyperplanes in a reproducing kernel Hilbert space. We call our method q-OCSVM, as it can be used to estimate $q$ quantiles of a high-dimensional distribution. For this purpose, we introduce a new global convex optimization program that finds all estimated sets at once and show that it can be solved efficiently. We prove the correctness of our method and present empirical results that demonstrate its superiority over existing methods.
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
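As a rough, hedged stand-in for q-OCSVM: the paper solves one global convex program whose solutions are parallel hyperplanes in the RKHS, but the flavor of the output can be sketched by fitting independent one-class SVMs with ν = j/q for j = 1..q−1, so the j-th model rejects roughly a j/q fraction of the mass. Unlike q-OCSVM, these independently fitted sets are not guaranteed to be nested; kernel parameters below are illustrative.

```python
# Hedged sketch: approximate q quantile sets of a distribution with
# q-1 *independent* one-class SVMs (the paper fits them jointly so the
# resulting hyperplanes are parallel and the sets are nested).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))

q = 4
models = [OneClassSVM(kernel="rbf", gamma=0.5, nu=j / q).fit(X)
          for j in range(1, q)]

# Fraction of training mass inside each estimated set; the j-th value
# should be roughly 1 - j/q.
covered = [float(np.mean(m.predict(X) == 1)) for m in models]
print(covered)
```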
Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data
Glazer, Assaf, Lindenbaum, Michael, Markovitch, Shaul
We propose an efficient, generalized, nonparametric, statistical Kolmogorov-Smirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods.
- Asia > Middle East > Israel (0.15)
- North America > United States (0.14)
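The core idea of this test can be sketched in one dimension under simplifying assumptions: represent the reference distribution by nested high-density sets at levels α = 1/q, …, (q−1)/q (for a unimodal 1-D sample, central quantile intervals), then compare the empirical mass a new window places in each set against its nominal level. The maximum absolute deviation plays the role of the KS statistic; in the paper the sets are minimum-volume sets estimated in high dimensions, not 1-D intervals.

```python
# Hedged 1-D sketch of a generalized KS test over nested dense sets:
# large deviation between empirical and nominal set masses signals
# distributional change in a data stream.
import numpy as np

rng = np.random.default_rng(2)
ref = rng.normal(size=2000)  # sample from the reference distribution

q = 10
alphas = np.arange(1, q) / q
# Nested central intervals each holding ~alpha of the reference mass.
los = np.quantile(ref, (1 - alphas) / 2)
his = np.quantile(ref, (1 + alphas) / 2)

def ks_stat(sample):
    mass = np.array([np.mean((sample >= lo) & (sample <= hi))
                     for lo, hi in zip(los, his)])
    return float(np.max(np.abs(mass - alphas)))

same = ks_stat(rng.normal(size=500))              # no change
shifted = ks_stat(rng.normal(loc=2.0, size=500))  # shifted stream window
print(same, shifted)
```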
Learning Minimum Volume Sets
Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > New York (0.05)
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
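The histogram-based rule mentioned at the end of this abstract admits a short sketch (a simplified 1-D illustration, not the paper's analyzed estimator): bin the sample, sort bins from densest to sparsest, and keep adding bins until the retained empirical mass reaches α. Since all bins have equal volume, greedily taking the densest bins minimizes volume at fixed mass.

```python
# Hedged sketch: 1-D histogram estimator of a minimum volume set with
# P-measure at least alpha. The estimated set is the union of the
# densest bins whose counts sum to at least alpha * n.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=5000)

counts, edges = np.histogram(x, bins=50)
order = np.argsort(counts)[::-1]  # densest bins first

alpha, n = 0.9, counts.sum()
cum = np.cumsum(counts[order])
k = int(np.searchsorted(cum, alpha * n)) + 1  # bins needed for mass >= alpha
kept = np.zeros_like(counts, dtype=bool)
kept[order[:k]] = True

mass = counts[kept].sum() / n                  # P-measure of estimated set
volume = kept.sum() * (edges[1] - edges[0])    # its Lebesgue measure
print(mass, volume)
```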