mvset
Approximating Hierarchical MV-sets for Hierarchical Clustering
Glazer, Assaf, Weissbrod, Omer, Lindenbaum, Michael, Markovitch, Shaul
The goal of hierarchical clustering is to construct a cluster tree, which can be viewed as the modal structure of a density. For this purpose, we use a convex optimization program that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. We further extend existing graph-based methods to approximate the cluster tree of a distribution. By avoiding direct density estimation, our method is able to handle high-dimensional data more efficiently than existing density-based approaches. We present empirical results that demonstrate the superiority of our method over existing ones.
- North America > United States > Utah (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > Scotland (0.04)
- (7 more...)
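A minimal sketch of the graph-based idea in this abstract, not the paper's exact algorithm: we stand in for the joint convex program with a single one-class SVM score, threshold the scores at several quantiles to get nested dense sets, and count connected components of a k-NN graph restricted to each set. The number of components per level is a crude read-out of the cluster tree. All parameter values (`gamma`, `n_neighbors`, the alpha levels) are illustrative assumptions.

```python
# Hedged sketch: approximate a cluster tree from nested high-density
# sets without direct density estimation. Scores come from a one-class
# SVM (a stand-in for the paper's convex program); cluster structure at
# each level comes from connected components of a k-NN graph.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs: the tree should show two modes.
X = np.vstack([rng.normal(-4, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])

scores = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X).decision_function(X)
knn = kneighbors_graph(X, n_neighbors=5, mode="connectivity")

tree_levels = []
for alpha in (0.9, 0.5, 0.2):  # nested sets covering ~alpha of the mass
    idx = np.where(scores >= np.quantile(scores, 1 - alpha))[0]
    sub = knn[idx][:, idx]  # k-NN graph restricted to the dense set
    n_comp, _ = connected_components(sub, directed=False)
    tree_levels.append(n_comp)

print(tree_levels)  # number of clusters found at each density level
```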
q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
Glazer, Assaf, Lindenbaum, Michael, Markovitch, Shaul
In this paper we introduce a novel method that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. Our method can be regarded as a natural extension of the one-class SVM (OCSVM) algorithm that finds multiple parallel separating hyperplanes in a reproducing kernel Hilbert space. We call our method q-OCSVM, as it can be used to estimate $q$ quantiles of a high-dimensional distribution. For this purpose, we introduce a new global convex optimization program that finds all estimated sets at once and show that it can be solved efficiently. We prove the correctness of our method and present empirical results that demonstrate its superiority over existing methods.
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
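As a rough, hedged stand-in for q-OCSVM: the paper solves one global convex program whose solutions are parallel hyperplanes in the RKHS, but the flavor of the output can be sketched by fitting independent one-class SVMs with ν = j/q for j = 1..q−1, so the j-th model rejects roughly a j/q fraction of the mass. Unlike q-OCSVM, these independently fitted sets are not guaranteed to be nested; kernel parameters below are illustrative.

```python
# Hedged sketch: approximate q quantile sets of a distribution with
# q-1 *independent* one-class SVMs (the paper fits them jointly so the
# resulting hyperplanes are parallel and the sets are nested).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))

q = 4
models = [OneClassSVM(kernel="rbf", gamma=0.5, nu=j / q).fit(X)
          for j in range(1, q)]

# Fraction of training mass inside each estimated set; the j-th value
# should be roughly 1 - j/q.
covered = [float(np.mean(m.predict(X) == 1)) for m in models]
print(covered)
```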
Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data
Glazer, Assaf, Lindenbaum, Michael, Markovitch, Shaul
We propose an efficient, generalized, nonparametric, statistical Kolmogorov-Smirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods.
- Asia > Middle East > Israel (0.15)
- North America > United States (0.14)
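The core idea of this test can be sketched in one dimension under simplifying assumptions: represent the reference distribution by nested high-density sets at levels α = 1/q, …, (q−1)/q (for a unimodal 1-D sample, central quantile intervals), then compare the empirical mass a new window places in each set against its nominal level. The maximum absolute deviation plays the role of the KS statistic; in the paper the sets are minimum-volume sets estimated in high dimensions, not 1-D intervals.

```python
# Hedged 1-D sketch of a generalized KS test over nested dense sets:
# large deviation between empirical and nominal set masses signals
# distributional change in a data stream.
import numpy as np

rng = np.random.default_rng(2)
ref = rng.normal(size=2000)  # sample from the reference distribution

q = 10
alphas = np.arange(1, q) / q
# Nested central intervals each holding ~alpha of the reference mass.
los = np.quantile(ref, (1 - alphas) / 2)
his = np.quantile(ref, (1 + alphas) / 2)

def ks_stat(sample):
    mass = np.array([np.mean((sample >= lo) & (sample <= hi))
                     for lo, hi in zip(los, his)])
    return float(np.max(np.abs(mass - alphas)))

same = ks_stat(rng.normal(size=500))              # no change
shifted = ks_stat(rng.normal(loc=2.0, size=500))  # shifted stream window
print(same, shifted)
```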
Learning Minimum Volume Sets
Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > New York (0.05)
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
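The histogram-based rule mentioned at the end of this abstract admits a short sketch (a simplified 1-D illustration, not the paper's analyzed estimator): bin the sample, sort bins from densest to sparsest, and keep adding bins until the retained empirical mass reaches α. Since all bins have equal volume, greedily taking the densest bins minimizes volume at fixed mass.

```python
# Hedged sketch: 1-D histogram estimator of a minimum volume set with
# P-measure at least alpha. The estimated set is the union of the
# densest bins whose counts sum to at least alpha * n.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=5000)

counts, edges = np.histogram(x, bins=50)
order = np.argsort(counts)[::-1]  # densest bins first

alpha, n = 0.9, counts.sum()
cum = np.cumsum(counts[order])
k = int(np.searchsorted(cum, alpha * n)) + 1  # bins needed for mass >= alpha
kept = np.zeros_like(counts, dtype=bool)
kept[order[:k]] = True

mass = counts[kept].sum() / n                  # P-measure of estimated set
volume = kept.sum() * (edges[1] - edges[0])    # its Lebesgue measure
print(mass, volume)
```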