Collaborating Authors

Moore, Andrew W.


Active Learning for Anomaly and Rare-Category Detection

Neural Information Processing Systems

We introduce a novel active-learning scenario in which a user wants to work with a learning algorithm to identify useful anomalies. These are distinguished from the traditional statistical definition of anomalies as outliers or merely ill-modeled points. Our distinction is that the usefulness of anomalies is categorized subjectively by the user. We make two additional assumptions. First, there exist extremely few useful anomalies to be hunted down within a massive dataset.
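To make the scenario concrete, here is a minimal Python sketch of the kind of interactive loop it implies, not the paper's actual method: a density model scores every point, the worst-modeled unseen point is shown to the user, and rejected candidates down-weight their neighborhood so the hunt moves on. The names (hunt_anomalies, fit_density, user_labels) and the Gaussian down-weighting heuristic are illustrative assumptions.

    import numpy as np

    def hunt_anomalies(X, fit_density, user_labels, budget=50):
        """Illustrative active-learning loop for hunting rare useful anomalies.

        X           : (n, d) array of data points
        fit_density : callable returning a log-density score per point
        user_labels : callable standing in for the user; True means the
                      shown point is a *useful* anomaly (a subjective call)
        budget      : number of points the user is willing to inspect
        """
        log_p = fit_density(X)                # low score = ill-modeled point
        shown, useful = set(), []
        for _ in range(min(budget, len(X))):
            order = np.argsort(log_p)         # worst-modeled points first
            idx = next(i for i in order if i not in shown)
            shown.add(idx)
            if user_labels(X[idx]):           # subjective usefulness judgment
                useful.append(idx)
            else:
                # Down-weight the rejected point's neighborhood (one simple
                # heuristic) so the loop explores other candidate regions.
                d = np.linalg.norm(X - X[idx], axis=1)
                log_p = log_p + np.exp(-d ** 2)
        return useful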


Detecting Significant Multidimensional Spatial Clusters

Neural Information Processing Systems

Each of these problems can be solved using a spatial scan statistic (Kulldorff, 1997), where we compute the maximum of a likelihood ratio statistic over all spatial regions, and find the significance of this region by randomization. However, computing the scan statistic for all spatial regions is generally computationally infeasible, so we introduce a novel fast spatial scan algorithm, generalizing the 2D scan algorithm of (Neill and Moore, 2004) to arbitrary dimensions. Our new multidimensional multiresolution algorithm allows us to find spatial clusters up to 1400x faster than the naive spatial scan, without any loss of accuracy.
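For reference, the naive scan and randomization test that the fast algorithm accelerates fit in a few lines of Python. This is a sketch, not the paper's code: the 2D grid shape, the Poisson null used to generate replica grids, and the function names are all assumptions.

    import numpy as np

    def max_scan_statistic(counts, baselines, score_region):
        """Naive spatial scan: maximize a likelihood-ratio score over every
        axis-aligned rectangular region of a 2D grid."""
        n, m = counts.shape
        C_tot, B_tot = counts.sum(), baselines.sum()
        best = 0.0
        for i1 in range(n):
            for i2 in range(i1, n):
                for j1 in range(m):
                    for j2 in range(j1, m):
                        C = counts[i1:i2 + 1, j1:j2 + 1].sum()
                        B = baselines[i1:i2 + 1, j1:j2 + 1].sum()
                        best = max(best, score_region(C, B, C_tot, B_tot))
        return best

    def significance_by_randomization(counts, baselines, score_region,
                                      replicas=999, rng=None):
        """p-value of the most significant region: rank the real grid's
        maximum score among the maxima of replica grids drawn under the
        null hypothesis (counts Poisson-distributed around the baseline)."""
        rng = rng or np.random.default_rng(0)
        real = max_scan_statistic(counts, baselines, score_region)
        beaten = sum(
            max_scan_statistic(rng.poisson(baselines), baselines,
                               score_region) >= real
            for _ in range(replicas))
        return (beaten + 1) / (replicas + 1)

The quadruple loop over regions, repeated for every replica grid, is exactly the cost that makes the naive scan infeasible and that the multiresolution algorithm avoids.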


A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters

Neural Information Processing Systems

Given an N × N grid of squares, where each square has a count and an underlying population, our goal is to find the square region with the highest density, and to calculate its significance by randomization. Any density measure D, dependent on the total count and total population of a region, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff's spatial scan statistic D (one concrete form is sketched below).
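The abstract is cut off mid-formula. As one concrete choice of D, the sketch below gives the standard Poisson log-likelihood-ratio form of Kulldorff's statistic, with C and B the count and expected count (the population-based baseline) inside a region and C_total, B_total the grid totals; it plugs directly into the score_region slot of the scan sketch above.

    import math

    def kulldorff_D(C, B, C_total, B_total):
        """Kulldorff's spatial scan statistic (Poisson form): the
        log-likelihood ratio of 'this region is over-dense' against the
        null of uniform incidence. Depends only on region and grid totals."""
        C_out, B_out = C_total - C, B_total - B
        if C <= 0 or B <= 0 or B_out <= 0:
            return 0.0
        if C / B <= C_out / B_out:
            return 0.0                       # region is not over-dense
        inside = C * math.log(C / B)
        outside = C_out * math.log(C_out / B_out) if C_out > 0 else 0.0
        return inside + outside

Because D depends only on a region's totals, coarse bounds on those totals let whole families of regions be scored, and pruned, at once, which is the kind of structure the multi-resolution search exploits.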


New Algorithms for Efficient High Dimensional Non-parametric Classification

Neural Information Processing Systems

This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN.
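Here is a compact sketch of the classical pruning idea these algorithms build on: exact k-NN over a ball tree, skipping any ball that provably cannot contain a point closer than the current k-th best. The leaf size, the split rule, and all names are illustrative choices; the paper's new algorithms go well beyond this baseline.

    import heapq
    import numpy as np

    class BallNode:
        """Ball-tree node: every point in the subtree lies within
        `radius` of `center`."""
        def __init__(self, points, leaf_size=16):
            self.center = points.mean(axis=0)
            self.radius = np.linalg.norm(points - self.center, axis=1).max()
            if len(points) <= leaf_size:
                self.points, self.left, self.right = points, None, None
            else:
                dim = np.ptp(points, axis=0).argmax()   # split widest dimension
                order = points[:, dim].argsort()
                mid = len(points) // 2
                self.points = None
                self.left = BallNode(points[order[:mid]], leaf_size)
                self.right = BallNode(points[order[mid:]], leaf_size)

    def knn(root, q, k):
        """Exact k nearest neighbors of q. A ball is pruned when its
        closest possible point is no closer than the current k-th best."""
        heap = []                            # max-heap via negated distances
        def visit(node):
            lower = np.linalg.norm(q - node.center) - node.radius
            if len(heap) == k and lower >= -heap[0][0]:
                return                       # triangle inequality: prune
            if node.points is not None:      # leaf: scan its points
                for p in node.points:
                    d = np.linalg.norm(q - p)
                    if len(heap) < k:
                        heapq.heappush(heap, (-d, tuple(p)))
                    elif d < -heap[0][0]:
                        heapq.heapreplace(heap, (-d, tuple(p)))
            else:                            # descend into the closer ball first
                near, far = sorted((node.left, node.right),
                                   key=lambda c: np.linalg.norm(q - c.center))
                visit(near)
                visit(far)
        visit(root)
        return sorted(((-nd, np.asarray(p)) for nd, p in heap),
                      key=lambda t: t[0])

Visiting the closer child first tightens the k-th-best distance early, which is what makes the later prunes fire.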


Using Tarjan's Red Rule for Fast Dependency Tree Construction

Neural Information Processing Systems

We focus on the problem of efficient learning of dependency trees. It is well-known that given the pairwise mutual information coefficients, a minimum-weight spanning tree algorithm solves this problem exactly and in polynomial time. However, for large data-sets it is the construction of the correlation matrix that dominates the running time. We have developed a new spanning-tree algorithm which is capable of exploiting partial knowledge about edge weights. The partial knowledge we maintain is a probabilistic confidence interval on the coefficients, which we derive by examining just a small sample of the data. The algorithm is able to flag the need to shrink an interval, which translates to inspection of more data for the particular attribute pair. Experimental results show running time that is near-constant in the number of records, without significant loss in accuracy of the generated trees. Interestingly, our spanning-tree algorithm is based solely on Tarjan's red-edge rule, which is generally considered a guaranteed recipe for bad performance.
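A minimal sketch of how the red rule combines with interval knowledge, under one simplifying assumption: each candidate edge carries a confidence interval [lo, hi] on its weight. An edge is then certainly the heaviest edge on some cycle, and safe to discard, as soon as its endpoints are connected by edges whose upper bounds all lie below its lower bound. The union-find helper and the two-pointer sweep are illustrative, not the paper's implementation.

    class DSU:
        """Union-find over the graph's vertices (attributes)."""
        def __init__(self, n):
            self.parent = list(range(n))
        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]  # path halving
                x = self.parent[x]
            return x
        def union(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    def red_rule_prune(n_vertices, edges):
        """edges: list of (lo, hi, u, v) tuples, where [lo, hi] is a
        confidence interval on the edge's weight. Returns the edges that
        survive the red rule; the rest certainly lie outside the
        minimum-weight spanning tree."""
        by_lo = sorted(range(len(edges)), key=lambda i: edges[i][0])
        by_hi = sorted(range(len(edges)), key=lambda i: edges[i][1])
        dsu, j, removed = DSU(n_vertices), 0, set()
        for i in by_lo:                      # edges in order of lower bound
            lo_i = edges[i][0]
            while j < len(by_hi) and edges[by_hi[j]][1] < lo_i:
                _, _, u, v = edges[by_hi[j]]
                dsu.union(u, v)              # certainly lighter than edge i
                j += 1
            _, _, u, v = edges[i]
            if dsu.find(u) == dsu.find(v):
                removed.add(i)               # red rule: heaviest on a cycle
        return [e for i, e in enumerate(edges) if i not in removed]

Surviving edges are either genuine tree edges or simply undecided; in the paper's setting, undecided pairs are where the algorithm flags an interval for shrinking, inspects more records for that attribute pair, and sweeps again.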


'N-Body' Problems in Statistical Learning

Neural Information Processing Systems

We present efficient algorithms for all-point-pairs problems, or 'N-body'-like problems, which are ubiquitous in statistical learning. We focus on six examples, including nearest-neighbor classification, kernel density estimation, outlier detection, and the two-point correlation.
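One of the six, the two-point correlation (counting the pairs of points within distance r of each other), shows the shared pruning idea in a short sketch: compare the bounding boxes of two tree nodes and either rule the pair out, count it wholesale, or recurse. The kd-tree, leaf size, and all names are illustrative assumptions.

    import numpy as np

    class Node:
        """kd-tree node with an axis-aligned bounding box; leaves keep points."""
        def __init__(self, pts, leaf_size=32):
            self.lo, self.hi, self.n = pts.min(axis=0), pts.max(axis=0), len(pts)
            if self.n <= leaf_size:
                self.pts, self.children = pts, None
            else:
                dim = int(np.argmax(self.hi - self.lo))   # split widest dimension
                order = pts[:, dim].argsort()
                mid = self.n // 2
                self.pts = None
                self.children = (Node(pts[order[:mid]], leaf_size),
                                 Node(pts[order[mid:]], leaf_size))

    def box_bounds(a, b):
        """Smallest and largest possible distance between two boxes' points."""
        gap = np.maximum(0.0, np.maximum(a.lo - b.hi, b.lo - a.hi))
        span = np.maximum(a.hi - b.lo, b.hi - a.lo)
        return np.linalg.norm(gap), np.linalg.norm(span)

    def pair_count(a, b, r):
        """Ordered pairs (p in a, q in b) with ||p - q|| <= r."""
        dmin, dmax = box_bounds(a, b)
        if dmin > r:
            return 0                          # exclusion: prune the whole pair
        if dmax <= r:
            return a.n * b.n                  # subsumption: count wholesale
        if a.pts is not None and b.pts is not None:   # two leaves: brute force
            d = np.linalg.norm(a.pts[:, None, :] - b.pts[None, :, :], axis=2)
            return int((d <= r).sum())
        if a.pts is not None or (b.children and b.n >= a.n):
            return (pair_count(a, b.children[0], r)
                    + pair_count(a, b.children[1], r))
        return (pair_count(a.children[0], b, r)
                + pair_count(a.children[1], b, r))

For a single dataset, pair_count(root, root, r) counts ordered pairs including each point paired with itself, so subtract n and halve to get distinct pairs. The same exclude-or-subsume recursion carries over to the other problems by changing what is bounded per node pair.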