Clustering
Building Intelligent Learning Database Systems
Induction and deduction are two opposite operations in data-mining applications. Induction extracts knowledge in the form of, say, rules or decision trees from existing data, and deduction applies induction results to interpret new data. An intelligent learning database (ILDB) system integrates machine-learning techniques with database and knowledge-base technology. It starts with existing database technology and performs both induction and deduction. The integration of database technology, induction (from machine learning), and deduction (from knowledge-based systems) plays a key role in the construction of ILDB systems, as does the design of efficient induction and deduction algorithms. This article presents a system structure for ILDB systems and discusses practical issues for ILDB applications, such as instance selection and structured induction.
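To make the induce/deduce cycle concrete, here is a minimal sketch assuming a toy one-rule learner over attribute-value records; the `induce_one_rule` and `deduce` helpers, the attributes, and the data are hypothetical illustrations, not the ILDB system's actual components.

```python
# Toy induce/deduce cycle: learn a one-attribute rule set from existing
# records (induction), then interpret new records with it (deduction).
from collections import Counter, defaultdict

def induce_one_rule(records, target):
    """Induction: pick the attribute whose majority-class rules fit best."""
    best_attr, best_rules, best_hits = None, None, -1
    attrs = [a for a in records[0] if a != target]
    for attr in attrs:
        by_value = defaultdict(Counter)
        for r in records:
            by_value[r[attr]][r[target]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        hits = sum(rules[r[attr]] == r[target] for r in records)
        if hits > best_hits:
            best_attr, best_rules, best_hits = attr, rules, hits
    return best_attr, best_rules

def deduce(rule, new_record, default=None):
    """Deduction: apply the induced rule to interpret a new record."""
    attr, mapping = rule
    return mapping.get(new_record[attr], default)

records = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain",  "windy": "no",  "play": "no"},
]
rule = induce_one_rule(records, target="play")
print(rule)                                    # ('outlook', {'sunny': 'yes', 'rain': 'no'})
print(deduce(rule, {"outlook": "sunny", "windy": "yes"}))  # 'yes'
```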
Visualizing Group Structure
Held, Marcus, Puzicha, Jan, Buhmann, Joachim M.
Cluster analysis is a fundamental principle in exploratory data analysis, providing the user with a description of the group structure of given data. A key problem in this context is the interpretation and visualization of clustering solutions in high-dimensional or abstract data spaces. In particular, probabilistic descriptions of the group structure, essential to capture inter-cluster relationships, are hardly assessable by simple inspection of the probabilistic assignment variables. We present a novel approach to the visualization of group structure. It is based on a statistical model of the object assignments which have been observed or estimated by a probabilistic clustering procedure. The objects or data points are embedded in a low-dimensional Euclidean space by approximating the observed data statistics with a Gaussian mixture model. The algorithm provides a new approach to the visualization of the inherent structure for a broad variety of data types, e.g.
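As a rough illustration of the embedding idea, the sketch below takes soft assignment probabilities P from any probabilistic clustering and optimizes 2-D point coordinates so that an isotropic Gaussian mixture around fixed cluster centres reproduces P, by gradient descent on a KL divergence. The circular centre layout, bandwidth `sigma`, and optimizer are assumptions made for this sketch, not the authors' actual procedure.

```python
# Embed points in 2-D so that Gaussian-mixture posteriors around fixed
# centres approximate given probabilistic assignments P (n x K).
import numpy as np

def embed(P, n_iter=500, lr=0.2, sigma=1.0, seed=0):
    n, K = P.shape
    rng = np.random.default_rng(seed)
    # Fixed 2-D cluster centres on a circle (a simplifying assumption).
    ang = 2 * np.pi * np.arange(K) / K
    mu = np.stack([np.cos(ang), np.sin(ang)], axis=1)        # (K, 2)
    Z = 0.1 * rng.standard_normal((n, 2))                    # point coords
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (n, K)
        logq = -d2 / (2 * sigma**2)
        logq -= logq.max(axis=1, keepdims=True)
        Q = np.exp(logq)
        Q /= Q.sum(axis=1, keepdims=True)                    # induced posteriors
        # Gradient of sum_i KL(P_i || Q_i) w.r.t. Z is ((Q - P) @ mu) / sigma^2.
        Z -= lr * (Q - P) @ mu / sigma**2
    return Z, mu

# Toy usage: three fuzzy groups.
P = np.array([[0.9, 0.05, 0.05], [0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],   [0.05, 0.9, 0.05],
              [0.1, 0.1, 0.8],   [0.4, 0.1, 0.5]])
Z, mu = embed(P)
print(np.round(Z, 2))
```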
A Randomized Algorithm for Pairwise Clustering
Gdalyahu, Yoram, Weinshall, Daphna, Werman, Michael
We present a stochastic clustering algorithm based on pairwise similarity of data points. Our method extends existing deterministic methods, including agglomerative algorithms, min-cut graph algorithms, and connected components; thus it provides a common framework for all these methods. Our graph-based method differs from existing stochastic methods, which are based on analogy to physical systems. The stochastic nature of our method makes it more robust against noise, including accidental edges and small spurious clusters. We demonstrate the superiority of our algorithm using an example with three spiraling bands and a large amount of noise.

1 Introduction. Clustering algorithms can be divided into two categories: those that require a vectorial representation of the data, and those that use only a pairwise representation. In the former case, every data item must be represented as a vector in a real normed space, while in the second case only pairwise relations of similarity or dissimilarity are used.
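The sampling idea can be caricatured as follows; this is a simplified stand-in inspired by the abstract, not the authors' actual randomized algorithm. Each round keeps every edge with probability given by its similarity, takes connected components, and finally merges points that land in the same component sufficiently often. The sample count and co-occurrence threshold are illustrative.

```python
# Randomized pairwise clustering sketch: Bernoulli edge sampling plus
# connected components, aggregated into a co-occurrence rate.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def stochastic_cluster(S, n_samples=200, co_thresh=0.5, seed=0):
    """S: (n, n) symmetric similarity matrix with entries in [0, 1]."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    co = np.zeros((n, n))
    for _ in range(n_samples):
        keep = rng.random((n, n)) < S          # Bernoulli edge sampling
        keep = np.triu(keep, 1)                # undirected, no self-loops
        _, labels = connected_components(csr_matrix(keep), directed=False)
        co += labels[:, None] == labels[None, :]
    co /= n_samples                            # pairwise co-occurrence rate
    _, final = connected_components(csr_matrix(co > co_thresh), directed=False)
    return final

# Toy usage: two tight groups weakly linked by noise edges.
S = np.full((6, 6), 0.05)
S[:3, :3] = S[3:, 3:] = 0.9
np.fill_diagonal(S, 1.0)
print(stochastic_cluster(S))   # e.g. [0 0 0 1 1 1]
```

Frequently co-occurring points survive the noise because accidental edges appear in only a minority of the sampled subgraphs, which is the robustness property the abstract emphasizes.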
Active Data Clustering
Hofmann, Thomas, Buhmann, Joachim M.
Active data clustering is a novel technique for the clustering of proximity data which utilizes principles from sequential experiment design in order to interleave data generation and data analysis. The proposed active data sampling strategy is based on the expected value of information, a concept rooted in statistical decision theory. This is considered an important step towards the analysis of large-scale data sets, because it offers a way to overcome the inherent data sparseness of proximity data.
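A heavily hedged sketch of the interleaving loop only: in place of the paper's expected-value-of-information criterion, we substitute a much simpler heuristic that queries a proximity for the object whose current soft assignment has the highest entropy. The simulated oracle, anchor objects, and softmax scoring are all invented for illustration.

```python
# Interleave proximity generation (oracle queries) with analysis
# (soft assignment), directing queries at the most uncertain object.
import numpy as np

rng = np.random.default_rng(0)
n, K = 20, 2
true = np.arange(n) % K                        # hidden toy ground truth

def oracle(i, j):
    """Simulated noisy proximity measurement (the data-generation step)."""
    return float(true[i] == true[j]) + 0.1 * rng.standard_normal()

S = np.full((n, n), np.nan)                    # proximities start unobserved
anchors = [0, 1]                               # provisional cluster seeds

for step in range(60):
    # Soft assignment from observed similarities to the anchors
    # (unobserved entries are treated as a neutral 0.5).
    sim = np.where(np.isnan(S[:, anchors]), 0.5, S[:, anchors])
    p = np.exp(3.0 * sim)
    p /= p.sum(axis=1, keepdims=True)
    ent = -(p * np.log(p)).sum(axis=1)         # assignment uncertainty
    i = int(np.argmax(ent))                    # most uncertain object
    a = anchors[step % K]                      # alternate the anchors
    S[i, a] = S[a, i] = oracle(i, a)           # query exactly one proximity

pred = p.argmax(axis=1)
acc = max((pred == true).mean(), (pred != true).mean())  # up to relabeling
print(f"agreement with hidden labels: {acc:.2f}")
```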
Unsupervised On-line Learning of Decision Trees for Hierarchical Data Analysis
Held, Marcus, Buhmann, Joachim M.
An adaptive on-line algorithm is proposed to estimate hierarchical data structures for non-stationary data sources. The approach is based on the principle of minimum cross entropy to derive a decision tree for data clustering, and it employs a metalearning idea (learning to learn) to adapt to changes in data characteristics. Its efficiency is demonstrated by grouping non-stationary artificial data and by hierarchical segmentation of LANDSAT images.

1 Introduction. Unsupervised learning addresses the problem of detecting structure inherent in unlabeled and unclassified data. Given data points $x_i$, $i = 1, \dots, N$, the encoding usually is represented by an assignment matrix $M = (M_{ia})$, where $M_{ia} = 1$ if and only if $x_i$ belongs to cluster $a$. The cost function $\mathcal{H}(M, Y) = \sum_{i=1}^{N} \sum_{a=1}^{K} M_{ia} V(x_i, y_a)$ measures the quality of a data partition, i.e., optimal assignments and prototypes $(M, Y)^{\mathrm{opt}} = \arg\min_{M, Y} \mathcal{H}(M, Y)$ minimize the inhomogeneity of clusters w.r.t. a given distance measure $V$. For reasons of simplicity we restrict the presentation to the sum-of-squared-error criterion $V(x, y) = \|x - y\|^2$. To facilitate this minimization, a deterministic annealing approach was proposed in [5], which maps the discrete optimization problem, i.e. how to determine the data assignments, via the Maximum Entropy Principle [2] to a continuous parameter estimation problem.
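A minimal sketch of the deterministic-annealing scheme referenced in the introduction, for the sum-of-squared-error criterion $V(x, y) = \|x - y\|^2$: maximum-entropy soft assignments at temperature T alternate with centroid updates while T is lowered. The cooling schedule and initialization are illustrative choices, not the paper's exact settings.

```python
# Deterministic-annealing clustering: Gibbs (maximum-entropy) soft
# assignments at temperature T, alternated with prototype updates.
import numpy as np

def da_kmeans(X, K, T0=5.0, Tmin=0.01, anneal=0.9, seed=0):
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), K, replace=False)].astype(float)  # prototypes
    T = T0
    while T > Tmin:
        for _ in range(10):
            V = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # (N, K) costs
            logM = -V / T
            logM -= logM.max(axis=1, keepdims=True)             # stability
            M = np.exp(logM)
            M /= M.sum(axis=1, keepdims=True)    # soft assignments
            Y = (M.T @ X) / M.sum(axis=0)[:, None]  # centroid update
        T *= anneal                              # cool down
    return M.argmax(axis=1), Y

# Toy usage: three well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(c, 0.3, (30, 2))
                    for c in ([0, 0], [3, 0], [0, 3])])
labels, Y = da_kmeans(X, K=3)
print(np.round(Y, 2))
```

At high T the assignments are nearly uniform and the problem is smooth; cooling gradually sharpens them toward the hard minimization of $\mathcal{H}(M, Y)$, which is the continuation idea the annealing approach exploits.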
Agnostic Classification of Markovian Sequences
El-Yaniv, Ran, Fine, Shai, Tishby, Naftali
Classification of finite sequences without explicit knowledge of their statistical nature is a fundamental problem with many important applications. We propose a new information theoretic approach to this problem which is based on the following ingredients: (i) sequences are similar when they are likely to be generated by the same source; (ii) cross entropies can be estimated via "universal compression"; (iii) Markovian sequences can be asymptotically-optimally merged. With these ingredients we design a method for the classification of discrete sequences whenever they can be compressed. We introduce the method and illustrate its application for hierarchical clustering of languages and for estimating similarities of protein sequences.
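Ingredient (ii) can be prototyped with any off-the-shelf compressor standing in for the paper's estimator. The sketch below uses zlib and the normalized compression distance, a related and well-known proxy rather than the paper's exact measure, to compare sequences; the inputs are illustrative.

```python
# Compression-based sequence similarity: sequences that compress well
# together are likely to come from the same source.
import zlib

def C(s: bytes) -> int:
    """Compressed length as a crude (cross-)entropy estimate."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two sequences."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

seqs = {
    "en1": b"the quick brown fox jumps over the lazy dog " * 20,
    "en2": b"a lazy dog sleeps while the quick fox runs " * 20,
    "bin": bytes(range(256)) * 4,
}
for a in seqs:
    for b in seqs:
        if a < b:
            print(a, b, round(ncd(seqs[a], seqs[b]), 3))
```

The pairwise distances produced this way can be fed directly to a standard hierarchical (agglomerative) clustering routine, mirroring the language-clustering application described in the abstract.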