AITopics

Country:

North America > United States (0.28)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Goldberger, Jacob, Roweis, Sam T.

Hierarchical Clustering of a Mixture Model

Neural Information Processing Systems, Dec-31-2005

Gaussians grouped together into a single Gaussian [1].

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Gaffney, Scott J., Smyth, Padhraic

Joint Probabilistic Curve Clustering and Alignment

Neural Information Processing Systems, Dec-31-2005

Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional feature-vector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint clustering and alignment of gene expression time-course data.

algorithm, artificial intelligence, machine learning, (14 more...)

Country: North America > United States > California > Orange County > Irvine (0.14)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Journal of Artificial Intelligence Research, Aug-1-2005

Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis

Cimiano, P., Hotho, A., Staab, S.

We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness.
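As a rough illustration of the lattice-building step described in this abstract, the sketch below enumerates the formal concepts of a tiny term/attribute context by brute force. The context itself (`CONTEXT`), the attribute names, and the function names are hypothetical and chosen only for illustration; the abstract does not specify the authors' data or implementation, and practical FCA tools use far more efficient algorithms.

```python
from itertools import combinations

# Hypothetical toy context: each term maps to the set of attributes
# (e.g. verbs it occurs with) observed for it. Not taken from the paper.
CONTEXT = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
}


def common_attributes(terms, context, all_attributes):
    """Attributes shared by every term in `terms` (all attributes if empty)."""
    shared = set(all_attributes)
    for term in terms:
        shared &= context[term]
    return shared


def terms_having(attributes, context):
    """Terms whose attribute set contains every attribute in `attributes`."""
    return {term for term, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of terms.

    Brute force and exponential in the number of terms; fine for a toy context.
    """
    all_attributes = set().union(*context.values())
    terms = sorted(context)
    concepts = set()
    for size in range(len(terms) + 1):
        for subset in combinations(terms, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = terms_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from most general (largest extent) to most specific.
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(extent), "<->", sorted(intent))
```

Ordering the concepts by extent inclusion gives the partial order from which a concept hierarchy could be read off; in this toy context the concepts happen to form a single chain.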

concept hierarchy, ontology, proceedings, (12 more...)

doi: 10.1613/jair.1648

AI Access Foundation

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
(2 more...)

Genre: Research Report (0.88)

Industry: Banking & Finance > Trading (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Semantic Integration in Text: From Ambiguous Names to Identifiable Entities

Li, Xin, Morie, Paul, Roth, Dan

AI Magazine, Mar-15-2005

Semantic integration focuses on discovering, representing, and manipulating correspondences between entities in disparate data sources. The topic has been widely studied in the context of structured data, with problems being considered including ontology and schema matching, matching relational tuples, and reconciling inconsistent data values. In recent years, however, semantic integration over text has also received increasing attention. This article studies a key challenge in semantic integration over text: identifying whether different mentions of real-world entities, such as "JFK" and "John Kennedy," within and across natural language text documents, actually represent the same concept. We present a machine-learning study of this problem. The first approach is discriminative: a pairwise local classifier is trained in a supervised way to determine whether two given mentions represent the same real-world entity. This is followed, potentially, by a global clustering algorithm that uses the classifier as its similarity metric. Our second approach is a global generative model, at the heart of which is a view on how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes (1) a joint distribution over entities (for example, a document that mentions "President Kennedy" is more likely to mention "Oswald" or "White House" than "Roger Clemens"), and (2) an "author" model that assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an "appearance" model that governs how mentions are transformed from the "representative" mention. We show that both approaches perform very accurately, in the range of 90-95 percent F1 measure for different entity types, much better than previous approaches to some aspects of this problem. Finally, we discuss how our solution for mention matching in text can be potentially applied to matching relational tuples, as well as to linking entities across databases and text.
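The discriminative pipeline described in this abstract (a pairwise scorer followed by a global clustering step that uses it as a similarity metric) might be sketched as follows. This is only a toy under stated assumptions: `same_entity_score` is a hypothetical token-overlap stand-in for the trained classifier, the `threshold` value is arbitrary, and single-link merging is one possible clustering choice, not necessarily the one the article uses.

```python
from itertools import combinations


def same_entity_score(a, b):
    """Hypothetical stand-in for a trained pairwise classifier.

    Returns the Jaccard overlap of the lowercased tokens of two mentions.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cluster_mentions(mentions, score=same_entity_score, threshold=0.3):
    """Single-link clustering: link any pair scoring >= threshold (union-find)."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if score(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, mention in enumerate(mentions):
        groups.setdefault(find(i), []).append(mention)
    return sorted(groups.values())


if __name__ == "__main__":
    mentions = ["John Kennedy", "President Kennedy", "John F. Kennedy",
                "Roger Clemens", "Clemens"]
    for cluster in cluster_mentions(mentions):
        print(cluster)
```

A token-overlap score cannot link abbreviations such as "JFK" to "John Kennedy", which is one reason a learned classifier over richer features would be needed in practice.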

classifier, data mining, machine learning, (20 more...)

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Sports (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
(2 more...)

Pairwise Clustering and Graphical Models

Shental, Noam, Zomet, Assaf, Hertz, Tomer, Weiss, Yair

Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However, spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data.

algorithm, artificial intelligence, inference, (16 more...)

Country: Asia > Middle East > Israel (0.16)

Industry: Energy > Oil & Gas (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Chudova, Darya, Hart, Christopher, Mjolsness, Eric, Smyth, Padhraic

Gene Expression Clustering with Functional Mixture Models

We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course data. Each functional cluster center is a nonlinear combination of solutions of a simple linear differential equation that describes the change of individual mRNA levels when the synthesis and decay rates are constant. The mixture of continuous time parametric functional forms allows one to (a) account for the heterogeneity in the observed profiles, (b) align the profiles in time by estimating real-valued time shifts, (c) capture the synthesis and decay of mRNA in the course of an experiment, and (d) regularize noisy profiles by enforcing smoothness in the mean curves. We derive an EM algorithm for estimating the parameters of the model, and apply the proposed approach to the set of cycling genes in yeast. The experiments show consistent improvement in predictive power and within cluster variance compared to regular Gaussian mixtures.
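The abstract does not write out the "simple linear differential equation" it refers to. A common form for mRNA dynamics with constant synthesis and decay rates, given here only as an assumption (the symbols $x$, $\alpha$, $\beta$ and $x_0$ are not from the source), is:

```latex
% Assumed notation: x(t) = mRNA level, \alpha = constant synthesis rate,
% \beta > 0 = constant decay rate, x_0 = level at t = 0.
\frac{dx}{dt} = \alpha - \beta\, x(t),
\qquad
x(t) = \frac{\alpha}{\beta} + \left( x_0 - \frac{\alpha}{\beta} \right) e^{-\beta t}.
```

Under this assumed form, each solution relaxes exponentially toward the steady-state level $\alpha/\beta$, which is the kind of smooth parametric curve the abstract describes combining into cluster centers.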

alignment, functional form, mean curve, (11 more...)

Country:

North America > United States > California > Orange County > Irvine (0.29)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Monroe County > Key West (0.04)
(2 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Derbeko, Philip, El-Yaniv, Ran, Meir, Ron

Error Bounds for Transductive Learning via Compression and Clustering

This paper is concerned with transductive learning. Although transduction appears to be an easier task than induction, there have not been many provably useful algorithms and bounds for transduction. We present explicit error bounds for transduction and derive a general technique for devising bounds within this setting. The technique is applied to derive error bounds for compression schemes such as (transductive) SVMs and for transduction algorithms based on clustering.

algorithm, theorem 3, transduction, (14 more...)

Country:

North America > United States > New York (0.05)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)