AITopics

1104.158

Country:

North America > United States > California (0.28)
North America > United States > New York > New York County > New York City (0.15)
North America > Mexico (0.14)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Space Agency (0.69)
Health & Medicine (0.67)
Energy > Oil & Gas > Upstream (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Knowles, David A., Ghahramani, Zoubin

Pitman-Yor Diffusion Trees

arXiv.org Machine Learning Jun-16-2011

We introduce the Pitman-Yor Diffusion Tree (PYDT) for hierarchical clustering, a generalization of the Dirichlet Diffusion Tree (Neal, 2001) which removes the restriction to binary branching structure. The generative process is described and shown to result in an exchangeable distribution over data points. We prove some theoretical properties of the model and then present two inference methods: a collapsed MCMC sampler which allows us to model uncertainty over tree structures, and a computationally efficient greedy Bayesian EM search algorithm. Both algorithms use message passing on the tree structure. The utility of the model and algorithms is demonstrated on synthetic and real-world data, both continuous and binary.

artificial intelligence, machine learning, natural language, (17 more...)

1106.2494

Country: North America (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Contreras, Pedro, Murtagh, Fionn

Fast, Linear Time Hierarchical Clustering using the Baire Metric

arXiv.org Machine Learning Jun-11-2011

The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, in contrast with the standard quadratic-time agglomerative hierarchical clustering algorithm. In this work we empirically evaluate this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
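
A minimal sketch of why this kind of clustering can run in linear time. It assumes the commonly used definition of the Baire distance as base^(-k), where k is the length of the longest common prefix of two values' digit strings; the abstract itself does not spell out the definition. The helper names, base 10, and the fixed-precision encoding of values in [0, 1) are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict


def digits(x, precision=6):
    """Fractional digits of x in [0, 1) as a fixed-length string (assumed encoding)."""
    return f"{x:.{precision}f}".split(".")[1]


def baire_distance(x, y, precision=6, base=10.0):
    """base**(-k), with k the longest common prefix length of the digit strings."""
    dx, dy = digits(x, precision), digits(y, precision)
    if dx == dy:
        return 0.0  # identical at this precision
    k = 0
    for a, b in zip(dx, dy):
        if a != b:
            break
        k += 1
    return base ** (-k)


def prefix_clusters(values, depth, precision=6):
    """Group values sharing their first `depth` digits: one pass, no pairwise distances."""
    buckets = defaultdict(list)
    for v in values:
        buckets[digits(v, precision)[:depth]].append(v)
    return dict(buckets)
```

Each value is visited once, so the cost grows linearly with the number of values; increasing `depth` descends one level in the implied hierarchy.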

artificial intelligence, digit, machine learning, (18 more...)

doi: 10.1007/s00357-012-9106-3

1106.2229

Country:

Europe (1.00)
North America > United States > Illinois (0.46)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Dong, Xiaowen, Frossard, Pascal, Vandergheynst, Pierre, Nefedov, Nikolai

Clustering with Multi-Layer Graphs: A Spectral Perspective

arXiv.org Machine Learning Jun-11-2011

Observational data is usually multimodal, which means that it can be naturally represented by a multi-layer graph whose layers share the same set of vertices (users) with different edges (pairwise relationships). In this paper, we address the problem of combining different layers of the multi-layer graph for improved clustering of the vertices compared to using layers independently. We propose two novel methods, based on joint matrix factorization and on a graph regularization framework, respectively, to efficiently combine the spectrum of the multiple graph layers, namely the eigenvectors of the graph Laplacian matrices. In each case, the resulting combination, which we call a "joint spectrum" of multiple graphs, is used for clustering the vertices. We evaluate our approaches by simulations with several real-world social network datasets. Results demonstrate the superior or competitive performance of the proposed methods over state-of-the-art techniques and common baseline methods, such as co-regularization and summation of information from individual graphs.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.1109/TSP.2012.2212886

1106.2233

Country: Europe > Switzerland (0.15)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.48)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Communications (0.90)

arXiv.org Machine Learning Jun-6-2011

Reconstruction of Epsilon-Machines in Predictive Frameworks and Decisional States

Brodu, Nicolas

This article introduces both a new algorithm for reconstructing epsilon-machines from data and the notion of decisional states. These are defined as the internal states of a system that lead to the same decision, based on a user-provided utility or pay-off function. The utility function encodes some a priori knowledge external to the system; it quantifies how bad it is to make mistakes. The intrinsic underlying structure of the system is modeled by an epsilon-machine and its causal states. The decisional states form a partition of the lower-level causal states that is defined according to the higher-level user's knowledge. From a complex systems perspective, the decisional states are thus the "emerging" patterns corresponding to the utility function. The transitions between these decisional states correspond to events that lead to a change of decision. The new REMAPF algorithm estimates both the epsilon-machine and the decisional states from data. Application examples are given for hidden model reconstruction, cellular automata filtering, and edge detection in images.

data mining, decision support system, machine learning, (20 more...)

0902.0600

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

When Optimal Is Just Not Good Enough: Learning Fast Informative Action Cost Partitionings

Karpas, Erez (Technion) | Katz, Michael (Technion) | Markovitch, Shaul (Technion)

Several recent heuristics for domain-independent planning adopt some action cost partitioning scheme to derive admissible heuristic estimates. Given a state, two methods for obtaining an action cost partitioning have been proposed: optimal cost partitioning, which results in the best possible heuristic estimate for that state but requires substantial computational effort, and ad-hoc (uniform) cost partitioning, which is much faster but is usually less informative. These two methods represent almost opposite points in the tradeoff between heuristic accuracy and heuristic computation time. One compromise that has been proposed between these two is using an optimal cost partitioning for the initial state to evaluate all states. In this paper, we propose a novel method for deriving a fast, informative cost-partitioning scheme that is based on computing optimal action cost partitionings for a small set of states and using these to derive heuristic estimates for all states. Our method provides greater control over the accuracy/computation-time tradeoff, which, as our empirical evaluation shows, can result in better performance.
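
To make the uniform baseline mentioned above concrete, here is a minimal sketch under stated assumptions: each component heuristic is described only by the set of actions relevant to it, and an action's cost is split equally among the components that use it. The function names, the component names, and the dictionary layout are hypothetical; this illustrates the ad-hoc (uniform) scheme only, not the learned method the paper proposes.

```python
def uniform_cost_partitioning(action_costs, relevant_actions):
    """Split each action's cost equally among the components that use it.

    action_costs: dict mapping action -> cost
    relevant_actions: dict mapping component -> set of actions it depends on
    Returns a dict mapping component -> {action: cost share}.
    """
    counts = {a: 0 for a in action_costs}
    for actions in relevant_actions.values():
        for a in actions:
            counts[a] += 1
    return {
        comp: {a: action_costs[a] / counts[a] for a in actions}
        for comp, actions in relevant_actions.items()
    }


def is_valid_partitioning(action_costs, partitioning, tol=1e-9):
    """Check that the shares of each action sum to at most its original cost."""
    totals = {a: 0.0 for a in action_costs}
    for shares in partitioning.values():
        for a, c in shares.items():
            totals[a] += c
    return all(totals[a] <= action_costs[a] + tol for a in action_costs)
```

The validity check expresses the condition that keeps the sum of the component estimates admissible: no action's cost is counted more than once in total.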

optimal cost, planning task, state space, (15 more...)

Twenty-First International Conference on Automated Planning and Scheduling

Country:

Asia > Middle East > Israel (0.04)
Oceania > Australia > Queensland > Townsville (0.04)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
Information Technology > Data Science (0.68)

Raj, Prabakararaj Swapna (Indian Institute of Technology Madras) | Ravindran, Balaraman (Indian Institute of Technology Madras)

Utility Driven Clustering

Data mining has primarily focused on the statistical properties of data alone and not necessarily on what could be done with the patterns. While there has been some work on measuring the usefulness of patterns in decision making, such measures have not been used to drive the mining process. We introduce a framework to mine clusters that support decision making. We use an extrinsic measure that evaluates patterns based on their utility in decision making. We show empirical validation of our approach on several test domains.

dataset, evaluation, utility function, (13 more...)

Twenty-Fourth International FLAIRS Conference

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.32)

Learning Parameters of the K-Means Algorithm From Subjective Human Annotation

The New York Public Library is participating in the Chronicling America initiative to develop an online searchable database of historically significant newspaper articles. Microfilm copies of the papers are scanned and high-resolution OCR software is run on them. The text from the OCR provides a wealth of data and opinion for researchers and historians. However, the categorization of articles provided by the OCR engine is rudimentary and a large number of the articles are labeled "editorial" without further categorization. To provide a more refined grouping of articles, unsupervised machine learning algorithms (such as K-Means) are being investigated. The K-Means algorithm requires tuning of parameters such as the number of clusters and the seeding mechanism to ensure that the search is not prone to being caught in a local minimum. We designed a pilot study to observe whether humans are adept at finding sub-categories. The subjective labels provided by humans are used as a guide to compare the performance of the automated clustering techniques. In addition, seeds provided by annotators are carefully incorporated into a semi-supervised K-Means algorithm (Seeded K-Means); empirical results indicate that this helps to improve performance and provides an intuitive sub-categorization of the articles labeled "editorial" by the OCR engine.
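
As a rough illustration of the seeding idea, the sketch below initializes each centroid from the mean of the annotator-provided seed points for that category and then runs ordinary K-Means iterations. It assumes articles are already represented as numeric feature tuples; the function names and data layout are hypothetical and this is not the study's implementation.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def seeded_kmeans(points, seeds, max_iter=100):
    """points: list of tuples; seeds: dict mapping label -> list of seed points.

    Seeds only initialise the centroids; afterwards this is plain Lloyd iteration.
    """
    labels = sorted(seeds)
    centroids = {lab: mean(seeds[lab]) for lab in labels}
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(labels, key=lambda lab: sq_dist(p, centroids[lab])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for lab in labels:
            members = [p for p, a in zip(points, assignment) if a == lab]
            if members:  # keep the old centroid if a cluster empties
                centroids[lab] = mean(members)
    return assignment, centroids
```

Because the number of clusters and the starting centroids both come from the seeds, the two parameters the abstract singles out are fixed by the human annotation rather than tuned by hand.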

algorithm, annotator, category, (15 more...)

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > New York > New York County > New York City (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Industry:

Media > News (0.40)
Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Consensus Clustering + Meta Clustering = Multiple Consensus Clustering

Zhang, Yi (Florida International University) | Li, Tao (Florida International University)

Consensus clustering and meta clustering are two important extensions of the classical clustering problem. Given a set of input clusterings of a given dataset, consensus clustering aims to find a single final clustering which is a better fit in some sense than the existing clusterings, and meta clustering aims to group similar input clusterings together so that users only need to examine a small number of different clusterings. In this paper, we present a new approach, MCC (multiple consensus clustering), to explore multiple clustering views of a given dataset from the input clusterings by combining consensus clustering and meta clustering. In particular, given a set of input clusterings of a particular dataset, MCC employs meta clustering to cluster the input clusterings and then uses consensus clustering to generate a consensus for each cluster of the input clusterings. Extensive experimental results on 11 real-world datasets demonstrate the effectiveness of our proposed method.
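
The two-stage pipeline described above can be sketched as follows. This is a simplified stand-in, not the authors' MCC: it assumes the Rand index as the similarity between clusterings, a greedy threshold grouping for the meta clustering step, and a majority-vote co-association rule for the consensus step. All function names and the threshold value are hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def meta_cluster(clusterings, threshold=0.8):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for c in clusterings:
        for g in groups:
            if rand_index(g[0], c) >= threshold:
                g.append(c)
                break
        else:
            groups.append([c])
    return groups


def consensus(group):
    """Link items co-clustered in more than half of the group's clusterings,
    then return the connected components as consensus labels."""
    n = len(group[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        votes = sum(c[i] == c[j] for c in group)
        if 2 * votes > len(group):
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Calling `[consensus(g) for g in meta_cluster(clusterings)]` yields one consensus labeling per group of similar input clusterings, which is the "multiple views" output the abstract describes.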

consensus, dataset, similarity, (15 more...)

Twenty-Fourth International FLAIRS Conference

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Ghosh, Tamal, Modak, Mousumi, Dan, Pranab K

SAPFOCS: a metaheuristic based approach to part family formation problems in group technology

arXiv.org Artificial Intelligence May-11-2011

This article deals with the part family formation problem in Group Technology (GT), which is believed to be moderately complicated to solve in polynomial time. Past literature shows that part family formation techniques are principally based on production flow analysis (PFA), which usually considers operational requirements, sequences and time. Part Coding Analysis (PCA), which is believed to be a proficient method for identifying part families, is rarely considered in GT. PCA classifies parts by allotting them to different families based on their resemblances in: (1) design characteristics such as shape and size, and/or (2) manufacturing characteristics (machining requirements). A novel approach based on simulated annealing, namely SAPFOCS, is adopted in this study to develop effective part families exploiting the PCA technique. Thereafter, Taguchi's orthogonal design method is employed to resolve the critical issue of parameter selection for the proposed metaheuristic algorithm. The adopted technique is then tested on 5 different datasets of size 5 × 9 to 27 × 9, and the obtained results are compared with the C-Linkage clustering technique. The experimental results show that the proposed metaheuristic algorithm is extremely effective in terms of the quality of the solution obtained and outperforms the C-Linkage algorithm in most instances.

evolutionary algorithm, machine learning, part family, (17 more...)

arXiv.org Artificial Intelligence

1012.5815

Country: Asia > India (0.28)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.94)