Data Mining


Ensemble Learning for Free with Evolutionary Algorithms?

arXiv.org Artificial Intelligence

Evolutionary Learning proceeds by evolving a population of classifiers, from which it generally returns (with some notable exceptions) only the single best-of-run classifier as the final result. Meanwhile, Ensemble Learning, one of the most effective approaches in supervised Machine Learning over the last decade, proceeds by building a population of diverse classifiers. Ensemble Learning with Evolutionary Computation is thus receiving increasing attention. The Evolutionary Ensemble Learning (EEL) approach presented in this paper features two contributions. First, a new fitness function, inspired by co-evolution and enforcing classifier diversity, is presented. Second, a new selection criterion based on the classification margin is proposed. This criterion is used to extract the classifier ensemble either from the final population only (Off-line) or incrementally during evolution (On-line). Experiments on a set of benchmark problems show that Off-line outperforms single-hypothesis evolutionary learning and state-of-the-art Boosting, and generates smaller classifier ensembles.
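
As a rough illustration of the margin-based extraction step described above, the sketch below greedily selects classifiers from an evolved population so as to improve the voting margin on a validation sample. The greedy loop, the use of the average margin, and all names are illustrative assumptions, not the paper's exact criterion.

```python
# Hypothetical sketch of margin-based ensemble extraction from an evolved
# population of classifiers; helper names and the greedy procedure are assumptions.
import numpy as np

def vote_margins(ensemble, X, y):
    """Per-example voting margin: fraction of votes for the true label minus
    the largest fraction of votes for any other label."""
    labels = np.unique(y)
    votes = np.zeros((len(X), len(labels)))
    for clf in ensemble:
        pred = clf.predict(X)
        for j, lab in enumerate(labels):
            votes[:, j] += (pred == lab)
    votes /= len(ensemble)
    true_idx = np.searchsorted(labels, y)
    true_votes = votes[np.arange(len(X)), true_idx]
    votes[np.arange(len(X)), true_idx] = -np.inf   # mask the true label
    return true_votes - votes.max(axis=1)

def extract_ensemble(population, X_val, y_val, max_size=10):
    """Off-line variant: greedily pick classifiers from the final population
    that most improve the average margin on a validation sample."""
    ensemble, remaining, current = [], list(population), -np.inf
    while remaining and len(ensemble) < max_size:
        scores = [vote_margins(ensemble + [c], X_val, y_val).mean() for c in remaining]
        best = int(np.argmax(scores))
        if scores[best] <= current:
            break                      # no candidate improves the margin further
        current = scores[best]
        ensemble.append(remaining.pop(best))
    return ensemble
```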


Group and Topic Discovery from Relations and Their Attributes

Neural Information Processing Systems

We present a probabilistic generative model of entity relationships and their attributes that simultaneously discovers groups among the entities and topics among the corresponding textual attributes. Block-models of relationship data have been studied in social network analysis for some time. Here we cluster in several modalities at once, incorporating the attributes (here, words) associated with certain relationships. Significantly, joint inference allows the discovery of topics to be guided by the emerging groups, and vice versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and thirteen years of similar data from the United Nations. We show that in comparison with traditional, separate latent-variable models for words, or Block-structures for votes, the Group-Topic model's joint inference discovers more cohesive groups and improved topics.
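
As a rough sketch of the joint inference described above, the coupled posterior over group assignments g and topic assignments z can be written as follows; this generic factorization is an assumption for exposition, not the paper's exact parameterization.

\[
p(g, z \mid V, W) \;\propto\; p(V \mid g, z)\; p(W \mid z)\; p(g)\; p(z)
\]

Here V are the relational observations (votes) and W the textual attributes (words); because the block-model term p(V | g, z) depends on both modalities, alternately resampling z given g and g given z lets the emerging groups guide topic discovery and vice versa.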


Metric Learning by Collapsing Classes

Neural Information Processing Systems

We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in the other classes. We construct a convex optimization problem whose solution generates such a metric by trying to collapse all examples in the same class to a single point and push examples in other classes infinitely far away. We show that when the metric we learn is used in simple classifiers, it yields substantial improvements over standard alternatives on a variety of problems. We also discuss how the learned metric may be used to obtain a compact low-dimensional feature representation of the original input space, allowing more efficient classification with very little reduction in performance.
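
For concreteness, a convex program of the kind described above can be written as follows; this formulation is reconstructed from the description and standard accounts of metric learning, so treat the exact form as an assumption rather than a quotation from the paper.

\[
\min_{A \succeq 0} \sum_i \mathrm{KL}\!\left( p_0(\cdot \mid i) \,\middle\|\, p^{A}(\cdot \mid i) \right),
\qquad
p^{A}(j \mid i) = \frac{e^{-d_A(x_i, x_j)}}{\sum_{k \neq i} e^{-d_A(x_i, x_k)}},
\qquad
d_A(x_i, x_j) = (x_i - x_j)^{\top} A\, (x_i - x_j),
\]

where p_0(j | i) is uniform over the points sharing the class of x_i and zero elsewhere. The objective is convex in the positive semidefinite matrix A, and minimizing it collapses same-class points toward a single point while pushing other classes far away, matching the geometric intuition stated in the abstract.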



Separation of Music Signals by Harmonic Structure Modeling

Neural Information Processing Systems

Separation of music signals is an interesting but difficult problem. It is helpful for many other music research tasks such as audio content analysis. In this paper, a new music signal separation method is proposed, which is based on harmonic structure modeling. The main idea of harmonic structure modeling is that the harmonic structure of a music signal is stable, so a music signal can be represented by a harmonic structure model. Accordingly, a corresponding separation algorithm is proposed. The main idea is to learn a harmonic structure model for each music signal in the mixture, and then separate the signals by using these models to distinguish the harmonic structures of different signals. Experimental results show that the algorithm can separate signals and obtain not only a very high Signal-to-Noise Ratio (SNR) but also rather good subjective audio quality.
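
As a small illustration of the kind of model the abstract refers to, the sketch below estimates a harmonic structure as the average relative amplitude of the first few harmonics of a pitched signal, given a magnitude spectrogram and per-frame fundamental-frequency estimates. The representation, the peak-picking neighbourhood, and all parameter choices are assumptions for illustration; the paper's actual model and separation algorithm are more elaborate.

```python
# Hypothetical sketch: learn a "harmonic structure" profile from a solo source.
import numpy as np

def harmonic_structure(mag_spec, f0_per_frame, sr, n_fft, R=8):
    """mag_spec: (n_frames, n_bins) magnitude spectrogram of a solo source.
    f0_per_frame: fundamental frequency in Hz per frame (0 = unvoiced).
    Returns the average amplitude of harmonics 1..R, normalized to harmonic 1."""
    bin_hz = sr / n_fft
    profiles = []
    for frame, f0 in zip(mag_spec, f0_per_frame):
        if f0 <= 0:
            continue
        amps = []
        for r in range(1, R + 1):
            k = int(round(r * f0 / bin_hz))
            if k >= len(frame):
                amps.append(0.0)
            else:
                # take the strongest bin near the predicted harmonic location
                lo, hi = max(k - 1, 0), min(k + 2, len(frame))
                amps.append(frame[lo:hi].max())
        amps = np.asarray(amps)
        if amps[0] > 0:
            profiles.append(amps / amps[0])
    return np.mean(profiles, axis=0) if profiles else np.zeros(R)
```

Separation would then assign spectral peaks in each mixture frame to whichever learned harmonic profile explains them best, in line with the main idea stated above.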


The 2005 International Florida Artificial Intelligence Research Society Conference (FLAIRS-05): A Report

AI Magazine

Several special tracks included a significant number of presentations. The special track organized by Zdravko Markov and Larry Holder was the most extensive, with 18 papers presented of the 35 submitted; the track organized by Vasile Rus was the second largest. A best paper award was presented to Jeffrey A. Coble, Diane J. Cook, and Lawrence B. Holder of the University of Texas at Arlington for their paper titled "Structure Discovery in Sequentially Connected Data." The program also included a general session with many excellent papers spanning a broad range of AI research areas and covering traditional AI topics.


A Probabilistic Model for Online Document Clustering with Application to Novelty Detection

Neural Information Processing Systems

In this paper we propose a probabilistic model for online document clustering. We use a nonparametric Dirichlet process prior to model the growing number of clusters, and use a general English language model as the base distribution to handle the generation of novel clusters. Furthermore, cluster uncertainty is modeled with a Bayesian Dirichlet-multinomial distribution. We use an empirical Bayes method to estimate hyperparameters based on a historical dataset. Our probabilistic model is applied to the novelty detection task in Topic Detection and Tracking (TDT) and compared with existing approaches in the literature.
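
A minimal sketch of the online assignment rule implied by a Dirichlet process prior is shown below: an incoming document either joins an existing cluster, with probability proportional to the cluster size times its likelihood, or starts a new cluster whose likelihood comes from the base (general English) language model, in which case it is flagged as novel. The unigram likelihood and all names here are illustrative assumptions, not the paper's exact model.

```python
# Hypothetical sketch of CRP-style online document clustering with novelty detection.
import numpy as np
from collections import Counter

def log_unigram(doc_counts, word_probs):
    """Log-likelihood of a bag of words under a unigram language model."""
    return sum(c * np.log(word_probs.get(w, 1e-12)) for w, c in doc_counts.items())

def assign(doc, clusters, base_model, alpha=1.0):
    """doc: list of tokens; clusters: list of dicts with 'size' and 'word_probs';
    base_model: word -> probability for the general English model.
    Returns (index, is_novel); index == len(clusters) means a new cluster."""
    counts = Counter(doc)
    scores = [np.log(c['size']) + log_unigram(counts, c['word_probs'])
              for c in clusters]
    scores.append(np.log(alpha) + log_unigram(counts, base_model))  # "new table" term
    best = int(np.argmax(scores))
    return best, best == len(clusters)
```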


Nearly Tight Bounds for the Continuum-Armed Bandit Problem

Neural Information Processing Systems

In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set.
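
To make the setting concrete, the sketch below shows the standard way of attacking a continuum-armed bandit on [0, 1]: discretize the interval into a coarse grid and run a finite-armed algorithm (UCB1 here) on the grid points. The grid size of roughly n^(1/3) arms and the choice of UCB1 are illustrative assumptions, not necessarily the algorithm analyzed in the paper; for Lipschitz cost functions the grid spacing trades discretization error against the cost of exploring each arm.

```python
# Hypothetical sketch: discretize the strategy set and run UCB1 on the grid.
import math

def ucb_on_grid(cost, n, K=None):
    """cost: function mapping x in [0, 1] to a stochastic cost in [0, 1].
    Plays n rounds on K grid points and returns the total cost incurred."""
    K = K or max(1, round(n ** (1 / 3)))           # coarse grid, ~n^(1/3) arms
    arms = [i / (K - 1) if K > 1 else 0.5 for i in range(K)]
    pulls, sums, total = [0] * K, [0.0] * K, 0.0
    for t in range(1, n + 1):
        if t <= K:
            i = t - 1                               # play every arm once first
        else:
            # UCB1 index on rewards 1 - cost (lower cost is better)
            i = max(range(K),
                    key=lambda j: (1 - sums[j] / pulls[j])
                                  + math.sqrt(2 * math.log(t) / pulls[j]))
        c = cost(arms[i])
        pulls[i] += 1
        sums[i] += c
        total += c
    return total
```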


Detecting Significant Multidimensional Spatial Clusters

Neural Information Processing Systems

Each of these problems can be solved using a spatial scan statistic (Kulldorff, 1997), where we compute the maximum of a likelihood ratio statistic over all spatial regions, and find the significance of this region by randomization. However, computing the scan statistic for all spatial regions is generally computationally infeasible, so we introduce a novel fast spatial scan algorithm, generalizing the 2D scan algorithm of (Neill and Moore, 2004) to arbitrary dimensions. Our new multidimensional multiresolution algorithm allows us to find spatial clusters up to 1400x faster than the naive spatial scan, without any loss of accuracy.
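
For reference, the sketch below implements the naive exhaustive scan that the fast algorithm is compared against: score every axis-aligned rectangle on a 2-D grid with a Kulldorff-style Poisson likelihood ratio statistic and keep the highest-scoring region. The grid representation and the exact form of the statistic are common choices stated here as assumptions; significance would then be assessed by comparing the best score against the best scores obtained on data randomized under the null hypothesis, as the abstract describes.

```python
# Hypothetical sketch of the naive spatial scan over axis-aligned rectangles.
import numpy as np

def kulldorff_score(c, b, C, B):
    """Poisson log likelihood ratio for a region with count c and baseline b,
    given grand totals C and B; zero unless the region is over-dense."""
    if b <= 0 or B - b <= 0:
        return 0.0
    inside, outside = c / b, (C - c) / (B - b)
    if inside <= outside:
        return 0.0
    score = c * np.log(inside)
    if C - c > 0:
        score += (C - c) * np.log(outside)
    return score - C * np.log(C / B)

def naive_scan(counts, baselines):
    """counts, baselines: 2-D arrays on a grid; returns (best score, region)."""
    C, B = counts.sum(), baselines.sum()
    n, m = counts.shape
    best, region = 0.0, None
    for x1 in range(n):
        for x2 in range(x1, n):
            for y1 in range(m):
                for y2 in range(y1, m):
                    c = counts[x1:x2 + 1, y1:y2 + 1].sum()
                    b = baselines[x1:x2 + 1, y1:y2 + 1].sum()
                    s = kulldorff_score(c, b, C, B)
                    if s > best:
                        best, region = s, (x1, x2, y1, y2)
    return best, region
```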