

Detecting Significant Multidimensional Spatial Clusters

Neural Information Processing Systems

Each of these problems can be solved using a spatial scan statistic (Kulldorff, 1997), in which we compute the maximum of a likelihood ratio statistic over all spatial regions and determine the significance of the highest-scoring region by randomization testing. However, computing the scan statistic for all spatial regions is generally computationally infeasible, so we introduce a novel fast spatial scan algorithm, generalizing the 2D scan algorithm of Neill and Moore (2004) to arbitrary dimensions. Our new multidimensional multiresolution algorithm allows us to find spatial clusters up to 1400x faster than the naive spatial scan, without any loss of accuracy.
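The naive 2D baseline that the fast algorithm accelerates can be sketched as follows: a Poisson likelihood ratio maximized over all axis-aligned rectangles of a grid, with significance assessed by randomization. Function names and the grid setup are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def poisson_lr(c, b, C, B):
    """Kulldorff's Poisson likelihood ratio for a region with count c and
    baseline b, given grid totals C and B. Returns 0 unless the region is
    overdense relative to the rest of the grid."""
    if b == 0 or c / b <= (C - c) / (B - b + 1e-12):
        return 0.0
    inside = c * np.log(c / b)
    outside = (C - c) * np.log((C - c) / (B - b)) if C > c else 0.0
    return inside + outside - C * np.log(C / B)

def naive_scan(counts, baselines):
    """Exhaustive scan over all axis-aligned rectangles of a 2D grid;
    this is the computation the fast multiresolution algorithm avoids."""
    C, B = counts.sum(), baselines.sum()
    best = 0.0
    n, m = counts.shape
    for i1 in range(n):
        for i2 in range(i1, n):
            for j1 in range(m):
                for j2 in range(j1, m):
                    c = counts[i1:i2 + 1, j1:j2 + 1].sum()
                    b = baselines[i1:i2 + 1, j1:j2 + 1].sum()
                    best = max(best, poisson_lr(c, b, C, B))
    return best

def scan_p_value(counts, baselines, n_sim=99, rng=None):
    """Randomization test: resample counts under the null hypothesis
    (Poisson with the given baselines) and compare max statistics."""
    rng = rng or np.random.default_rng(0)
    obs = naive_scan(counts, baselines)
    beats = sum(naive_scan(rng.poisson(baselines), baselines) >= obs
                for _ in range(n_sim))
    return obs, (beats + 1) / (n_sim + 1)
```

The fast algorithm in the paper prunes this search with a multiresolution data structure; the sketch above only shows the statistic being optimized.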


The Workshops at the Twentieth National Conference on Artificial Intelligence

AI Magazine

The AAAI-05 workshops were held on Saturday and Sunday, July 9-10, in Pittsburgh, Pennsylvania. The thirteen workshops were Contexts and Ontologies: Theory, Practice and Applications, Educational Data Mining, Exploring Planning and Scheduling for Web Services, Grid and Autonomic Computing, Human Comprehensible Machine Learning, Inference for Textual Question Answering, Integrating Planning into Scheduling, Learning in Computer Vision, Link Analysis, Mobile Robot Workshop, Modular Construction of Humanlike Intelligence, Multiagent Learning, Question Answering in Restricted Domains, and Spoken Language Understanding.


An Opinionated History of AAAI

AI Magazine

AAAI has seen great ups and downs, based largely on the perceived success of AI in business applications. Great early success allowed AAAI to weather the "AI winter" and enjoy the current "thaw." Other challenges to AAAI have resulted from its success in spinning out international conferences, thereby effectively removing several key AI areas from the AAAI National Conference. AAAI leadership continues to look for ways to deal with these challenges. AAAI began life intending to be quite different from existing professional societies (such as ACM).



Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis

Journal of Artificial Intelligence Research

We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness.
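The FCA step described above can be sketched on a toy formal context: objects are terms, attributes are the syntactic dependencies they occur with, and formal concepts are the closed (extent, intent) pairs. The context data, names, and brute-force enumeration below are invented for illustration; real corpora need the efficient lattice algorithms the paper relies on:

```python
from itertools import combinations

# Toy formal context: terms (objects) mapped to the syntactic-dependency
# attributes they occur with (hypothetical data for illustration).
context = {
    "hotel":     {"bookable", "has_rooms"},
    "apartment": {"bookable", "has_rooms", "rentable"},
    "car":       {"bookable", "rentable", "driveable"},
    "bike":      {"rentable", "driveable"},
}

def intent(objs):
    """Attributes shared by all objects in objs (the derivation operator ')."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set.union(*context.values())

def extent(attrs):
    """Objects possessing every attribute in attrs (the dual derivation)."""
    return {o for o, a in context.items() if attrs <= a}

def formal_concepts():
    """Enumerate all formal concepts by closing every subset of objects.
    Exponential in the number of objects; fine for a toy context only."""
    concepts = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for combo in combinations(objs, r):
            i = intent(set(combo))
            e = frozenset(extent(i))
            concepts.add((e, frozenset(i)))
    return concepts
```

Ordering the resulting concepts by extent inclusion yields the lattice that the paper then converts into a concept hierarchy.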


Semantic Integration Research in the Database Community: A Brief Survey

AI Magazine

Semantic integration has been a long-standing challenge for the database community. It has received steady attention over the past two decades, and has now become a prominent area of database research. In this article, we first review database applications that require semantic integration and discuss the difficulties underlying the integration process. We then describe recent progress and identify open research issues. We focus in particular on schema matching, a topic that has received much attention in the database community, but also discuss data matching (for example, tuple deduplication) and open issues beyond the match discovery context (for example, reasoning with matches, match verification and repair, and reconciling inconsistent data values). For previous surveys of database research on semantic integration, see Rahm and Bernstein (2001); Ouksel and Sheth (1999); and Batini, Lenzerini, and Navathe (1986).


Semantic Integration in Text: From Ambiguous Names to Identifiable Entities

AI Magazine

Semantic integration focuses on discovering, representing, and manipulating correspondences between entities in disparate data sources. The topic has been widely studied in the context of structured data, with problems considered including ontology and schema matching, matching relational tuples, and reconciling inconsistent data values. In recent years, however, semantic integration over text has also received increasing attention. This article studies a key challenge in semantic integration over text: identifying whether different mentions of real-world entities, such as "JFK" and "John Kennedy," within and across natural language text documents, actually represent the same concept. We present a machine-learning study of this problem. The first is a discriminative approach -- a pairwise local classifier is trained in a supervised way to determine whether two given mentions represent the same real-world entity. This is followed, potentially, by a global clustering algorithm that uses the classifier as its similarity metric. Our second approach is a global generative model, at the heart of which is a view on how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes (1) a joint distribution over entities (for example, a document that mentions "President Kennedy" is more likely to mention "Oswald" or "White House" than "Roger Clemens"), and (2) an "author" model that assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an "appearance" model that governs how mentions are transformed from the "representative" mention. We show that both approaches perform very accurately, with F1 measures in the range of 90-95 percent for different entity types, much better than previous approaches to some aspects of this problem.
Finally, we discuss how our solution for mention matching in text can be potentially applied to matching relational tuples, as well as to linking entities across databases and text.
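The two-stage discriminative approach (pairwise local classifier, then global clustering on its scores) can be sketched with a hand-written similarity standing in for the trained classifier; the scoring heuristics, threshold, and mention list below are illustrative assumptions, not the authors' model:

```python
import itertools

def pair_score(a, b):
    """Toy pairwise similarity between two name mentions: token overlap
    plus an acronym check (a stand-in for the supervised local classifier)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    score = len(ta & tb) / len(ta | tb)
    # Treat "JFK" vs "John F. Kennedy"-style acronym matches as strong evidence.
    if a.lower() == "".join(w[0] for w in b.lower().split()) or \
       b.lower() == "".join(w[0] for w in a.lower().split()):
        score = max(score, 0.9)
    return score

def cluster_mentions(mentions, threshold=0.5):
    """Global step: link mentions whose pairwise score clears the threshold,
    then take connected components (via union-find) as entities."""
    parent = {m: m for m in mentions}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in itertools.combinations(mentions, 2):
        if pair_score(a, b) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for m in mentions:
        groups.setdefault(find(m), []).append(m)
    return list(groups.values())
```

Note how transitivity does real work here: "JFK" and "John Kennedy" may score low pairwise yet land in one cluster through "John F. Kennedy", which is exactly why the global step matters.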


Parameterized Novelty Detectors for Environmental Sensor Monitoring

Neural Information Processing Systems

As part of an environmental observation and forecasting system, sensors deployed in the Columbia River Estuary (CORIE) gather information on physical dynamics and changes in estuary habitat. Of these, salinity sensors are particularly susceptible to bio-fouling, which gradually degrades sensor response and corrupts critical data. Automatic fault detectors can identify bio-fouling early and minimize data loss. Complicating the development of discriminatory classifiers are the scarcity of bio-fouling onset examples and the variability of the bio-fouling signature. To solve these problems, we take a novelty detection approach that incorporates a parameterized bio-fouling model. These detectors identify the occurrence of bio-fouling, and its onset time, as reliably as human experts. Real-time detectors installed during the summer of 2001 produced no false alarms, yet detected all episodes of sensor degradation before the field staff scheduled these sensors for cleaning. From this initial deployment through February 2003, our bio-fouling detectors have essentially doubled the amount of useful data coming from the CORIE sensors.
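The general shape of such a model-based novelty detector can be sketched as follows. A linearly decaying gain stands in for the parameterized fouling signature: for each candidate onset time the decay rate is fit by least squares and compared against the clean-sensor hypothesis. The gain model, threshold, and function name are illustrative assumptions, not the published CORIE detector:

```python
import numpy as np

def fouling_onset(reference, measured, min_drop=0.05):
    """Toy parameterized novelty detector: fouling is modeled as a linearly
    decaying gain g(t) = 1 - r*t applied to the reference signal after an
    onset index k. For each k we fit r on the tail by least squares and
    compare total squared error against the clean (g = 1) hypothesis.
    Returns the best onset index, or None if clean fits well enough."""
    n = len(measured)
    best_k, best_err = None, np.sum((measured - reference) ** 2)
    for k in range(1, n - 2):
        t = np.arange(n - k, dtype=float)
        ref, obs = reference[k:], measured[k:]
        denom = np.sum((ref * t) ** 2)
        if denom == 0:
            continue
        # Closed-form least-squares fit of r in: obs ~ ref * (1 - r*t)
        r = np.sum((ref - obs) * ref * t) / denom
        if r <= 0:  # the gain must decay for this to count as fouling
            continue
        fit = ref * (1 - r * t)
        err = np.sum((measured[:k] - reference[:k]) ** 2) + np.sum((obs - fit) ** 2)
        if err < best_err * (1 - min_drop):  # require a clear improvement
            best_err, best_k = err, k
    return best_k
```

Requiring the fouled model to beat the clean model by a margin (`min_drop`) is one crude way to keep false alarms down when onset examples are scarce.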


How to Combine Expert (and Novice) Advice when Actions Impact the Environment?

Neural Information Processing Systems

The so-called "experts algorithms" constitute a methodology for choosing actions repeatedly, when the rewards depend both on the choice of action and on the unknown current state of the environment. An experts algorithm has access to a set of strategies ("experts"), each of which may recommend which action to choose. The algorithm learns how to combine the recommendations of individual experts so that, in the long run, for any fixed sequence of states of the environment, it does as well as the best expert would have done relative to the same sequence. This methodology may not be suitable for situations where the evolution of states of the environment depends on past chosen actions, as is usually the case, for example, in a repeated nonzero-sum game. A new experts algorithm is presented and analyzed in the context of repeated games. It is shown that asymptotically, under certain conditions, it performs as well as the best available expert. This algorithm is quite different from previously proposed experts algorithms. It represents a shift from the paradigms of regret minimization and myopic optimization to consideration of the long-term effect of a player's actions on the opponent's actions or the environment. The importance of this shift is demonstrated by the fact that this algorithm is capable of inducing cooperation in the repeated Prisoner's Dilemma game, whereas previous experts algorithms converge to the suboptimal non-cooperative play.
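For contrast, the classical regret-minimizing experts scheme that this work departs from can be sketched as multiplicative weights ("Hedge"): every expert's weight grows exponentially in its own reward, and actions are sampled in proportion to the weights. The expert and reward functions below are illustrative placeholders, and this is the myopic baseline, not the paper's new algorithm:

```python
import math
import random

def hedge(experts, rewards, eta=0.5, rounds=100, seed=0):
    """Classical multiplicative-weights experts algorithm.
    experts: list of functions t -> action; rewards: (action, t) -> [0, 1].
    Each round, sample an expert in proportion to its weight, play its
    recommended action, then exponentially reweight every expert by the
    reward its own recommendation would have earned."""
    rng = random.Random(seed)
    w = [1.0] * len(experts)
    total = 0.0
    for t in range(rounds):
        s = sum(w)
        probs = [wi / s for wi in w]
        i = rng.choices(range(len(experts)), probs)[0]
        total += rewards(experts[i](t), t)
        for j, e in enumerate(experts):
            w[j] *= math.exp(eta * rewards(e(t), t))
    return total, w
```

Against a fixed environment this guarantees performance close to the best single expert, but because the update ignores how today's action shapes tomorrow's state, it defects its way to the non-cooperative equilibrium in games like the repeated Prisoner's Dilemma, which is the failure mode the paper's algorithm addresses.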

