Automatically Restructuring Practice Guidelines using the GEM DTD

arXiv.org Artificial Intelligence

This paper describes a system capable of semi-automatically filling an XML template from free text in the clinical domain (practice guidelines). The XML template includes semantic information not explicitly encoded in the text (pairs of conditions and actions/recommendations). Therefore, the system must compute the exact scope of each condition over the text sequences expressing the required actions. We present a system developed for this task and show that it yields good performance when applied to the analysis of French practice guidelines.
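
By way of illustration, the sketch below builds the kind of condition/action pairing such a template encodes; the element names are simplified stand-ins, not the actual GEM DTD tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a GEM-style conditional
# recommendation; the real GEM DTD element names differ.
def build_recommendation(condition, action):
    rec = ET.Element("recommendation")
    ET.SubElement(rec, "condition").text = condition
    ET.SubElement(rec, "action").text = action
    return rec

root = ET.Element("guideline")
root.append(build_recommendation(
    "If the patient is allergic to penicillin",
    "prescribe a macrolide antibiotic"))
print(ET.tostring(root, encoding="unicode"))
```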


A Novel Model of Working Set Selection for SMO Decomposition Methods

arXiv.org Artificial Intelligence

In the process of training Support Vector Machines (SVMs) by decomposition methods, working set selection is an important technique, and several effective schemes have been proposed for it. To improve working set selection, we propose a new model for selecting the working set in sequential minimal optimization (SMO) decomposition methods. In this model, the working set B is selected without reselection. Some properties of the model are established with simple proofs, and experiments demonstrate that the proposed method is in general faster than existing methods.
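
For context, a minimal sketch of the classic maximal-violating-pair selection that SMO-type decomposition methods commonly use; this is the kind of baseline the paper sets out to improve, not the authors' proposed model.

```python
import numpy as np

def max_violating_pair(y, grad, alpha, C, tol=1e-6):
    """Classic maximal-violating-pair working set selection for SMO.

    y: labels in {-1, +1}; grad: gradient of the dual objective;
    alpha: current dual variables; C: box constraint.
    Returns the working set B = (i, j), or None at (approximate) optimality.
    """
    # I_up: coordinates that may increase; I_low: coordinates that may decrease.
    up = ((alpha < C) & (y == 1)) | ((alpha > 0) & (y == -1))
    low = ((alpha < C) & (y == -1)) | ((alpha > 0) & (y == 1))
    vals = -y * grad
    i = np.where(up)[0][np.argmax(vals[up])]
    j = np.where(low)[0][np.argmin(vals[low])]
    if vals[i] - vals[j] < tol:   # KKT conditions met within tolerance
        return None
    return i, j
```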


Virtual Sensor Based Fault Detection and Classification on a Plasma Etch Reactor

arXiv.org Artificial Intelligence

The SEMATECH-sponsored J-88-E project, teaming Texas Instruments with NeuroDyne (et al.), focused on Fault Detection and Classification (FDC) on a Lam 9600 aluminum plasma etch reactor used in semiconductor fabrication. Fault classification was accomplished by implementing a series of virtual sensor models which used data from real sensors (Lam Station sensors, Optical Emission Spectroscopy, and RF Monitoring) to predict recipe setpoints and wafer state characteristics. Fault detection and classification were performed by comparing predicted recipe and wafer state values with expected values. Models utilized include linear PLS, polynomial PLS, and neural network PLS. Prediction of recipe setpoints from sensor data provides a capability for cross-checking that the machine is maintaining the desired setpoints. Wafer state characteristics such as Line Width Reduction and Remaining Oxide were estimated on-line using these same process sensors (Lam, OES, RFM). Wafer-to-wafer measurement of these characteristics in a production setting (where such information is typically only sparsely available, if at all, after batch runs with numerous wafers have been completed) would give the operator important information about whether the process is producing wafers within acceptable bounds of product quality. Production yield is increased, and per-unit cost correspondingly reduced, by giving the operator the opportunity to adjust the process or machine before etching more wafers.
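
A minimal sketch of the residual-threshold idea behind such virtual sensors, using scikit-learn's PLS regression; the data, dimensions, and flagging rule are illustrative assumptions, not the project's actual models.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # stand-in sensor readings
W = rng.normal(size=(10, 2))
Y = X @ W + 0.05 * rng.normal(size=(200, 2))     # stand-in recipe setpoints

model = PLSRegression(n_components=4).fit(X, Y)  # linear PLS virtual sensor
residual = np.abs(Y - model.predict(X))
threshold = residual.mean(axis=0) + 3 * residual.std(axis=0)

# A run is flagged when predicted setpoints deviate from expected values.
faulty = (residual > threshold).any(axis=1)
print(f"{faulty.sum()} of {len(faulty)} runs flagged")
```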


Modeling Computations in a Semantic Network

arXiv.org Artificial Intelligence

Semantic network research has seen a resurgence from its early history in the cognitive sciences with the inception of the Semantic Web initiative. The Semantic Web effort has brought forth an array of technologies that support the encoding, storage, and querying of the semantic network data structure on a global scale. Currently, the popular conception of the Semantic Web is that of a data modeling medium where real and conceptual entities are related in semantically meaningful ways. However, new models have emerged that explicitly encode procedural information within the semantic network substrate. With these new technologies, the Semantic Web has evolved from a data modeling medium to a computational medium. This article provides a classification of existing computational modeling efforts and of the requirements of supporting technologies that will aid the further growth of this burgeoning domain.
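
A toy sketch of the distinction drawn here, assuming an invented vocabulary: the same triple structure can store declarative facts and also drive a computation when a process traverses it.

```python
# Toy semantic network: a set of (subject, predicate, object) triples.
triples = {
    ("author", "wrote", "article"),
    ("article", "cites", "paper"),
    ("paper", "cites", "book"),
}

def neighbors(node, predicate):
    """Declarative query: follow edges labeled `predicate` from `node`."""
    return {o for s, p, o in triples if s == node and p == predicate}

# Procedural use: a walker traversing 'cites' edges, i.e. a computation
# executed over the network substrate rather than a stored fact.
frontier, seen = {"article"}, set()
while frontier:
    node = frontier.pop()
    seen.add(node)
    frontier |= neighbors(node, "cites") - seen
print(seen)
```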


Loop corrections for message passing algorithms in continuous variable models

arXiv.org Artificial Intelligence

In this paper we derive the equations for Loop Corrected Belief Propagation on a continuous-variable Gaussian model. Exploiting the exactness of the means computed by belief propagation on Gaussian models, we obtain the covariances in a different way, based on belief propagation on cavity graphs. We discuss the relation of this loop correction algorithm to Expectation Propagation algorithms for the case in which the model is no longer Gaussian but slightly perturbed by nonlinear terms.
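
For reference, a sketch of plain (uncorrected) Gaussian belief propagation in information form on a small tree, where the computed means are exact; the paper's cavity-graph loop correction is not reproduced here.

```python
import numpy as np

# Pairwise Gaussian model p(x) ~ exp(-x'Jx/2 + h'x) on a 3-node chain.
J = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])
h = np.array([1.0, 0.0, -1.0])
nbrs = {0: [1], 1: [0, 2], 2: [1]}

# Messages in information form: precision Jm[i, j] and potential hm[i, j].
Jm = np.zeros((3, 3)); hm = np.zeros((3, 3))
for _ in range(20):
    for i in range(3):
        for j in nbrs[i]:
            a = J[i, i] + sum(Jm[k, i] for k in nbrs[i] if k != j)
            b = h[i] + sum(hm[k, i] for k in nbrs[i] if k != j)
            Jm[i, j] = -J[i, j] ** 2 / a
            hm[i, j] = -J[i, j] * b / a

# Marginal precisions and means; on a tree the means match the exact solve.
P = np.array([J[i, i] + sum(Jm[k, i] for k in nbrs[i]) for i in range(3)])
mu = np.array([(h[i] + sum(hm[k, i] for k in nbrs[i])) / P[i] for i in range(3)])
print(mu, np.linalg.solve(J, h))
```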


Modeling Epidemic Spread in Synthetic Populations - Virtual Plagues in Massively Multiplayer Online Games

arXiv.org Artificial Intelligence

A virtual plague is a process in which a behavior-affecting property spreads among characters in a Massively Multiplayer Online Game (MMOG). The MMOG individuals constitute a synthetic population, and the game can be seen as a form of interactive executable model for studying disease spread, albeit of a very special kind. To a game developer maintaining an MMOG, recognizing, monitoring, and ultimately controlling a virtual plague is important, regardless of how it was initiated. The prospect of using tools, methods, and theory from the field of epidemiology to do this seems natural and appealing. We address the feasibility of this prospect, first by considering some basic measures used in epidemiology, then by pointing out the differences between real-world epidemics and virtual plagues. We also suggest directions for MMOG developer control through epidemiological modeling. Our aim is to understand the properties of virtual plagues, rather than to eliminate them or mitigate their effects, as would be the case for a real infectious disease.
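
As a reminder of those basic measures, a minimal discrete-time SIR sketch; the population size and rates are arbitrary stand-ins, not data from any actual MMOG.

```python
# Minimal discrete-time SIR model; beta, gamma, and N are illustrative.
N, beta, gamma = 10_000, 0.3, 0.1   # population, infection rate, recovery rate
S, I, R = N - 1, 1, 0
print(f"basic reproduction number R0 = {beta / gamma:.1f}")
for day in range(200):
    new_inf = beta * S * I / N      # new infections this step
    new_rec = gamma * I             # new recoveries this step
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
print(f"final attack rate: {R / N:.1%}")
```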


The Google Similarity Distance

arXiv.org Artificial Intelligence

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers, the equivalent of `society' is `database,' and the equivalent of `use' is `way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts, we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract the similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th-century Dutch masters and names of books by English novelists, demonstrate the ability to understand emergencies and primes, and perform a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in a mean agreement of 87% with the expert-crafted WordNet categories.
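
The distance is computable directly from page counts; below is a sketch of the normalized Google distance formula as given in the paper, where f(x) is the number of pages containing x, f(x, y) the number containing both, and N a normalizing total count. The example counts are made up.

```python
import math

def ngd(fx, fy, fxy, N):
    """Normalized Google distance from page counts:
    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y)))."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(N) - min(lx, ly))

# Illustrative (made-up) counts: frequently co-occurring terms
# yield a small distance.
print(ngd(fx=9_000_000, fy=8_500_000, fxy=6_000_000, N=8_000_000_000))
```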


Truecluster matching

arXiv.org Artificial Intelligence

Cluster matching by permuting cluster labels is important in many clustering contexts, such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the Euclidean distance between two cluster solutions, which induces inappropriate stability in certain settings. We therefore present the truematch algorithm, which introduces two improvements best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a chi-square transformation of this crosstable. Thus, the trace will not be dominated by the cells with the largest counts but by the cells with the most non-random observations, taking the marginals into account. Second, we suggest a probabilistic component in order to break ties and make the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. First simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.
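
A sketch of the matching step as described, assuming a signed chi-square cell transformation and the Hungarian method for the permutation search; the probabilistic tie-breaking of the full truematch algorithm is omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_labels(a, b, k):
    """Match the cluster labels of solution b to those of solution a.

    Maximizes the trace of a chi-square-transformed crosstable rather
    than the raw count crosstable, so cells with the most non-random
    observations dominate, not merely the largest cells. The signing of
    the cell contributions is an illustrative assumption.
    """
    cross = np.zeros((k, k))
    np.add.at(cross, (a, b), 1)                  # k x k count crosstable
    expected = np.outer(cross.sum(1), cross.sum(0)) / cross.sum()
    chi2 = np.sign(cross - expected) * (cross - expected) ** 2 / expected
    rows, cols = linear_sum_assignment(-chi2)    # permutation maximizing trace
    perm = np.empty(k, dtype=int)
    perm[cols] = rows
    return perm[b]                               # relabeled solution b

a = np.array([0, 0, 1, 1, 2, 2]); b = np.array([1, 1, 2, 2, 0, 0])
print(match_labels(a, b, 3))                     # -> [0 0 1 1 2 2]
```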


Consensus Propagation

arXiv.org Artificial Intelligence

We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge.
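
A sketch of the message-passing form in the exact, unattenuated case on a tree, where each directed edge carries a pair (K, mu): the count of nodes behind the edge and their running average. The attenuation that makes the protocol behave on loopy graphs is omitted.

```python
# Consensus propagation sketch on a tree (exact, unattenuated case).
y = [4.0, 8.0, 6.0, 2.0]                        # values to average
nbrs = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}   # a small star-shaped tree
K = {}; mu = {}
for i in nbrs:
    for j in nbrs[i]:
        K[i, j], mu[i, j] = 0.0, 0.0

for _ in range(len(y)):                         # a few sweeps suffice on a tree
    for i in nbrs:
        for j in nbrs[i]:
            others = [k for k in nbrs[i] if k != j]
            # K: nodes in the subtree behind edge (i, j); mu: their average.
            K[i, j] = 1 + sum(K[k, i] for k in others)
            mu[i, j] = (y[i] + sum(K[k, i] * mu[k, i] for k in others)) / K[i, j]

for i in nbrs:
    tot = 1 + sum(K[k, i] for k in nbrs[i])
    est = (y[i] + sum(K[k, i] * mu[k, i] for k in nbrs[i])) / tot
    print(i, est)                               # every node converges to 5.0
```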


Truecluster: robust scalable clustering with model selection

arXiv.org Artificial Intelligence

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection, robustness, and scalability to large datasets. We consider the important problem of deciding on the optimal number of clusters, given an arbitrary definition of space and clusteriness. We show how to construct a cluster information criterion that allows objective model selection. Unlike other approaches, our truecluster method does not require specific assumptions about underlying distributions, dissimilarity definitions, or cluster models. Truecluster puts arbitrary clustering algorithms into a generic, unified (sampling-based) statistical framework. It is scalable to big datasets and provides robust cluster assignments and case-wise diagnostics. Truecluster makes clustering more objective, allows for automation, and saves time and costs. Free R software is available.
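
A generic sketch of the sampling-based framework idea: cluster bootstrap resamples and score cross-replicate agreement per candidate number of clusters. The adjusted Rand index below is a stand-in, not the paper's cluster information criterion.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
# Synthetic data with three well-separated groups.
X = np.vstack([rng.normal(m, 0.4, size=(100, 2)) for m in (0, 3, 6)])

for k in range(2, 6):
    full = KMeans(n_clusters=k, n_init=10).fit(X).labels_
    scores = []
    for _ in range(10):
        idx = rng.integers(0, len(X), len(X))    # bootstrap resample
        boot = KMeans(n_clusters=k, n_init=10).fit(X[idx]).labels_
        scores.append(adjusted_rand_score(full[idx], boot))
    print(k, round(float(np.mean(scores)), 3))   # stability peaks near k = 3
```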