Goto

Collaborating Authors

 Industry


Modeling Epidemic Spread in Synthetic Populations - Virtual Plagues in Massively Multiplayer Online Games

arXiv.org Artificial Intelligence

Some games, such as The Sims, encourage mods (including virtual plagues) and the game has a huge community of modders. In one famous case, the guinea pig mod [15]; one of the first recorded virtual plagues, the culprit turned out to be the chief game developer Mr. Will Wright himself [10]. This opened for a discussion about the responsibility from the game developer side as to acknowledging the time and effort put in by people into their game characters, including not only virtual plagues but also, e.g., keeping servers alive in spite of declining numbers of players. The very first mods were made by Wizards (and players with access to the underlying text databases) of Multi-User Dungeon (MUD) in the late 1980s [11]. The most famous example of virtual plague initiation from the developer side is perhaps Blizzard's introduction of the Corrupted Blood debuff in World of Warcraft. This debuff was originally conceived of as spreading from monster to player only, although Blizzard let it transmit from player to player if the players were standing close to each other. The idea was to let one boss monster issue the debuff, which would then be allowed to affect all players fighting that particular monster (and ultimately help kill off another boss monster in the same area). What was unexpected, and seemingly a surprise to Blizzard, was that players passed the debuff on to their pets, dismissed the pets, and then resummoned them later, in much more populated areas. This led to tens of thousands of deaths on at least three servers, and caused Blizzard to try to correct the problem through various quick fixes (cf.


Modeling Computations in a Semantic Network

arXiv.org Artificial Intelligence

Semantic network research has seen a resurgence from its early history in the cognitive sciences with the inception of the Semantic Web initiative. The Semantic Web effort has brought forth an array of technologies that support the encoding, storage, and querying of the semantic network data structure at the world stage. Currently, the popular conception of the Semantic Web is that of a data modeling medium where real and conceptual entities are related in semantically meaningful ways. However, new models have emerged that explicitly encode procedural information within the semantic network substrate. With these new technologies, the Semantic Web has evolved from a data modeling medium to a computational medium. This article provides a classification of existing computational modeling efforts and the requirements of supporting technologies that will aid in the further growth of this burgeoning domain.


Mixed membership stochastic blockmodels

arXiv.org Machine Learning

Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilisic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks.


The Google Similarity Distance

arXiv.org Artificial Intelligence

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of `society' is `database,' and the equivalent of `use' is `way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies, and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in an a mean agreement of 87% with the expert crafted WordNet categories.


Truecluster matching

arXiv.org Artificial Intelligence

Cluster matching by permuting cluster labels is important in many clustering contexts such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the euclidean distance between two cluster solutions which induces inappropriate stability in certain settings. Therefore, we present the truematch algorithm that introduces two improvements best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a chi-square transformation of this crosstable. Thus, the trace will not be dominated by the cells with the largest counts but by the cells with the most non-random observations, taking into account the marginals. Second, we suggest a probabilistic component in order to break ties and to make the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. First simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.


Using Genetic Algorithms to Optimise Rough Set Partition Sizes for HIV Data Analysis

arXiv.org Artificial Intelligence

In this paper, we present a method to optimise rough set partition sizes, to which rule extraction is performed on HIV data. The genetic algorithm optimisation technique is used to determine the partition sizes of a rough set in order to maximise the rough sets prediction accuracy. The proposed method is tested on a set of demographic properties of individuals obtained from the South African antenatal survey. Six demographic variables were used in the analysis, these variables are; race, age of mother, education, gravidity, parity, and age of father, with the outcome or decision being either HIV positive or negative. Rough set theory is chosen based on the fact that it is easy to interpret the extracted rules. The prediction accuracy of equal width bin partitioning is 57.7% while the accuracy achieved after optimising the partitions is 72.8%. Several other methods have been used to analyse the HIV data and their results are stated and compared to that of rough set theory (RST).


The Parameter-Less Self-Organizing Map algorithm

arXiv.org Artificial Intelligence

The Parameter-Less Self-Organizing Map (PLSOM) is a new neural network algorithm based on the Self-Organizing Map (SOM). It eliminates the need for a learning rate and annealing schemes for learning rate and neighbourhood size. We discuss the relative performance of the PLSOM and the SOM and demonstrate some tasks in which the SOM fails but the PLSOM performs satisfactory. Finally we discuss some example applications of the PLSOM and present a proof of ordering under certain limited conditions.


Support vector machine for functional data classification

arXiv.org Machine Learning

In many applications, input data are sampled functions taking their values in infinite dimensional spaces rather than standard vectors. This fact has complex consequences on data analysis algorithms that motivate modifications of them. In fact most of the traditional data analysis tools for regression, classification and clustering have been adapted to functional inputs under the general name of functional Data Analysis (FDA). In this paper, we investigate the use of Support Vector Machines (SVMs) for functional data analysis and we focus on the problem of curves discrimination. SVMs are large margin classifier tools based on implicit non linear mappings of the considered data into high dimensional spaces thanks to kernels. We show how to define simple kernels that take into account the unctional nature of the data and lead to consistent classification. Experiments conducted on real world data emphasize the benefit of taking into account some functional aspects of the problems.


Ensemble Learning for Free with Evolutionary Algorithms ?

arXiv.org Artificial Intelligence

Evolutionary Learning proceeds by evolving a population of classifiers, from which it generally returns (with some notable exceptions) the single best-of-run classifier as final result. In the meanwhile, Ensemble Learning, one of the most efficient approaches in supervised Machine Learning for the last decade, proceeds by building a population of diverse classifiers. Ensemble Learning with Evolutionary Computation thus receives increasing attention. The Evolutionary Ensemble Learning (EEL) approach presented in this paper features two contributions. First, a new fitness function, inspired by co-evolution and enforcing the classifier diversity, is presented. Further, a new selection criterion based on the classification margin is proposed. This criterion is used to extract the classifier ensemble from the final population only (Off-line) or incrementally along evolution (On-line). Experiments on a set of benchmark problems show that Off-line outperforms single-hypothesis evolutionary learning and state-of-art Boosting and generates smaller classifier ensembles.


Abstract Reasoning for Planning and Coordination

Journal of Artificial Intelligence Research

The judicious use of abstraction can help planning agents to identify key interactions between actions, and resolve them, without getting bogged down in details. However, ignoring the wrong details can lead agents into building plans that do not work, or into costly backtracking and replanning once overlooked interdependencies come to light. We claim that associating systematically-generated summary information with plans' abstract operators can ensure plan correctness, even for asynchronously-executed plans that must be coordinated across multiple agents, while still achieving valuable efficiency gains. In this paper, we formally characterize hierarchical plans whose actions have temporal extent, and describe a principled method for deriving summarized state and metric resource information for such actions. We provide sound and complete algorithms, along with heuristics, to exploit summary information during hierarchical refinement planning and plan coordination. Our analyses and experiments show that, under clearcut and reasonable conditions, using summary information can speed planning as much as doubly exponentially even for plans involving interacting subproblems.