The structure of verbal sequences analyzed with unsupervised learning techniques

arXiv.org Artificial Intelligence

Data mining allows the exploration of sequences of phenomena, whereas analyses usually focus on isolated phenomena or on the relation between two phenomena. It offers invaluable tools for theoretical analyses and for exploring the structure of sentences, texts, dialogues, and speech. We report here the results of using it to inspect sequences of verbs from French accounts of road accidents. The analysis rests on an original unsupervised training approach that discovers the structure of sequential data. The analyzer's input consisted only of the verbs appearing in the sentences. It classified the links between two successive verbs into four distinct clusters, thus allowing text segmentation. We interpret these clusters by applying a statistical analysis to independent semantic annotations.
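
As a rough illustration of the setup, the sketch below encodes the links between successive verbs and clusters them into four groups with k-means; the toy verb sequences, the bag-of-words encoding, and the choice of k-means (rather than the paper's actual unsupervised analyzer) are all assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

# Toy stand-in for verb sequences extracted from accident reports.
verb_sequences = [
    ["rouler", "freiner", "heurter"],
    ["arriver", "voir", "freiner"],
    ["heurter", "tomber", "blesser"],
    ["demarrer", "accelerer", "deraper"],
]

# A "link" is the pair of two successive verbs within a sequence.
links = [f"{a} {b}" for seq in verb_sequences for a, b in zip(seq, seq[1:])]

# Bag-of-words encode the two verbs of each link (order is ignored in this
# toy encoding), then cluster the links into four groups.
X = CountVectorizer().fit_transform(links)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

for link, label in zip(links, labels):
    print(label, link)
```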


A Heuristic Routing Mechanism Using a New Addressing Scheme

arXiv.org Artificial Intelligence

Current routing methods rely on network information stored in routing tables, with routing protocols determining how the tables are updated as the network changes. Although the data in routing tables vary, node addresses are constant. In this paper, we first introduce the new concept of variable addresses, which yields a novel framework for tackling routing problems with heuristic solutions. We then propose a heuristic routing mechanism that applies genes to determine network addresses in a variable-address network, and we describe how this method flexibly solves different problems and suggests new ideas for providing integral solutions to a variety of problems. Ad-hoc networks are the setting where the simulation results are most supportive, and original solutions have been proposed for issues such as mobility.
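
A heavily hedged sketch of the general idea: a genetic algorithm evolves node addresses so that address distance reflects network distance, which would let greedy routing on addresses tend to follow real links. The graph, fitness function, and bit-string encoding below are illustrative guesses, not the paper's mechanism.

```python
import random

NODES = 8
ADDR_BITS = 4
links = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 0)}

def hamming(a, b):
    return bin(a ^ b).count("1")

def fitness(addresses):
    # Reward assignments where linked nodes get similar addresses, so that
    # routing greedily on address distance tends to follow actual links.
    return -sum(hamming(addresses[u], addresses[v]) for u, v in links)

def mutate(addresses):
    child = list(addresses)
    i = random.randrange(NODES)
    child[i] ^= 1 << random.randrange(ADDR_BITS)  # flip one address bit
    return child

random.seed(0)
population = [[random.randrange(1 << ADDR_BITS) for _ in range(NODES)]
              for _ in range(30)]
for _ in range(200):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(20)]

print("best addressing:", max(population, key=fitness))
```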


What's in a Name?

arXiv.org Artificial Intelligence

This paper describes experiments on identifying the language of a single name in isolation or in a document written in a different language. A new corpus matching names to languages has been compiled and made available. This corpus is used in a series of experiments measuring the performance of general language models and names-only language models on the language identification task. Conclusions are drawn from the comparison between general and names-only language models, and between identifying the language of isolated names and that of very short document fragments. Future research directions are outlined.
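
A minimal sketch of the kind of model typically used for this task: per-language character-bigram models score a name, and the highest-scoring language wins. The training names and the smoothing constant are placeholder assumptions, not the paper's corpus or models.

```python
import math
from collections import Counter

def train_bigram_model(names):
    counts = Counter()
    for name in names:
        padded = f"^{name.lower()}$"   # boundary markers
        counts.update(zip(padded, padded[1:]))
    return counts, sum(counts.values())

def score(model, name):
    counts, total = model
    padded = f"^{name.lower()}$"
    # Add-one smoothed log-probability, assuming a bigram vocabulary of 1000.
    return sum(math.log((counts[bg] + 1) / (total + 1000))
               for bg in zip(padded, padded[1:]))

models = {
    "en": train_bigram_model(["smith", "johnson", "brown", "taylor"]),
    "fr": train_bigram_model(["dupont", "moreau", "lefebvre", "girard"]),
}

name = "lemoine"
print(max(models, key=lambda lang: score(models[lang], name)))
```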


Semantic distillation: a method for clustering objects by their contextual specificity

arXiv.org Machine Learning

Techniques for data mining, latent semantic analysis, contextual search of databases, and so on were developed long ago by computer scientists working on information retrieval (IR). Experimental scientists from all disciplines, having to analyse large collections of raw experimental data (astronomical, physical, biological, etc.), have developed powerful methods for statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first, to show that when formulated at an abstract level, problems from IR, statistical data analysis, and physical measurement theories are very similar and can therefore profitably cross-fertilise each other; and second, to propose a novel method of fuzzy hierarchical clustering, termed 'semantic distillation', strongly inspired by the theory of quantum measurement, which we developed to analyse raw data from various types of DNA array experiments. We illustrate the method by analysing DNA array experiments and clustering the genes of the array according to their specificity.
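
Since semantic distillation itself is the paper's own construction, the sketch below shows a generic fuzzy clustering routine (fuzzy c-means) merely to make the notion of graded cluster membership concrete; the data and all parameter values are toy assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # U[i, k] = membership of sample i in cluster k; each row sums to 1.
    U = rng.dirichlet(np.ones(c), size=len(X))
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard fuzzy c-means membership update.
        U = 1.0 / d ** (2 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Three toy 2-D blobs; each gene/object gets graded memberships in all clusters.
X = np.vstack([np.random.default_rng(1).normal(mu, 0.3, size=(20, 2))
               for mu in (0, 3, 6)])
centers, memberships = fuzzy_c_means(X)
print(centers.round(2))
```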


Local approximate inference algorithms

arXiv.org Artificial Intelligence

We present a new local approximation algorithm for computing the Maximum a Posteriori (MAP) assignment and the log-partition function of an arbitrary exponential-family distribution represented by a finite-valued pairwise Markov random field (MRF), say G. Our algorithm decomposes G into appropriately chosen small components, computes estimates locally in each component, and then produces a good global solution. We show that if the underlying graph G either excludes some finite-sized graph as a minor (e.g. is planar) or has low doubling dimension (e.g. any graph with geometry), then our algorithm answers both questions to arbitrary accuracy. We present a message-passing implementation of our algorithm for MAP computation using the self-avoiding walk tree of the graph. To evaluate the computational cost of this implementation, we derive novel tight bounds on the size of the self-avoiding walk tree of an arbitrary graph. As a consequence of our algorithmic result, we show that the normalized log-partition function (also known as free energy) of a class of regular MRFs converges to a limit that is computable to arbitrary accuracy.
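
A loose illustration of the decompose-then-solve-locally idea only, not of the paper's actual algorithm or guarantees: each small component's MAP assignment is found by exhaustive enumeration, and the local optima are concatenated into a global estimate. The toy MRF and the decomposition are assumptions.

```python
import itertools

# Binary pairwise MRF: node potentials theta[v][x], edge potentials psi.
theta = {0: [0.0, 0.5], 1: [0.2, 0.0], 2: [0.0, 0.3], 3: [0.4, 0.0]}
psi = {(0, 1): 0.7, (2, 3): 0.9}   # attractive couplings, paid when endpoints agree

components = [[0, 1], [2, 3]]      # decomposition into small pieces

def local_energy(nodes, assignment):
    e = sum(theta[v][assignment[v]] for v in nodes)
    e += sum(w for (u, v), w in psi.items()
             if u in assignment and v in assignment
             and assignment[u] == assignment[v])
    return e

global_map = {}
for nodes in components:
    # Exact MAP inside the component by brute-force enumeration.
    best = max(itertools.product([0, 1], repeat=len(nodes)),
               key=lambda xs: local_energy(nodes, dict(zip(nodes, xs))))
    global_map.update(dict(zip(nodes, best)))

print("approximate MAP:", global_map)
```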


Solving Constraint Satisfaction Problems through Belief Propagation-guided decimation

arXiv.org Artificial Intelligence

Message-passing algorithms have proved surprisingly successful at solving hard constraint satisfaction problems on sparse random graphs. In such applications, variables are fixed sequentially to satisfy the constraints: message passing is run after each step, and its outcome provides a heuristic for the choice at the next step. This approach has been referred to as 'decimation', by analogy with procedures in statistical physics. The behavior of decimation procedures is poorly understood. Here we consider a simple randomized decimation algorithm based on belief propagation (BP) and analyze its behavior on random k-satisfiability formulae. In particular, we propose a tree model for its analysis, and we conjecture that it provides asymptotically exact predictions in the limit of large instances. This conjecture is confirmed by numerical simulations.
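
The control flow of a decimation loop can be sketched compactly. In the sketch below the BP marginal is replaced by a crude literal-occurrence bias so the example stays short; a faithful implementation would run belief propagation to convergence between fixing steps.

```python
from collections import Counter

def simplify(clauses, var, value):
    lit = var if value else -var
    out = []
    for clause in clauses:
        if lit in clause:
            continue                      # clause satisfied: drop it
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None                   # empty clause: contradiction
        out.append(reduced)
    return out

def decimate(clauses):
    assignment = {}
    while clauses:
        counts = Counter(l for c in clauses for l in c)
        # Stand-in for the BP marginal: fix the most one-sided variable
        # to its majority polarity.
        var = max({abs(l) for l in counts},
                  key=lambda v: abs(counts[v] - counts[-v]))
        value = counts[var] >= counts[-var]
        clauses = simplify(clauses, var, value)
        if clauses is None:
            return None                   # a real solver would restart here
        assignment[var] = value
    return assignment

formula = [[1, -2, 3], [-1, 2, 4], [2, -3, -4], [-2, 3, 4]]
print(decimate(formula))
```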


From Texts to Structured Documents: The Case of Health Practice Guidelines

arXiv.org Artificial Intelligence

This paper describes a system capable of semi-automatically filling an XML template from free text in the clinical domain (practice guidelines). The XML template includes semantic information not explicitly encoded in the text (pairs of conditions and actions/recommendations). There is therefore a need to compute the exact scope of conditions over the text sequences expressing the required actions. We present the rules developed for this task and show that the system yields good performance when applied to the analysis of French practice guidelines.
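
As a toy illustration of rule-based template filling (not the system's actual rules), the sketch below extracts condition/action pairs with a single "If ..., ..." pattern and writes them into an XML structure; the element names and the scope rule are simplified assumptions.

```python
import re
import xml.etree.ElementTree as ET

text = ("If the patient is diabetic, measure HbA1c every three months. "
        "If blood pressure exceeds 140/90, start antihypertensive therapy.")

guideline = ET.Element("guideline")
# Crude scope rule: the condition runs from "If" to the first comma, and
# the recommended action is the remainder of the sentence.
for condition, action in re.findall(r"If ([^,]+), ([^.]+)\.", text):
    rec = ET.SubElement(guideline, "recommendation")
    ET.SubElement(rec, "condition").text = condition
    ET.SubElement(rec, "action").text = action

print(ET.tostring(guideline, encoding="unicode"))
```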


Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains

arXiv.org Machine Learning

Simulated annealing is a popular method for approaching the solution of a global optimization problem. Existing results on its performance apply to discrete combinatorial optimization, where the optimization variables can assume only a finite set of possible values. We introduce a new general formulation of simulated annealing that allows one to guarantee finite-time performance in the optimization of functions of continuous variables. The results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and recent theory on the convergence of Markov chain Monte Carlo methods on continuous domains. This work is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory.
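
A minimal simulated-annealing loop on a bounded continuous domain, in the spirit of the setting analysed; the objective, the Gaussian proposal, and the logarithmic cooling schedule are illustrative choices, not the scheme the paper's guarantees require.

```python
import math
import random

def f(x):                       # toy objective to maximize on [-5, 5]
    return math.exp(-x * x) + 0.3 * math.cos(3 * x)

rng = random.Random(0)
x = rng.uniform(-5, 5)
fx = f(x)
for k in range(1, 20001):
    T = 1.0 / math.log(k + 1)                     # slow logarithmic cooling
    y = min(5, max(-5, x + rng.gauss(0, 0.5)))    # proposal clipped to the domain
    fy = f(y)
    # Metropolis rule: always accept improvements, sometimes accept worse moves.
    if fy >= fx or rng.random() < math.exp((fy - fx) / T):
        x, fx = y, fy

print(f"x* = {x:.4f}, f(x*) = {fx:.4f}")
```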


Autoencoder, Principal Component Analysis and Support Vector Regression for Data Imputation

arXiv.org Artificial Intelligence

Data collection often yields records with missing values or variables. This investigation compares three data imputation models and identifies their merits using accuracy measures. Autoencoder neural networks, principal component analysis, and support vector regression are used for prediction and combined with a genetic algorithm to impute missing variables. The use of PCA improves the overall performance of the autoencoder network, while support vector regression shows promising potential for future investigation. Accuracies of up to 97.4% were achieved on the imputation of some variables.
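
A hedged sketch of the model-plus-GA imputation idea: fit a model of complete records (here PCA reconstruction error stands in for the autoencoder), then let a small genetic algorithm search for missing-value candidates that make the completed record look most typical. The data, missing indices, and GA settings are toy assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
complete = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # training data
pca = PCA(n_components=2).fit(complete)

def reconstruction_error(record):
    # How poorly the fitted model can reproduce this candidate record.
    rec = pca.inverse_transform(pca.transform(record[None, :]))[0]
    return float(np.sum((record - rec) ** 2))

record = complete[0].copy()
missing = [1, 3]                          # pretend these variables are missing
population = rng.normal(size=(40, len(missing)))
for _ in range(60):
    def err(cand):
        trial = record.copy()
        trial[missing] = cand
        return reconstruction_error(trial)
    order = np.argsort([err(c) for c in population])
    parents = population[order[:10]]      # keep the 10 fittest candidates
    children = (parents[rng.integers(0, 10, 30)]
                + rng.normal(0, 0.1, (30, len(missing))))
    population = np.vstack([parents, children])

best = population[np.argmin([err(c) for c in population])]
print("imputed:", best.round(3), "true:", record[missing].round(3))
```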


Introduction to the Special Issue on Innovative Applications of Artificial Intelligence

AI Magazine

We are very pleased to republish here extended versions of a sample of the papers drawn from the Innovative Applications of Artificial Intelligence Conference (IAAI-06), which was held July 17-20, 2006, in Boston, Massachusetts. Three of these articles describe deployed applications and two describe emerging applications.