Goto

Collaborating Authors

 Technology


The Google Similarity Distance

arXiv.org Artificial Intelligence

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of `society' is `database,' and the equivalent of `use' is `way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies, and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in an a mean agreement of 87% with the expert crafted WordNet categories.


Truecluster matching

arXiv.org Artificial Intelligence

Cluster matching by permuting cluster labels is important in many clustering contexts such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the euclidean distance between two cluster solutions which induces inappropriate stability in certain settings. Therefore, we present the truematch algorithm that introduces two improvements best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a chi-square transformation of this crosstable. Thus, the trace will not be dominated by the cells with the largest counts but by the cells with the most non-random observations, taking into account the marginals. Second, we suggest a probabilistic component in order to break ties and to make the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. First simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.


Consensus Propagation

arXiv.org Artificial Intelligence

We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge.


Truecluster: robust scalable clustering with model selection

arXiv.org Artificial Intelligence

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection, robustness, and scalability to large datasets. We consider the important problem of deciding on the optimal number of clusters, given an arbitrary definition of space and clusteriness. We show how to construct a cluster information criterion that allows objective model selection. Differing from other approaches, our truecluster method does not require specific assumptions about underlying distributions, dissimilarity definitions or cluster models. Truecluster puts arbitrary clustering algorithms into a generic unified (sampling-based) statistical framework. It is scalable to big datasets and provides robust cluster assignments and case-wise diagnostics. Truecluster will make clustering more objective, allows for automation, and will save time and costs. Free R software is available.


Undercomplete Blind Subspace Deconvolution

arXiv.org Machine Learning

We introduce the blind subspace deconvolution (BSSD) problem, which is the extension of both the blind source deconvolution (BSD) and the independent subspace analysis (ISA) tasks. We examine the case of the undercomplete BSSD (uBSSD). Applying temporal concatenation we reduce this problem to ISA. The associated `high dimensional' ISA problem can be handled by a recent technique called joint f-decorrelation (JFD). Similar decorrelation methods have been used previously for kernel independent component analysis (kernel-ICA). More precisely, the kernel canonical correlation (KCCA) technique is a member of this family, and, as is shown in this paper, the kernel generalized variance (KGV) method can also be seen as a decorrelation method in the feature space. These kernel based algorithms will be adapted to the ISA task. In the numerical examples, we (i) examine how efficiently the emerging higher dimensional ISA tasks can be tackled, and (ii) explore the working and advantages of the derived kernel-ISA methods.


Using Genetic Algorithms to Optimise Rough Set Partition Sizes for HIV Data Analysis

arXiv.org Artificial Intelligence

In this paper, we present a method to optimise rough set partition sizes, to which rule extraction is performed on HIV data. The genetic algorithm optimisation technique is used to determine the partition sizes of a rough set in order to maximise the rough sets prediction accuracy. The proposed method is tested on a set of demographic properties of individuals obtained from the South African antenatal survey. Six demographic variables were used in the analysis, these variables are; race, age of mother, education, gravidity, parity, and age of father, with the outcome or decision being either HIV positive or negative. Rough set theory is chosen based on the fact that it is easy to interpret the extracted rules. The prediction accuracy of equal width bin partitioning is 57.7% while the accuracy achieved after optimising the partitions is 72.8%. Several other methods have been used to analyse the HIV data and their results are stated and compared to that of rough set theory (RST).


Lasso type classifiers with a reject option

arXiv.org Machine Learning

A discriminant function f: X R yields a classifier sgn(f(x)) { 1, 1} that represents our guess of the label Y of a future observation X and we err if the margin y · f(x) 0. Since observations x for which the conditional probability η(x) P{Y 1 X x} (1) is close to 1/2 are difficult to classify, we introduce a reject option for classifiers, by allowing for a third decision, R (reject), expressing doubt. We built in the reject option by using a threshold value 0 τ 1 as follows. Given a discriminant function f: X R, we report sgn(f(x)) { 1,1} if f(x) τ, but we withhold decision if f(x) τ and report R. We assume that the cost of making a wrong decision is 1 and the cost of utilizing the reject option is d 0. The appropriate risk function is then E[l(Yf(X))] P{Yf(X) τ} dP{ Yf(X) τ} (2) Research is supported in part by NSF grant DMS 0706829 155 M. Wegkamp/Lasso type classifiers with a reject option 156


Non-Computability of Consciousness

arXiv.org Artificial Intelligence

Research in the field of artificial intelligence, which attempts to imitate and simulate intelligent activities using a machine, has blossomed along with the development of information technology [1]. Because the study of artificial intelligence has provided many insights into intelligent behaviors such as pattern recognition, decision theory, etc., there is a question whether consciousness or self-awareness could emerge out of a computational system, a view termed as strong artificial intelligence. This question can be rephrased and stated as follows: Are all conscious activities computational processes?. In this paper, the answer to this question is shown to be no. In order to examine the computability of a physical phenomenon, the phenomenon should first be represented as a computational model; subsequently, the computability of this particular model can be examined. The physical phenomenon can then be claimed to be computable or not based on this examination. A similar approach will be taken in order to examine the computability of consciousness. Because consciousness is a phenomenon experienced by an observer, representation of consciousness as a computational process will be attempted and its computability will be examined.


Phase Transition for Random Quantified XOR-Formulas

Journal of Artificial Intelligence Research

The QXOR-SAT problem is the quantified version of the satisfiability problem XOR-SAT in which the connective exclusive-or is used instead of the usual or. We study the phase transition associated with random QXOR-SAT instances. We give a description of this phase transition in the case of one alternation of quantifiers, thus performing an advanced practical and theoretical study on the phase transition of a quantified problem.


Sufficient conditions for convergence of the Sum-Product Algorithm

arXiv.org Artificial Intelligence

HE Sum-Product Algorithm [2], also known as Loopy Belief Propagation, which we will henceforth abbreviate as LBP, is a popular algorithm for approximate inference on graphical models. Applications can be found in diverse areas such as error correcting codes (iterative channel decoding algorithms for Turbo Codes and Low Density Parity Check Codes [3]), combinatorial optimization (satisfiability problems such as 3-SAT and graph coloring [4]) and computer vision (stereo matching [5] and image restoration [6]). LBP can be regarded as the most elementary one in a family of related algorithms, consisting of double-loop algorithms [7], GBP [8], EP [9], EC [10], the Max-Product Algorithm [11], the Survey Propagation Algorithm [4], [12] and Fractional BP [13]. A good understanding of LBP may therefore be beneficial to understanding these other algorithms as well. In practice, there are two major obstacles in the application of LBP to concrete problems: (i) if LBP converges, it is not clear whether the results are a good approximation of the exact marginals; (ii) LBP does not always converge, and in these cases gives no approximations at all. These two issues might actually be interrelated: the "folklore" is that failure of LBP to converge often indicates low quality of the Bethe approximation on which it is based. This would mean that if one has to "force" LBP to converge (e.g. by using damping or double-loop approaches), one may expect the results to be of low quality. Although LBP is an old algorithm that has been reinvented in many fields, a thorough theoretical understanding of the two aforementioned issues and their relation is still lacking. Significant progress has been made in recent years regarding the question under what conditions LBP converges (e.g.