AITopics | Technology

Collaborating Authors

Technology

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

News Overviews Instructional Materials AI-Alerts Classics

The Google Similarity Distance

Cilibrasi, Rudi, Vitanyi, Paul M. B.

arXiv.org Artificial IntelligenceMay-30-2007

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of `society' is `database,' and the equivalent of `use' is `way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies, and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in an a mean agreement of 87% with the expert crafted WordNet categories.

machine learning, natural language, search term, (20 more...)

arXiv.org Artificial Intelligence

cs/0412098

Country:

North America > United States (0.46)
Europe > Netherlands (0.28)
North America > Canada > Alberta (0.28)

Genre:

Research Report > Experimental Study (0.48)
Research Report > Strength High (0.34)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

Add feedback

Truecluster matching

Oehlschlägel, Jens

arXiv.org Artificial IntelligenceMay-29-2007

Cluster matching by permuting cluster labels is important in many clustering contexts such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the euclidean distance between two cluster solutions which induces inappropriate stability in certain settings. Therefore, we present the truematch algorithm that introduces two improvements best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a chi-square transformation of this crosstable. Thus, the trace will not be dominated by the cells with the largest counts but by the cells with the most non-random observations, taking into account the marginals. Second, we suggest a probabilistic component in order to break ties and to make the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. First simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

0705.4302

Country: Europe (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Consensus Propagation

Moallemi, Ciamac C., Van Roy, Benjamin

arXiv.org Artificial IntelligenceMay-29-2007

We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge.

artificial intelligence, machine learning, propagation, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIT.2006.883539

cs/0603078

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
Information Technology > Communications > Networks > Sensor Networks (0.46)

Add feedback

Truecluster: robust scalable clustering with model selection

Oehlschlägel, Jens

arXiv.org Artificial IntelligenceMay-28-2007

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection, robustness, and scalability to large datasets. We consider the important problem of deciding on the optimal number of clusters, given an arbitrary definition of space and clusteriness. We show how to construct a cluster information criterion that allows objective model selection. Differing from other approaches, our truecluster method does not require specific assumptions about underlying distributions, dissimilarity definitions or cluster models. Truecluster puts arbitrary clustering algorithms into a generic unified (sampling-based) statistical framework. It is scalable to big datasets and provides robust cluster assignments and case-wise diagnostics. Truecluster will make clustering more objective, allows for automation, and will save time and costs. Free R software is available.

artificial intelligence, base cluster algorithm, machine learning, (15 more...)

arXiv.org Artificial Intelligence

cs/0601001

Country:

North America > United States (0.28)
Europe > Austria (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Undercomplete Blind Subspace Deconvolution

Szabo, Zoltan, Poczos, Barnabas, Lorincz, Andras

arXiv.org Machine LearningMay-20-2007

We introduce the blind subspace deconvolution (BSSD) problem, which is the extension of both the blind source deconvolution (BSD) and the independent subspace analysis (ISA) tasks. We examine the case of the undercomplete BSSD (uBSSD). Applying temporal concatenation we reduce this problem to ISA. The associated `high dimensional' ISA problem can be handled by a recent technique called joint f-decorrelation (JFD). Similar decorrelation methods have been used previously for kernel independent component analysis (kernel-ICA). More precisely, the kernel canonical correlation (KCCA) technique is a member of this family, and, as is shown in this paper, the kernel generalized variance (KGV) method can also be seen as a decorrelation method in the feature space. These kernel based algorithms will be adapted to the ISA task. In the numerical examples, we (i) examine how efficiently the emerging higher dimensional ISA tasks can be tackled, and (ii) explore the working and advantages of the derived kernel-ISA methods.

artificial intelligence, isa task, machine learning, (17 more...)

arXiv.org Machine Learning

math/0701210

Country:

Europe (1.00)
North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications > Networks (0.93)

Add feedback

Using Genetic Algorithms to Optimise Rough Set Partition Sizes for HIV Data Analysis

Crossingham, Bodie, Marwala, Tshilidzi

arXiv.org Artificial IntelligenceMay-17-2007

In this paper, we present a method to optimise rough set partition sizes, to which rule extraction is performed on HIV data. The genetic algorithm optimisation technique is used to determine the partition sizes of a rough set in order to maximise the rough sets prediction accuracy. The proposed method is tested on a set of demographic properties of individuals obtained from the South African antenatal survey. Six demographic variables were used in the analysis, these variables are; race, age of mother, education, gravidity, parity, and age of father, with the outcome or decision being either HIV positive or negative. Rough set theory is chosen based on the fact that it is easy to interpret the extracted rules. The prediction accuracy of equal width bin partitioning is 57.7% while the accuracy achieved after optimising the partitions is 72.8%. Several other methods have been used to analyse the HIV data and their results are stated and compared to that of rough set theory (RST).

accuracy, evolutionary algorithm, machine learning, (14 more...)

arXiv.org Artificial Intelligence

0705.2485

Country:

Africa (0.29)
North America > United States (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

Lasso type classifiers with a reject option

Wegkamp, Marten

arXiv.org Machine LearningMay-16-2007

A discriminant function f: X R yields a classifier sgn(f(x)) { 1, 1} that represents our guess of the label Y of a future observation X and we err if the margin y · f(x) 0. Since observations x for which the conditional probability η(x) P{Y 1 X x} (1) is close to 1/2 are difficult to classify, we introduce a reject option for classifiers, by allowing for a third decision, R (reject), expressing doubt. We built in the reject option by using a threshold value 0 τ 1 as follows. Given a discriminant function f: X R, we report sgn(f(x)) { 1,1} if f(x) τ, but we withhold decision if f(x) τ and report R. We assume that the cost of making a wrong decision is 1 and the cost of utilizing the reject option is d 0. The appropriate risk function is then E[l(Yf(X))] P{Yf(X) τ} dP{ Yf(X) τ} (2) Research is supported in part by NSF grant DMS 0706829 155 M. Wegkamp/Lasso type classifiers with a reject option 156

artificial intelligence, bayesian inference, machine learning, (13 more...)

arXiv.org Machine Learning

doi: 10.1214/07-EJS058

0705.2363

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Non-Computability of Consciousness

Song, Daegene

arXiv.org Artificial IntelligenceMay-11-2007

Research in the field of artificial intelligence, which attempts to imitate and simulate intelligent activities using a machine, has blossomed along with the development of information technology [1]. Because the study of artificial intelligence has provided many insights into intelligent behaviors such as pattern recognition, decision theory, etc., there is a question whether consciousness or self-awareness could emerge out of a computational system, a view termed as strong artificial intelligence. This question can be rephrased and stated as follows: Are all conscious activities computational processes?. In this paper, the answer to this question is shown to be no. In order to examine the computability of a physical phenomenon, the phenomenon should first be represented as a computational model; subsequently, the computability of this particular model can be examined. The physical phenomenon can then be claimed to be computable or not based on this examination. A similar approach will be taken in order to examine the computability of consciousness. Because consciousness is a phenomenon experienced by an observer, representation of consciousness as a computational process will be attempted and its computability will be examined.

artificial intelligence, machine learning, observer, (17 more...)

arXiv.org Artificial Intelligence

0705.1617

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.48)
Information Technology > Artificial Intelligence > Cognitive Science (0.34)

Add feedback

Phase Transition for Random Quantified XOR-Formulas

Creignou, N., Daude, H., Egly, U.

Journal of Artificial Intelligence ResearchMay-10-2007

The QXOR-SAT problem is the quantified version of the satisfiability problem XOR-SAT in which the connective exclusive-or is used instead of the usual or. We study the phase transition associated with random QXOR-SAT instances. We give a description of this phase transition in the case of one alternation of quantifiers, thus performing an advanced practical and theoretical study on the phase transition of a quantified problem.

formula, phase transition, transition, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2120

AI Access Foundation

10496

Journal of Artificial Intelligence Research

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Africa > Sudan (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > Austria (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.35)

Add feedback

Sufficient conditions for convergence of the Sum-Product Algorithm

Mooij, Joris M., Kappen, Hilbert J.

arXiv.org Artificial IntelligenceMay-8-2007

HE Sum-Product Algorithm [2], also known as Loopy Belief Propagation, which we will henceforth abbreviate as LBP, is a popular algorithm for approximate inference on graphical models. Applications can be found in diverse areas such as error correcting codes (iterative channel decoding algorithms for Turbo Codes and Low Density Parity Check Codes [3]), combinatorial optimization (satisfiability problems such as 3-SAT and graph coloring [4]) and computer vision (stereo matching [5] and image restoration [6]). LBP can be regarded as the most elementary one in a family of related algorithms, consisting of double-loop algorithms [7], GBP [8], EP [9], EC [10], the Max-Product Algorithm [11], the Survey Propagation Algorithm [4], [12] and Fractional BP [13]. A good understanding of LBP may therefore be beneficial to understanding these other algorithms as well. In practice, there are two major obstacles in the application of LBP to concrete problems: (i) if LBP converges, it is not clear whether the results are a good approximation of the exact marginals; (ii) LBP does not always converge, and in these cases gives no approximations at all. These two issues might actually be interrelated: the "folklore" is that failure of LBP to converge often indicates low quality of the Bethe approximation on which it is based. This would mean that if one has to "force" LBP to converge (e.g. by using damping or double-loop approaches), one may expect the results to be of low quality. Although LBP is an old algorithm that has been reinvented in many fields, a thorough theoretical understanding of the two aforementioned issues and their relation is still lacking. Significant progress has been made in recent years regarding the question under what conditions LBP converges (e.g.

artificial intelligence, convergence, sufficient condition, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIT.2007.909166

cs/0504030

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)

Add feedback