Phase Transitions and Backbones of the Asymmetric Traveling Salesman Problem
In recent years, there has been much interest in phase transitions of combinatorial problems. Phase transitions have been successfully used to analyze combinatorial optimization problems, characterize their typical-case features and locate the hardest problem instances. In this paper, we study phase transitions of the asymmetric Traveling Salesman Problem (ATSP), an NP-hard combinatorial optimization problem that has many real-world applications. Using random instances of up to 1,500 cities in which intercity distances are uniformly distributed, we empirically show that many properties of the problem, including the optimal tour cost and backbone size, experience sharp transitions as the precision of intercity distances increases across a critical value. Our experimental results on the costs of the ATSP tours and the assignment problem agree with the theoretical result that the asymptotic cost of the assignment problem is pi^2/6 as the number of cities goes to infinity. In addition, we show that the average computational cost of the well-known branch-and-bound subtour elimination algorithm for the problem also exhibits a thrashing behavior, transitioning from easy to difficult as the distance precision increases. These results answer positively an open question regarding the existence of phase transitions in the ATSP, and provide guidance on how difficult ATSP problem instances should be generated.
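The role of distance precision can be illustrated on toy instances. The sketch below is not the authors' experimental setup: it assumes that "precision" means the number of decimal digits of integer costs drawn uniformly from [0, 10^digits), it takes the backbone to be the set of arcs shared by all optimal tours, and it replaces branch-and-bound with brute-force enumeration, which is feasible only for a handful of cities. The function names `random_atsp`, `optimal_tours`, and `backbone_fraction` are hypothetical.

```python
import itertools
import random


def random_atsp(n, digits, seed=None):
    """Return an n x n asymmetric cost matrix with integer costs in [0, 10**digits)."""
    rng = random.Random(seed)
    scale = 10 ** digits
    return [[0 if i == j else rng.randrange(scale) for j in range(n)]
            for i in range(n)]


def optimal_tours(cost):
    """Enumerate all tours starting at city 0; return (best cost, all optimal tours)."""
    n = len(cost)
    best, tours = None, []
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        c = sum(cost[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if best is None or c < best:
            best, tours = c, [tour]
        elif c == best:
            tours.append(tour)
    return best, tours


def backbone_fraction(tours):
    """Fraction of a tour's arcs that appear in every optimal tour (assumed definition)."""
    n = len(tours[0])
    arc_sets = [{(t[k], t[(k + 1) % n]) for k in range(n)} for t in tours]
    return len(set.intersection(*arc_sets)) / n


# Low precision tends to produce many tied optimal tours and a small backbone;
# higher precision tends to leave a single optimum whose arcs are all backbone arcs.
for digits in (1, 2, 4):
    cost = random_atsp(7, digits, seed=0)
    best, tours = optimal_tours(cost)
    print(digits, best, len(tours), backbone_fraction(tours))
```

Seven cities are far too few to show a sharp transition; the loop only makes the mechanism visible, namely that coarse distances create ties among optimal tours.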
Distribution of Mutual Information from Complete and Incomplete Data
Hutter, Marcus, Zaffalon, Marco
Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(1/n^3), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information become a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.
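The contrast between descriptive and inductive mutual information can be sketched in a few lines. The block below is an illustration under stated assumptions, not the paper's code: `descriptive_mi` is the usual plug-in estimate from a contingency table, and `posterior_mean_mi` uses the closed form commonly quoted for the posterior mean under a Dirichlet prior, E[I] = (1/n) sum_ij n_ij [psi(n_ij + 1) - psi(n_i+ + 1) - psi(n_+j + 1) + psi(n + 1)], where each n_ij is the observed count plus the prior pseudo-count. That formula is restated from memory rather than taken from the abstract. Restricting pseudo-counts to integers lets the digamma differences be written with harmonic numbers, since the Euler constant cancels.

```python
import math


def descriptive_mi(counts):
    """Plug-in mutual information (in nats) of an r x s table of counts."""
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    mi = 0.0
    for i, r in enumerate(counts):
        for j, nij in enumerate(r):
            if nij > 0:
                mi += nij / n * math.log(nij * n / (row[i] * col[j]))
    return mi


def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k, with H_0 = 0; psi(k + 1) = H_k - gamma."""
    return sum(1.0 / m for m in range(1, k + 1))


def posterior_mean_mi(counts, prior=1):
    """Posterior mean of MI, assuming an integer Dirichlet pseudo-count per cell."""
    n_ij = [[c + prior for c in r] for r in counts]
    n = sum(map(sum, n_ij))
    row = [sum(r) for r in n_ij]
    col = [sum(c) for c in zip(*n_ij)]
    total = 0.0
    for i, r in enumerate(n_ij):
        for j, nij in enumerate(r):
            total += nij * (harmonic(nij) - harmonic(row[i])
                            - harmonic(col[j]) + harmonic(n))
    return total / n


table = [[5, 5], [5, 5]]          # independent-looking sample of size 20
print(descriptive_mi(table))      # plug-in estimate
print(posterior_mean_mi(table))   # posterior mean stays above zero
```

Both functions make a single pass over the table cells, which is in line with the abstract's remark that the inductive quantities cost about as much to compute as the descriptive one (the harmonic sums here are naive and could be cached).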
Applications of Case-Based Reasoning in Molecular Biology
Jurisica, Igor, Glasgow, Janice
Thus, one of the primary goals of a CBR system is to find the most similar, or most relevant, cases for new input problems. The effectiveness of CBR depends on the quality and quantity of cases in a case base. In some domains, even a small number of cases provides good solutions, but in other domains, an increased number of unique cases improves the problem-solving capabilities of CBR systems because there are more experiences to draw on. The reader can find detailed descriptions of the CBR process and systems in Kolodner (1993). ... are presented in Leake (1996), and practically oriented descriptions of CBR can be ... ... complete theories, and rapid evolution; reasoning is often based on experience rather ... Experts remember positive experiences for possible reuse of solutions; negative experiences are used to avoid potentially unsuccessful outcomes.
AI and Bioinformatics
Glasgow, Janice, Jurisica, Igor, Rost, Burkhard
Undoubtedly, bioinformatics is a truly interdisciplinary field: Although some researchers continuously affect wet labs in life science through collaborations or provision of tools, others are rooted in the theory departments of exact sciences (physics, chemistry, or engineering) or computer sciences. This wide variety creates many different perspectives and ... Michael Waddell, David Page, and Jude Shavlik ("Using Machine Learning to Design and Interpret Gene-Expression Microarrays") introduce some background information and provide a comprehensive description of how techniques from machine learning can be used to help understand this high-dimensional and prolific gene-expression data.
Annotating Protein Function through Lexical Analysis
We now know the full genomes of more than 60 organisms. The experimental characterization of the newly sequenced proteins is deemed to lag behind this explosion of naked sequences (sequence-function gap). The rate at which expert annotators add the experimental information into more or less controlled vocabularies of databases snails along at an even slower pace. Most methods that annotate protein function exploit sequence similarity by transferring experimental information for homologues. A crucial development aiding such transfer is large-scale, work- and management-intensive projects aimed at developing a comprehensive ontology for gene-protein function, such as the Gene Ontology project. In parallel, fully automatic or semiautomatic methods have successfully begun to mine the existing data through lexical analysis. Some of these tools target parsing controlled vocabulary from databases; others venture into mining free texts from MEDLINE abstracts or full scientific papers. Automated text analysis has become a rapidly expanding discipline in bioinformatics. A few of these tools have already been embedded in research projects.
Toward Automated Discovery in the Biological Sciences
Buchanan, Bruce G., Livingston, Gary R.
Knowledge discovery programs in the biological sciences require flexibility in the use of symbolic data and semantic information. Because of the volume of nonnumeric, as well as numeric, data, the programs must be able to explore a large space of possibly interesting relationships to discover those that are novel and interesting. Thus, the framework for the discovery program must facilitate proposing and selecting the next task to perform and performing the selected tasks. The framework we describe, called the agenda- and justification-based framework, has several properties that are desirable in semiautonomous discovery systems: It provides a mechanism for estimating the plausibility of tasks, it uses heuristics to propose and perform tasks, and it facilitates the encoding of general discovery strategies and the use of background knowledge. We have implemented the framework and our heuristics in a prototype program, HAMB, and have evaluated them in the domain of protein crystallization. Our results demonstrate that both reasons given for performing tasks and estimates of the interestingness of the concepts and hypotheses examined by HAMB contribute to its performance and that the program can discover novel, interesting relationships in biological data.
Representation of Protein-Sequence Information by Amino Acid Subalphabets
Andersen, Claus A. F., Brunak, Soren
Within computational biology, algorithms are constructed with the aim of extracting knowledge from biological data, in particular, data generated by the large genome projects, where gene and protein sequences are produced in high volume. In this article, we explore new ways of representing protein-sequence information, using machine learning strategies, where the primary goal is the discovery of novel powerful representations for use in AI techniques. In the case of proteins and the 20 different amino acids they typically contain, it is also a secondary goal to discover how the current selection of amino acids -- which now are common in proteins -- might have emerged from simpler selections, or alphabets, in use earlier during the evolution of living organisms.
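The idea of a subalphabet can be made concrete by recoding a sequence with fewer symbols. The sketch below is illustrative only: the three-group partition in `GROUPS` is a hypothetical choice made here, loosely following a hydrophobic/polar/charged split, and is not one of the alphabets discovered in the article. `reduce_sequence` is likewise a hypothetical helper name.

```python
# Hypothetical 3-letter subalphabet over the 20 standard amino acids.
GROUPS = {
    "h": "AVLIMFWC",   # assumed "hydrophobic-like" group
    "p": "STNQYGPH",   # assumed "polar-like" group
    "c": "DEKR",       # assumed "charged" group
}

# Invert the grouping: one-letter amino acid code -> group symbol.
TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq, unknown="x"):
    """Recode a protein sequence in the reduced alphabet; unknown residues map to `unknown`."""
    return "".join(TO_GROUP.get(aa, unknown) for aa in seq.upper())


print(reduce_sequence("MKTAYIAKQR"))
```

A learning method would then be trained on the recoded sequences, and alternative partitions could be compared by how much predictive performance they retain.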
Calendar of Events
NASA Ames Research Center Polish Academy of Sciences URL: www.taai.org.tw/announce/ (PRICAI 2004). (ICKEDS 2004). This book looks at some of the results of the synergy among AI, cognitive science, and education. Examples include virtual students whose misconceptions force students to reflect on their own knowledge, intelligent tutoring systems, and speech recognition technology that helps students learn to read.
Report on the Second International Joint Conference on Autonomous Agents and Multiagent Systems
Rosenschein, Jeffrey S., Wooldridge, Michael
The Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-03) was held in Melbourne, Australia, in July 2003. Attracting nearly 500 delegates, the event confirmed AAMAS as the main academic event for researchers with an interest in multiagent systems. We summarize the conference highlights and report on the associated workshops, tutorials, and emerging trends.
Applying Inductive Logic Programming to Predicting Gene Function
One of the fastest advancing areas of modern science is functional genomics. This science seeks to understand how the complete complement of molecular components of living organisms (nucleic acid, protein, small molecules, and so on) interact to form living organisms. Functional genomics is of interest to AI because the relationship between machines and living organisms is central to AI and because the field is an instructive and fun domain in which to apply and sharpen AI tools and ideas, requiring complex knowledge representation, reasoning, learning, and so on. This article describes two machine learning (inductive logic programming [ILP])-based approaches to the bioinformatic problem of predicting protein function from amino acid sequence. The first approach is based on using ILP as a way of bootstrapping from conventional sequence-based homology methods. The second approach used protein-functional ontologies to provide function classes and a hybrid ILP method to predict function directly from sequence. Both ILP approaches were successful in producing accurate prediction rules that could be interpreted biologically. The work was also of interest to machine learning research because it highlighted the flexibility of ILP systems in dealing with heterogeneous data, the importance of problems where classes are related hierarchically, and problems where examples have more than one functional class.