Goto

Collaborating Authors

 Performance Analysis


High-Dimensional Longitudinal Classification with the Multinomial Fused Lasso

arXiv.org Machine Learning

We study regularized estimation in high-dimensional longitudinal classification problems, using the lasso and fused lasso regularizers. The constructed coefficient estimates are piecewise constant across the time dimension in the longitudinal problem, with adaptively selected change points (break points). We present an efficient algorithm for computing such estimates, based on proximal gradient descent. We apply our proposed technique to a longitudinal data set on Alzheimer's disease from the Cardiovascular Health Study Cognition Study, and use this data set to motivate and demonstrate several practical considerations such as the selection of tuning parameters, and the assessment of model stability.


Understanding Kernel Ridge Regression: Common behaviors from simple functions to density functionals

arXiv.org Machine Learning

Machine learning (ML) is a powerful data-driven method for learning patterns in high-dimensional spaces via induction, and has had widespread success in many fields including more recent applications in quantum chemistry and materials science [1-9]. Here we are interested in applications of ML to construction of density functionals [10-14], which have focused so far on approximating the kinetic energy (KE) of non-interacting electrons. An accurate, general approximation to this could make orbital-free DFT a practical reality. However, ML methods have been developed within the areas of statistics and computer science, and have been applied to a huge variety of data, including neuroscience, image and text processing, and robotics [15]. Thus, they are quite general and have not been tailored to account for specific details of the quantum problem.


MACHINE INTELLIGENCE 13

AI Classics

The two outstanding figures in the history of computer science are Alan Turing and John von Neumann, and they shared the view that logic was the key to understanding and automating computation. In particular, it was Turing who gave us in the mid-1930s the fundamental analysis, and the logical definition, of the concept of'computability by machine' and who discovered the surprising and beautiful basic fact that there exist universal machines which by suitable programming can be made to t This essay is an expanded and revised version of one entitled The Role of Logic in Computer Science and Artificial Intelligence, which was completed in January 1992 (and was later published in the Proceedings of the Fifth Generation computer Systems 1992 Conference). Since completing that essay I have had the benefit of extremely helpful discussions on many of the details with Professor Donald Michie and Professor I. J. Good, both of whom knew Turing well during the war years at Bletchley Park. Professor J. A. N. Lee, whose knowledge of the literature and archives of the history of computing is encyclopedic, also provided additional information, some of which is still unpublished. Further light has very recently been shed on the von Neumann side of the story by Norman Macrae's excellent biography John von Neumann (Macrae 1992). Accordingly, it seemed appropriate to undertake a more complete and thorough version of the FGCS'92 essay, focussing somewhat more on the interesting historical and biographical issues. I am grateful to Donald Michie and Stephen Muggleton for inviting me to contribute such a'second edition' to the present volume, and I would also like to thank the Institute for New Computer Technology (ICOT) for kind permission to make use of the FGCS'92 essay in this way. 1 LOGIC, COMPUTERS, TURING, AND VON NEUMANN


MACHINE INTELLIGENCE 11

AI Classics

In this paper we will be concerned with such reasoning in its most general form, that is, in inferences that are defeasible: given more information, we may retract them. The purpose of this paper is to introduce a form of non-monotonic inference based on the notion of a partial model of the world. We take partial models to reflect our partial knowledge of the true state of affairs. We then define non-monotonic inference as the process of filling in unknown parts of the model with conjectures: statements that could turn out to be false, given more complete knowledge. To take a standard example from default reasoning: since most birds can fly, if Tweety is a bird it is reasonable to assume that she can fly, at least in the absence of any information to the contrary. We thus have some justification for filling in our partial picture of the world with this conjecture. If our knowledge includes the fact that Tweety is an ostrich, then no such justification exists, and the conjecture must be retracted.


6 Integrating AI with Sequence Analysis Richard Lathrop, Teresa Webster, Randall Smith, Patrick Winston & Temple Smith

AI Classics

This chapter will discuss one example of how AI techniques are being integrated with, and extending, existing molecular biology sequence analysis methods. AI ideas of complex representations, pattern recognition, search, and machine learning have been applied to the task of inferring and recognizing structural patterns associated with molecular function. We wish to construct such patterns, and to recognize them in unknown molecules, based on information inferred solely from protein primary (amino acid) sequences.


22 Question Answering BONNIE WEBBER AND NICK WEBB

AI Classics

Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. First, questions do not often translate into a simple list of keywords. For example, the question (1) Which countries did the pope visit in the 1960s? A much more complex set of keywords is needed in order to get anywhere close to the intended result, and experience shows that people will not learn how to formulate and use such sets. Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents. While this is not much of a burden when the answer appears in a snippet and further document access is unnecessary, QA technology aims to move this from being an accidental property of search to its focus. In keyword search and in much work to date on QA technology, the information seeking process has been seen as a one-shot affair: the user asks a question, and the system provides a satisfactory response. However, early work on QA (Section 1.1) did not make this assumption, and newly targeted applications are hindered by it: while a user may try to formulate a question whose answer is the information Question Answering 631 they want, they will not know whether they have succeeded until something has been returned for examination. If what is returned is unsatisfactory or, while not the answer, is still of interest, a user needs to be able to ask further questions that are understood in the context of the previous ones. For these target applications, QA must be part of a collaborative search process (Section 3.3). In the rest of this section, we give some historical background on QA systems (Section 1.1), on dialogue systems in which QA has played a significant role (Section 1.2), and on a particular QA task that has been a major driver of the field over the past 8 years (Section 1.3). Section 2 describes the current state of the art in QA systems, organized around the de facto architecture of such systems. Section 3 discusses some current directions in which QA is moving, including the development of interactive QA.


Scalable Multi-Output Label Prediction: From Classifier Chains to Classifier Trellises

arXiv.org Machine Learning

Multi-output inference tasks, such as multi-label classification, have become increasingly important in recent years. A popular method for multi-label classification is classifier chains, in which the predictions of individual classifiers are cascaded along a chain, thus taking into account inter-label dependencies and improving the overall performance. Several varieties of classifier chain methods have been introduced, and many of them perform very competitively across a wide range of benchmark datasets. However, scalability limitations become apparent on larger datasets when modeling a fully-cascaded chain. In particular, the methods' strategies for discovering and modeling a good chain structure constitutes a mayor computational bottleneck. In this paper, we present the classifier trellis (CT) method for scalable multi-label classification. We compare CT with several recently proposed classifier chain methods to show that it occupies an important niche: it is highly competitive on standard multi-label problems, yet it can also scale up to thousands or even tens of thousands of labels.


A Review of Real-Time Strategy Game AI

AI Magazine

This literature review covers AI techniques used for real-time strategy video games, focusing specifically on StarCraft. It finds that the main areas of current academic research are in tactical and strategic decision-making, plan recognition, and learning, and it outlines the research contributions in each of these areas. The paper then contrasts the use of game AI in academia and industry, finding the academic research heavily focused on creating game-winning agents, while the indus- try aims to maximise player enjoyment. It finds the industry adoption of academic research is low because it is either in- applicable or too time-consuming and risky to implement in a new game, which highlights an area for potential investi- gation: bridging the gap between academia and industry. Fi- nally, the areas of spatial reasoning, multi-scale AI, and co- operation are found to require future work, and standardised evaluation methods are proposed to produce comparable re- sults between studies.


Consistent Classification Algorithms for Multi-class Non-Decomposable Performance Metrics

arXiv.org Machine Learning

We study consistency of learning algorithms for a multi-class performance metric that is a non-decomposable function of the confusion matrix of a classifier and cannot be expressed as a sum of losses on individual data points; examples of such performance metrics include the macro F-measure popular in information retrieval and the G-mean metric used in class-imbalanced problems. While there has been much work in recent years in understanding the consistency properties of learning algorithms for `binary' non-decomposable metrics, little is known either about the form of the optimal classifier for a general multi-class non-decomposable metric, or about how these learning algorithms generalize to the multi-class case. In this paper, we provide a unified framework for analysing a multi-class non-decomposable performance metric, where the problem of finding the optimal classifier for the performance metric is viewed as an optimization problem over the space of all confusion matrices achievable under the given distribution. Using this framework, we show that (under a continuous distribution) the optimal classifier for a multi-class performance metric can be obtained as the solution of a cost-sensitive classification problem, thus generalizing several previous results on specific binary non-decomposable metrics. We then design a consistent learning algorithm for concave multi-class performance metrics that proceeds via a sequence of cost-sensitive classification problems, and can be seen as applying the conditional gradient (CG) optimization method over the space of feasible confusion matrices. To our knowledge, this is the first efficient learning algorithm (whose running time is polynomial in the number of classes) that is consistent for a large family of multi-class non-decomposable metrics. Our consistency proof uses a novel technique based on the convergence analysis of the CG method.


Local Decorrelation For Improved Pedestrian Detection

Neural Information Processing Systems

Even with the advent of more sophisticated, data-hungry methods, boosted decision trees remain extraordinarily successful for fast rigid object detection, achieving top accuracy on numerous datasets. While effective, most boosted detectors use decision trees with orthogonal (single feature) splits, and the topology of the resulting decision boundary may not be well matched to the natural topology of the data. Given highly correlated data, decision trees with oblique (multiple feature) splits can be effective. Use of oblique splits, however, comes at considerable computational expense. Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods. The result is an overcomplete but locally decorrelated representation ideally suited for use with orthogonal decision trees. In fact, orthogonal trees with our locally decorrelated features outperform oblique trees trained over the original features at a fraction of the computational cost. The overall improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we reduce false positives nearly tenfold over the previous state-of-the-art.