Goto

Collaborating Authors

 Statistical Learning


Max-Margin Markov Networks

Neural Information Processing Systems

In typical classification tasks, we seek a function which assigns a label to a single object.Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However,many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structurein the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees.



LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

Journal of Artificial Intelligence Research

We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.


Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms

Journal of Artificial Intelligence Research

Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction seems at the outset to be an easier task than induction, there have not been many provably useful algorithms for transduction. Moreover, the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik's basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While tight, this bound is given implicitly via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik's transductive bound. This characterization is obtained using concentration inequalities for the tail of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The main observation used for deriving these new error bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used in order to construct useful data dependent prior distributions over the hypothesis space.


Using Machine Learning to Design and Interpret Gene-Expression Microarrays

AI Magazine

Gene-expression microarrays, commonly called gene chips, make it possible to simultaneously measure the rate at which a cell or tissue is expressing -- translating into a protein -- each of its thousands of genes. One can use these comprehensive snapshots of biological activity to infer regulatory pathways in cells; identify novel targets for drug design; and improve the diagnosis, prognosis, and treatment planning for those suffering from disease. However, the amount of data this new technology produces is more than one can manually analyze. Hence, the need for automated analysis of microarray data offers an opportunity for machine learning to have a significant impact on biology and medicine. This article describes microarray technology, the data it produces, and the types of machine learning tasks that naturally arise with these data. It also reviews some of the recent prominent applications of machine learning to gene-chip data, points to related tasks where machine learning might have a further impact on biology and medicine, and describes additional types of interesting data that recent advances in biotechnology allow biomedical researchers to collect.


Evolutionary design of photometric systems and its application to Gaia

arXiv.org Machine Learning

Designing a photometric system to best fulfil a set of scientific goals is a complex task, demanding a compromise between conflicting requirements and subject to various constraints. A specific example is the determination of stellar astrophysical parameters (APs) - effective temperature, metallicity etc. - across a wide range of stellar types. I present a novel approach to this problem which makes minimal assumptions about the required filter system. By considering a filter system as a set of free parameters it may be designed by optimizing some figure-of-merit (FoM) with respect to these parameters. In the example considered, the FoM is a measure of how well the filter system can `separate' stars with different APs. This separation is vectorial in nature, in the sense that the local directions of AP variance are preferably mutually orthogonal to avoid AP degeneracy. The optimization is carried out with an evolutionary algorithm, which uses principles of evolutionary biology to search the parameter space. This model, HFD (Heuristic Filter Design), is applied to the design of photometric systems for the Gaia space astrometry mission. The optimized systems show a number of interesting features, not least the persistence of broad, overlapping filters. These HFD systems perform as least as well as other proposed systems for Gaia, although inadequacies remain in all. The principles underlying HFD are quite generic and may be applied to filter design for numerous other projects, such as the search for specific types of objects or photometric redshift determination.


Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex

Neural Information Processing Systems

How do cortical neurons represent the acoustic environment? This question isoften addressed by probing with simple stimuli such as clicks or tone pips. Such stimuli have the advantage of yielding easily interpreted answers, but have the disadvantage that they may fail to uncover complex or higher-order neuronal response properties. Here we adopt an alternative approach, probing neuronal responses with complex acoustic stimuli, including animal vocalizations and music. We have used in vivo whole cell methods in the rat auditory cortex to record subthreshold membrane potential fluctuations elicited by these stimuli.


A Model for Real-Time Computation in Generic Neural Microcircuits

Neural Information Processing Systems

A key challenge for neural modeling is to explain how a continuous stream of multi-modal input from a rapidly changing environment can be processed by stereotypical recurrent circuits of integrate-and-fire neurons in real-time. We propose a new computational model that is based on principles of high dimensional dynamical systems in combination with statistical learning theory. It can be implemented on generic evolved or found recurrent circuitry.


Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex

Neural Information Processing Systems

How do cortical neurons represent the acoustic environment? This question is often addressed by probing with simple stimuli such as clicks or tone pips. Such stimuli have the advantage of yielding easily interpreted answers, but have the disadvantage that they may fail to uncover complex or higher-order neuronal response properties. Here we adopt an alternative approach, probing neuronal responses with complex acoustic stimuli, including animal vocalizations and music. We have used in vivo whole cell methods in the rat auditory cortex to record subthreshold membrane potential fluctuations elicited by these stimuli.


A Model for Real-Time Computation in Generic Neural Microcircuits

Neural Information Processing Systems

A key challenge for neural modeling is to explain how a continuous stream of multi-modal input from a rapidly changing environment can be processed by stereotypical recurrent circuits of integrate-and-fire neurons in real-time. We propose a new computational model that is based on principles of high dimensional dynamical systems in combination with statistical learning theory. It can be implemented on generic evolved or found recurrent circuitry.