 Statistical Learning


Predicting Structural and Functional Sites in Proteins by Searching for Maximum-weight Cliques

AAAI Conferences

Fully characterizing structural and functional sites in proteins is a fundamental step in understanding their roles in the cell. This extremely challenging combinatorial problem requires determining the number of sites in the protein and the set of residues involved in each of them. We formulate it as a distance-based supervised clustering task, where training proteins are employed to learn a proper distance function between residues. A partial clustering is then returned by searching for maximum-weight cliques in the resulting weighted graph representation of proteins. A novel stochastic local search algorithm is proposed to efficiently generate approximate solutions. Our method achieves substantial improvements over a previous structured-output approach for metal binding site prediction. Significant improvements over the current state-of-the-art are also achieved in predicting catalytic sites from 3D structure in enzymes.
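As a rough illustration of the clique-search step described above, the sketch below finds a maximum-weight clique by exhaustive enumeration on a toy graph. It is not the stochastic local search proposed in the paper. The residue names, the edge weights, and the choice to score a clique by the sum of its edge weights are all assumptions made for this example.

```python
from itertools import combinations


def is_clique(nodes, weights):
    """True if every pair of nodes is joined by an edge in `weights`."""
    return all(frozenset(pair) in weights for pair in combinations(nodes, 2))


def clique_weight(nodes, weights):
    """Assumed scoring rule: sum of the weights of all edges inside the clique."""
    return sum(weights[frozenset(pair)] for pair in combinations(nodes, 2))


def max_weight_clique(vertices, weights):
    """Exhaustive search; only feasible for very small graphs."""
    best, best_w = (), 0.0
    for size in range(2, len(vertices) + 1):
        for nodes in combinations(vertices, size):
            if is_clique(nodes, weights):
                w = clique_weight(nodes, weights)
                if w > best_w:
                    best, best_w = nodes, w
    return set(best), best_w


# Hypothetical residues A-D with made-up pairwise affinities.
weights = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.7,
    frozenset({"C", "D"}): 0.95,
}
clique, weight = max_weight_clique(["A", "B", "C", "D"], weights)
print(sorted(clique), round(weight, 2))
```

Exhaustive enumeration grows exponentially with the number of residues, which is one reason an approximate search is attractive for real proteins.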


Automatic Attribution of Quoted Speech in Literary Narrative

AAAI Conferences

We describe a method for identifying the speakers of quoted speech in natural-language textual stories. We have assembled a corpus of more than 3,000 quotations, whose speakers (if any) are manually identified, from a collection of 19th- and 20th-century literature by six authors. Using rule-based and statistical learning, our method identifies candidate characters, determines their genders, and attributes each quote to the most likely speaker. We divide the quotes into syntactic classes in order to leverage common discourse patterns, which enable rapid attribution for many quotes. We apply learning algorithms to the remainder and achieve an overall accuracy of 83%.


A Decentralised Coordination Algorithm for Mobile Sensors

AAAI Conferences

We present an on-line decentralised algorithm for coordinating mobile sensors for a broad class of information gathering tasks. These sensors can be deployed in unknown and possibly hostile environments, where uncertainty and dynamism are endemic. Such environments are common in the areas of disaster response and military surveillance. Our coordination approach is based on work by Stranders et al. (2009), which uses the max-sum algorithm to coordinate mobile sensors for monitoring spatial phenomena. In particular, we generalise and extend their approach to any domain where measurements can be valued. We also introduce a clustering approach that allows sensors to negotiate over paths to the most relevant locations, as opposed to a set of fixed directions, which significantly improves performance. We demonstrate our algorithm by applying it to two challenging and distinct information gathering tasks. In the first, pursuit-evasion (PE), sensors need to capture a target whose movement might be unknown. In the second, patrolling (P), sensors need to minimise loss from intrusions that occur within their environment. In doing so, we obtain the first decentralised coordination algorithms for these domains. Finally, in each domain, we empirically evaluate our approach in a simulated environment and show that it outperforms two state-of-the-art greedy algorithms by 30% (PE) and 44% (P), and an existing approach based on the Travelling Salesman Problem by 52% (PE) and 30% (P).


Efficient Spectral Feature Selection with Minimum Redundancy

AAAI Conferences

Spectral feature selection identifies relevant features by measuring their capability of preserving sample similarity. It provides a powerful framework for both supervised and unsupervised feature selection, and has been proven to be effective in many real-world applications. One common drawback associated with most existing spectral feature selection algorithms is that they evaluate features individually and cannot identify redundant features. Since redundant features can have a significant adverse effect on learning performance, it is necessary to address this limitation for spectral feature selection. To this end, we propose a novel spectral feature selection algorithm to handle feature redundancy, adopting an embedded model. The algorithm is derived from a formulation based on a sparse multi-output regression with an L2,1-norm constraint. We conduct theoretical analysis on the properties of its optimal solutions, paving the way for designing an efficient path-following solver. Extensive experiments show that the proposed algorithm can do well in both selecting relevant features and removing redundancy.
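To make the norm mentioned above concrete, the following sketch computes the L2,1-norm of a small coefficient matrix, using the standard definition as the sum of the Euclidean norms of its rows. The matrix values, the convention that each row corresponds to one feature, and the helper `selected_features` are assumptions for illustration only; the sketch does not reproduce the paper's solver.

```python
import math


def l21_norm(W):
    """Sum over rows of the Euclidean (L2) norm of each row."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)


def selected_features(W, tol=1e-12):
    """Indices of rows with non-zero norm (assumed: one row per feature)."""
    return [i for i, row in enumerate(W)
            if math.sqrt(sum(v * v for v in row)) > tol]


# Hypothetical 3-feature, 2-output coefficient matrix.
W = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0: this feature is dropped for every output
    [0.0, 1.0],  # row norm 1
]
print(l21_norm(W))           # 5 + 0 + 1
print(selected_features(W))  # rows that survive
```

Because the penalty acts on whole rows, constraining it tends to zero out entire rows at once, which is why it suits selecting features jointly across several outputs.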


Interactive Learning Using Manifold Geometry

AAAI Conferences

We present an interactive learning method that enables a user to iteratively refine a regression model. The user examines the output of the model, visualized as the vertical axis of a 2D scatterplot, and provides corrections by repositioning individual data instances to the correct output level. Each repositioned data instance acts as a control point for altering the learned model, using the geometry underlying the data. We capture the underlying structure of the data as a manifold, on which we compute a set of basis functions as the foundation for learning. Our results show that manifold-based interactive learning improves performance monotonically with each correction, outperforming alternative approaches.


Assisting Users with Clustering Tasks by Combining Metric Learning and Classification

AAAI Conferences

Interactive clustering refers to situations in which a human labeler is willing to assist a learning algorithm in automatically clustering items. We present a related but somewhat different task, assisted clustering, in which a user creates explicit groups of items from a large set and wants suggestions on what items to add to each group. While the traditional approach to interactive clustering has been to use metric learning to induce a distance metric, our situation seems equally amenable to classification. Using clusterings of documents from human subjects, we found that one or the other method proved to be superior for a given cluster, but not uniformly so. We thus developed a hybrid mechanism for combining the metric learner and the classifier. We present results from a large number of trials based on human clusterings, in which we show that our combination scheme matches and often exceeds the performance of a method which exclusively uses either type of learner.


A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery

AAAI Conferences

Labor monitoring is crucial in modern health care, as it can be used to detect (and help avoid) significant problems with the fetus. In this paper we focus on hypoxia (or oxygen deprivation), a very serious condition that can arise from different pathologies and can lead to life-long disability and death. We present a novel approach to hypoxia detection based on recordings of the uterine pressure and fetal heart rate, which are routinely monitored during labor. The key idea is to learn models of the fetal response to signals from its environment, using time series data recorded during labor. Then, we use the parameters of these models as attributes in a binary classification problem. A majority vote over several periods is taken to provide the current label for the fetus. We use a unique database of real clinical recordings, both from normal and pathological cases. Our approach correctly classifies more than half of the pathological cases 1.5 hours before delivery. These are cases that were missed by clinicians; early detection of this type would have allowed the physician to perform a Caesarean section, possibly avoiding the negative outcome.
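The majority-vote step described above can be sketched as follows. The window length, the 0/1 label encoding (1 for pathological), and the rule that ties resolve to 0 are assumptions for this example and are not taken from the paper.

```python
def current_label(period_predictions, window=5):
    """Majority vote over the most recent `window` per-period predictions.

    Each prediction is 0 (normal) or 1 (pathological). Ties resolve to 0,
    which is an arbitrary choice made for this sketch.
    """
    recent = period_predictions[-window:]
    if not recent:
        raise ValueError("need at least one per-period prediction")
    return 1 if 2 * sum(recent) > len(recent) else 0


# Hypothetical classifier outputs for successive recording periods.
print(current_label([0, 1, 1, 0, 1]))                     # 3 of 5 vote pathological
print(current_label([1, 1, 1, 0, 0, 0, 0, 1], window=5))  # last 5 are mostly normal
```

Voting over several periods smooths out isolated misclassifications, at the cost of reacting more slowly to a genuine change.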


Fast, Accurate, and Practical Identity Inference Using TV Remote Controls

AAAI Conferences

Non-invasive identity inference in the home environment is a very challenging problem. A practical solution to the problem could have far reaching implications in many industries, such as home entertainment. In this work, we consider the problem of identity inference using a TV remote control. In particular, we address two challenges that have so far prevented the work of Chang et al. (2009) from being applied in a home entertainment system. First, we show how to learn the patterns of TV remote controls incrementally and online. Second, we generalize our results to partially labeled data. To achieve our goal, we use state-of-the-art methods for max-margin learning and online convex programming. Our solution is efficient, runs in real time, and comes with theoretical guarantees. It performs well in practice and we demonstrate this on 4 datasets of 2 to 4 people.


Estimation of Human Internal Temperature from Wearable Physiological Sensors

AAAI Conferences

Human core body temperature (Tcore) is an important measure of thermal state, e.g., hypo- or hyperthermia, but is difficult to measure using noninvasive wearable sensors. We estimated parameters for a discrete Kalman filter (KF) model from data collected during several military training events and from distance runners (n = 38). Model performance was evaluated in 25 physically active subjects who participated in various laboratory and field studies involving exercise of 2 to 8 h duration at ambient temperatures of 20 to 40 °C. Overall, the KF model's estimate of Tcore had a root mean square error of 0.30 ± 0.13 °C from the observed Tcore, and was within 0.5 °C over 85% of the time. The benefit of the KF approach is that it requires only one input, while current state-of-the-art models typically require multiple inputs including individual anthropometrics, metabolic rate, clothing characteristics, and environmental conditions. This state estimation problem in computational physiology illustrates the potential for collaboration between the artificial intelligence and ambulatory physiological monitoring communities.

Figure 1: U.S. National Guard Civil Support Team (CST) member engaged in a chemical biological training event.
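For readers unfamiliar with the technique, a discrete KF (Kalman filter) with a single scalar state and a single scalar observation can be sketched as below. The random-walk state model, the linear observation model `y = a * x + b`, and every numeric value here are assumptions chosen for illustration; they are not the parameters or the model structure estimated in the paper.

```python
def kalman_step(x, p, y, q, r, a=1.0, b=0.0):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    y    : new observation
    q, r : process and observation noise variances
    a, b : assumed linear observation model y = a * x + b + noise
    """
    # Predict: random-walk state model (assumed), so the mean is unchanged.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and observation using the Kalman gain.
    k = p_pred * a / (a * a * p_pred + r)
    x_new = x_pred + k * (y - (a * x_pred + b))
    p_new = (1.0 - k * a) * p_pred
    return x_new, p_new


def run_filter(observations, x0, p0, q, r, a=1.0, b=0.0):
    """Filter a whole sequence and return the list of state estimates."""
    x, p = x0, p0
    estimates = []
    for y in observations:
        x, p = kalman_step(x, p, y, q, r, a, b)
        estimates.append(x)
    return estimates


# Hypothetical noisy readings around a slowly rising temperature.
readings = [37.2, 37.1, 37.4, 37.6, 37.5, 37.9]
print(run_filter(readings, x0=37.0, p0=1.0, q=0.01, r=0.25))
```

With only one observation stream, the filter needs just the two noise variances and the observation-model coefficients, which is what makes a single-input estimator practical on a wearable device.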


A Phrase-Based Method for Hierarchical Clustering of Web Snippets

AAAI Conferences

Document clustering has been applied in web information retrieval, where it facilitates quick browsing by organizing retrieved results into different groups. A tree-like hierarchical structure is well suited to organizing the retrieved results for web users. In this regard, we introduce a new method for hierarchical clustering of web snippets by exploiting a phrase-based document index. In our method, a hierarchy of web snippets is built based on phrases instead of all snippets, and the snippets are then assigned to the corresponding clusters consisting of phrases. We show that, as opposed to traditional hierarchical clustering, our method not only presents meaningful cluster labels but also improves clustering performance.