Goto

Collaborating Authors

 Inductive Learning


Rates for Inductive Learning of Compositional Models

AAAI Conferences

Compositional Models are widely used in Computer Vision as they exhibit strong expressive power by generating a combinatorial number of configurations with a small number of components. However, the literature is still missing a theoretical understanding of why compositional models are better than flat representations, despite empirical evidence as well as strong arguments that compositional models need fewer training examples. In this paper we try to give some theoretical answers in this direction, focusing on AND/OR Graph (AOG) models used in recent literature for representing objects, scenes and events, and bringing the following contributions. First, we analyze the capacity of the space of AND/OR graphs, obtaining PAC (Probably Approximately Correct) bounds for the number of training examples sufficient to guarantee with a given certainty that the model learned has a given accuracy. Second, we propose an algorithm for supervised learning AND/OR Graphs that has theoretical performance guarantees based on the dimensionality and number of training examples. Finally, we observe that part localization, part noise tolerance and part sharing leads to a reduction in the number of training examples required.


Using Commonsense Knowledge to Automatically Create (Noisy) Training Examples from Text

AAAI Conferences

One of the challenges to information extraction is the requirement of human annotated examples. Current successful approaches alleviate this problem by employing some form of distant supervision i.e., look into knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on a hand coded background knowledge that explicitly looks for patterns in text. In this work, we take a different approach -- we create weakly supervised examples for relations by using commonsense knowledge. The key innovation is that this commonsense knowledge is completely independent of the natural language text. This helps when learning the full model for information extraction as against simply learning the parameters of a known CRF or MLN. We demonstrate on two domains that this form of weak supervision yields superior results when learning structure compared to simply using the gold standard labels.


Teaching Classification Boundaries to Humans

AAAI Conferences

Given a classification task, what is the best way to teach the resulting boundary to a human? While machine learning techniques can provide excellent methods for finding the boundary, including the selection of examples in an online setting, they tell us little about how we would teach a human the same task. We propose to investigate the problem of example selection and presentation in the context of teaching humans, and explore a variety of mechanisms in the interests of finding what may work best. In particular, we begin with the baseline of random presentation and then examine combinations of several mechanisms: the indication of an exampleโ€™s relative difficulty, the use of the shaping heuristic from the cognitive science literature (moving from easier examples to harder ones), and a novel kernel-based โ€œcoverage modelโ€ of the subjectโ€™s mastery of the task. From our experiments on 54 human subjects learning and performing a pair of synthetic classification tasks via our teaching system, we found that we can achieve the greatest gains with a combination of shaping and the coverage model.


Physical Activity Recognition from Accelerometer Data Using a Multi-Scale Ensemble Method

AAAI Conferences

Accurate and detailed measurement of an individual's physical activity is a key requirement for helping researchers understand the relationship between physical activity and health. Accelerometers have become the method of choice for measuring physical activity due to their small size, low cost, convenience and their ability to provide objective information about physical activity. However, interpreting accelerometer data once it has been collected can be challenging. In this work, we applied machine learning algorithms to the task of physical activity recognition from triaxial accelerometer data. We employed a simple but effective approach of dividing the accelerometer data into short non-overlapping windows, converting each window into a feature vector, and treating each feature vector as an i.i.d training instance for a supervised learning algorithm. In addition, we improved on this simple approach with a multi-scale ensemble method that did not need to commit to a single window size and was able to leverage the fact that physical activities produced time series with repetitive patterns and discriminative features for physical activity occurred at different temporal scales.


Assessing the Predictability of Hospital Readmission Using Machine Learning

AAAI Conferences

Unplanned hospital readmissions raise health care costs and cause significant distress to patients. Hence, predicting which patients are at risk to be readmitted is of great interest. In this paper, we mine large amounts of administrative information from claim data, including patients demographics, dispensed drugs, medical or surgical procedures performed, and medical diagnosis, in order to predict readmission using supervised learning methods. Our objective is to gain knowledge about the predictive power of the available information. Our preliminary results on data from the provincial hospital system in Quebec illustrate the potential for this approach to reveal important information on factors that trigger hospital readmission. Our findings suggest that a substantial portion of readmissions is inherently hard to predict. Consequently, the use of the raw readmission rate as an indicator of the quality of provided care might not be appropriate.


An Antimicrobial Prescription Surveillance System that Learns from Experience

AAAI Conferences

Inappropriate prescribing of antimicrobials is a major clinical and health concern, as well as a financial burden, in hospitals worldwide. In this paper, we describe a deployed automated antimicrobial prescription surveillance system that has been assisting hospital pharmacists in identifying and reporting inappropriate antimicrobial prescriptions. One of the key characteristics of this system is its ability to learn new rules for detecting inappropriate prescriptions based on previous false alerts. The supervised learning algorithm combines instance-based learning and rule induction techniques. It exploits temporal abstraction to extract a meaningful time interval representation from raw clinical data, and applies nearest neighbor classification with a distance function on both temporal and non-temporal parameters. The learning capability is valuable both in configuring the system for initial deployment and improving its long term use. We give an overview of the application, point to lessons learned so far and provide insight into the machine learning capability.


Imbalanced Multiple Noisy Labeling for Supervised Learning

AAAI Conferences

When labeling objects via Internet-based outsourcing systems, the labelers may have bias, because they lack expertise, dedication and personal preference. These reasons cause Imbalanced Multiple Noisy Labeling. To deal with the imbalance labeling issue, we propose an agnostic algorithm PLAT (Positive LAbel frequency Threshold) which does not need any information about quality of labelers and underlying class distribution. Simulations on eight real-world datasets with different underlying class distributions demonstrate that PLAT not only effectively deals with the imbalanced multiple noisy labeling problem that off-the-shelf agnostic methods cannot cope with, but also performs nearly the same as majority voting under the circumstances that labelers have no bias.


Continuous Conditional Random Fields for Efficient Regression in Large Fully Connected Graphs

AAAI Conferences

When used for structured regression, powerful Conditional Random Fields (CRFs) are typically restricted to modeling effects of interactions among examples in local neighborhoods. Using more expressive representation would result in dense graphs, making these methods impractical for large-scale applications. To address this issue, we propose an effective CRF model with linear scale-up properties regarding approximate learning and inference for structured regression on large, fully connected graphs. The proposed method is validated on real-world large-scale problems of image de-noising and remote sensing. In conducted experiments, we demonstrated that dense connectivity provides an improvement in prediction accuracy. Inference time of less than ten seconds on graphs with millions of nodes and trillions of edges makes the proposed model an attractive tool for large-scale, structured regression problems.


Integrating Programming by Example and Natural Language Programming

AAAI Conferences

We motivate the integration of programming by example and natural language programming by developing a system for specifying programs for simple text editing operations based on regular expressions. The programs are described with unconstrained natural language instructions, and providing one or more examples of input/output. We show that natural language allows the system to deduce the correct program much more often and much faster than is possible with the input/output example(s) alone, showing that natural language programming and programming by example can be combined in a way that overcomes the ambiguities that both methods suffer from individually, while providing a more natural interface to the user.


HC-Search: Learning Heuristics and Cost Functions for Structured Prediction

AAAI Conferences

Structured prediction is the problem of learning a function from structured inputs to structured outputs with prototypical examples being part-of-speech tagging and image labeling. Inspired by the recent successes of search-based structured prediction, we introduce a new framework for structured prediction called {\em HC-Search}. Given a structured input, the framework uses a search procedure guided by a learned heuristic H to uncover high quality candidate outputs and then uses a separate learned cost function C to select a final prediction among those outputs. We can decompose the regret of the overall approach into the loss due to H not leading to high quality outputs, and the loss due to C not selecting the best among the generated outputs. Guided by this decomposition, we minimize the overall regret in a greedy stage-wise manner by first training H to quickly uncover high quality outputs via imitation learning, and then training C to correctly rank the outputs generated via H according to their true losses. Experiments on several benchmark domains show that our approach significantly outperforms the state-of-the-art methods.