AAAI Conferences


Feature Engineering for Predictive Modeling Using Reinforcement Learning

AAAI Conferences

Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of a given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance-driven exploration of a transformation graph, which systematically and compactly captures the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.


Brute-Force Facial Landmark Analysis With a 140,000-Way Classifier

AAAI Conferences

We propose a simple approach to visual alignment, focusing on the illustrative task of facial landmark estimation. While most prior work treats this as a regression problem, we instead formulate it as a discrete K-way classification task, where a classifier is trained to return one of K discrete alignments. One crucial benefit of a classifier is the ability to report back a (softmax) distribution over putative alignments. We demonstrate that this distribution is a rich representation that can be marginalized (to generate uncertainty estimates over groups of landmarks) and conditioned on (to incorporate top-down context, provided by temporal constraints in a video stream or an interactive human user). Such capabilities are difficult to integrate into classic regression-based approaches. We study performance as a function of the number of classes K, including the extreme "exemplar class" setting where K is equal to the number of training examples (140K in our setting). Perhaps surprisingly, we show that classifiers can still be learned in this setting. When compared to prior work in classification, our K is unprecedentedly large, including many "fine-grained" classes that are very similar. We address these issues by using a multi-label loss function that allows for training examples to be non-uniformly shared across discrete classes. We perform a comprehensive experimental analysis of our method on standard benchmarks, demonstrating state-of-the-art results for facial alignment in videos.
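The two operations the abstract names, marginalizing and conditioning a softmax distribution over discrete alignments, can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the four candidate alignments, the landmark names, the scores, and the helper functions `marginalize` and `condition` are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Turn raw classifier scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginalize(probs, alignments, landmark):
    """Sum the probability of all classes that agree on one landmark's position."""
    marginal = defaultdict(float)
    for p, alignment in zip(probs, alignments):
        marginal[alignment[landmark]] += p
    return dict(marginal)


def condition(probs, alignments, landmark, position):
    """Keep only classes consistent with a known landmark position, then renormalize."""
    kept = [p if a[landmark] == position else 0.0
            for p, a in zip(probs, alignments)]
    total = sum(kept)
    if total == 0.0:
        raise ValueError("no alignment is consistent with the constraint")
    return [k / total for k in kept]


# Hypothetical toy setting: K = 4 discrete alignments, two landmarks each.
alignments = [
    {"nose": (10, 12), "chin": (10, 30)},
    {"nose": (10, 12), "chin": (11, 31)},
    {"nose": (14, 12), "chin": (11, 31)},
    {"nose": (14, 13), "chin": (15, 33)},
]
scores = [2.0, 1.0, 0.5, -1.0]  # made-up classifier outputs

probs = softmax(scores)
nose_marginal = marginalize(probs, alignments, "nose")       # uncertainty over the nose
posterior = condition(probs, alignments, "chin", (11, 31))   # e.g. a user fixes the chin
```

A regressor that returns a single point estimate offers no comparable distribution to sum over or renormalize, which is the contrast the abstract draws.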


Self-Reinforced Cascaded Regression for Face Alignment

AAAI Conferences

Cascaded regression is prevalent in face alignment thanks to its accurate and robust localization of facial landmarks, but it typically demands numerous annotated training examples with low discrepancy between shape-indexed features and shape updates. In this paper, we propose a self-reinforced strategy that iteratively expands the quantity and improves the quality of training examples, thus upgrading the performance of cascaded regression itself. The reinforced term evaluates example quality by its consistency in both local appearance and global geometry of human faces, and drives the example evolution by the philosophy of "survival of the fittest." We train a set of discriminative classifiers, each associated with one landmark label, to prune those examples with inconsistent local appearance, and further validate the geometric relationship among groups of labeled landmarks against the common global geometry derived from a projective invariant. We embed this generic strategy into two typical cascaded regressions, and the alignment results on several benchmark data sets demonstrate the effectiveness of training regressions with automatic example prediction and evolution starting from a small subset.


Discriminative Semi-Supervised Feature Selection via Rescaled Least Squares Regression-Supplement

AAAI Conferences

In this paper, we propose a Discriminative Semi-Supervised Feature Selection (DSSFS) method. In this method, an ε-dragging technique is introduced to the Rescaled Linear Square Regression in order to enlarge the distances between different classes. An iterative method is proposed to simultaneously learn the regression coefficients, the ε-dragging matrix, and the unknown class labels. Experimental results show the superiority of DSSFS.


Unsupervised Selection of Negative Examples for Grounded Language Learning

AAAI Conferences

There has been substantial work in recent years on grounded language acquisition, in which language and sensor data are used to create a model relating linguistic constructs to the perceivable world. While powerful, this approach is frequently hindered by ambiguities, redundancies, and omissions found in natural language. We describe an unsupervised system that learns language by training visual classifiers, first selecting important terms from object descriptions, then automatically choosing negative examples from a paired corpus of perceptual and linguistic data. We evaluate the effectiveness of each stage as well as the system's performance on the overall learning task.


Scale Up Event Extraction Learning via Automatic Training Data Generation

AAAI Conferences

The task of event extraction has long been investigated in a supervised learning paradigm, which is bound by the number and the quality of the training instances. Existing training data must be manually generated through a combination of expert domain knowledge and extensive human involvement. However, because annotating text requires substantial effort, the resulting datasets are usually small, which severely affects the quality of the learned model and makes it hard to generalize. Our work develops an automatic approach for generating training data for event extraction. Our approach allows us to scale up event extraction training instances from thousands to hundreds of thousands, and it does this at a much lower cost than a manual approach. We achieve this by employing distant supervision to automatically create event annotations from unlabelled text using existing structured knowledge bases or tables. We then develop a neural network model with post inference to transfer the knowledge extracted from structured knowledge bases to automatically annotate typed events with corresponding arguments in text. We evaluate our approach by using the knowledge extracted from Freebase to label texts from Wikipedia articles. Experimental results show that our approach can generate a large number of high-quality training instances. We show that this large volume of training data not only leads to a better event extractor, but also allows us to detect multiple typed events.
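The general idea of distant supervision mentioned above can be sketched in a few lines: a sentence that mentions every argument of a knowledge-base record is labeled as an instance of that record's event type. This is a minimal illustration under stated assumptions, not the authors' pipeline. The `KBEntry` type, the `label_sentences` helper, the plain substring matching, and the toy record and sentences are all hypothetical; the abstract does not describe how arguments are selected or matched.

```python
from dataclasses import dataclass


@dataclass
class KBEntry:
    """One structured record: an event type plus role -> surface-string arguments."""
    event_type: str
    arguments: dict


def label_sentences(sentences, kb_entries):
    """Label a sentence with an event when it mentions every argument of a KB entry.

    Assumption: case-insensitive substring matching stands in for real
    entity linking, which a practical system would need.
    """
    labeled = []
    for sentence in sentences:
        lowered = sentence.lower()
        for entry in kb_entries:
            spans = {}
            for role, value in entry.arguments.items():
                start = lowered.find(value.lower())
                if start < 0:
                    break  # an argument is missing: no label for this entry
                spans[role] = (start, start + len(value))
            else:
                labeled.append({
                    "sentence": sentence,
                    "event_type": entry.event_type,
                    "argument_spans": spans,
                })
    return labeled


# Made-up knowledge-base record and text, for illustration only.
kb = [KBEntry("marriage", {"spouse_1": "Alice Example",
                           "spouse_2": "Bob Sample",
                           "year": "1999"})]
sentences = [
    "Alice Example married Bob Sample in 1999.",
    "Alice Example was born in 1970.",
    "Bob Sample moved abroad in 1999.",
]

labeled = label_sentences(sentences, kb)
```

Only the first sentence mentions all three arguments, so it is the only one that receives an automatic annotation. Labels produced this way are noisy, which is one reason a learned model is typically trained on top of them.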


Task-Specific Representation Learning for Web-Scale Entity Disambiguation

AAAI Conferences

Named entity disambiguation (NED) is a central problem in information extraction. The goal is to link entities in a knowledge graph (KG) to their mention spans in unstructured text. Each distinct mention span (like John Smith, Jordan or Apache) represents a multi-class classification task. NED can therefore be modeled as a multi-task problem with tens of millions of tasks for realistic KGs. We initiate an investigation into neural representations, network architectures, and training protocols for multi-task NED. Specifically, we propose a task-sensitive representation learning framework that learns mention-dependent representations, followed by a common classifier. Parameter learning in our framework can be decomposed into solving multiple smaller problems involving overlapping groups of tasks. We prove bounds for excess risk, which provide additional insight into the problem of multi-task representation learning. While remaining practical in terms of training memory and time requirements, our approach outperforms recent strong baselines on four benchmark data sets.


StarSpace: Embed All The Things!

AAAI Conferences

We present StarSpace, a general-purpose neural embedding model that can solve a wide variety of problems: labeling tasks such as text classification, ranking tasks such as information retrieval/web search, collaborative filtering-based or content-based recommendation, embedding of multi-relational graphs, and learning word, sentence or document level embeddings. In each case the model works by embedding those entities comprised of discrete features and comparing them against each other -- learning similarities dependent on the task. Empirical results on a number of tasks show that StarSpace is highly competitive with existing methods, whilst also being generally applicable to new cases where those methods are not.
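The "embed entities made of discrete features, then compare them" idea can be illustrated with a toy sketch. It assumes that an entity's embedding is the sum of its feature embeddings and that cosine similarity is used for comparison. The embedding table below is hand-written rather than learned, and the names `embed` and `cosine` are hypothetical; this is not the StarSpace library's API, and training is omitted entirely.

```python
import math


def embed(features, table):
    """Embed an entity as the sum of the vectors of its discrete features."""
    dim = len(next(iter(table.values())))
    vec = [0.0] * dim
    for feature in features:
        for i, x in enumerate(table[feature]):
            vec[i] += x
    return vec


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hand-written 2-d vectors standing in for learned embeddings (assumption).
table = {
    "goal": [1.0, 0.0],
    "match": [0.8, 0.2],
    "election": [0.0, 1.0],
    "vote": [0.1, 0.9],
    "#sports": [1.0, 0.1],
    "#politics": [0.0, 1.0],
}

document = ["goal", "match"]          # a document is a bag of word features
labels = ["#sports", "#politics"]     # a label is an entity with a single feature

doc_vec = embed(document, table)
ranked = sorted(labels,
                key=lambda label: cosine(doc_vec, embed([label], table)),
                reverse=True)
```

Because documents and labels live in the same space, the same comparison serves classification, retrieval, or recommendation; only the choice of which entities to compare changes with the task.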


Canonical Correlation Inference for Mapping Abstract Scenes to Text

AAAI Conferences

We describe a technique for structured prediction, based on canonical correlation analysis. Our learning algorithm finds two projections for the input and the output spaces that aim at projecting a given input and its correct output into points close to each other. We demonstrate our technique on a language-vision problem, namely the problem of giving a textual description to an "abstract scene".