Goto

Collaborating Authors

 Performance Analysis


A ROAD to Classification in High Dimensional Space

arXiv.org Machine Learning

For high-dimensional classification, it is well known that naively performing the Fisher discriminant rule leads to poor results due to diverging spectra and noise accumulation. Therefore, researchers proposed independence rules to circumvent the diverse spectra, and sparse independence rules to mitigate the issue of noise accumulation. However, in biological applications, there are often a group of correlated genes responsible for clinical outcomes, and the use of the covariance information can significantly reduce misclassification rates. The extent of such error rate reductions is unveiled by comparing the misclassification rates of the Fisher discriminant rule and the independence rule. To materialize the gain based on finite samples, a Regularized Optimal Affine Discriminant (ROAD) is proposed based on a covariance penalty. ROAD selects an increasing number of features as the penalization relaxes. Further benefits can be achieved when a screening method is employed to narrow the feature pool before hitting the ROAD. An efficient Constrained Coordinate Descent algorithm (CCD) is also developed to solve the associated optimization problems. Sampling properties of oracle type are established. Simulation studies and real data analysis support our theoretical results and demonstrate the advantages of the new classification procedure under a variety of correlation structures. A delicate result on continuous piecewise linear solution path for the ROAD optimization problem at the population level justifies the linear interpolation of the CCD algorithm.


Model Selection in Undirected Graphical Models with the Elastic Net

arXiv.org Machine Learning

Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model and the Potts model, by extending previous work on $l_1$ regularized neighborhood estimation to include the elastic net $l_1+l_2$ penalty. Additionally, we show numerical evidence that the edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pair-wise neighborhood union estimates.


Query-time Entity Resolution

arXiv.org Artificial Intelligence

Entity resolution is the problem of reconciling database references corresponding to the same real-world entities. Given the abundance of publicly available databases that have unresolved entities, we motivate the problem of query-time entity resolution quick and accurate resolution for answering queries over such unclean databases at query-time. Since collective entity resolution approaches --- where related references are resolved jointly --- have been shown to be more accurate than independent attribute-based resolution for off-line entity resolution, we focus on developing new algorithms for collective resolution for answering entity resolution queries at query-time. For this purpose, we first formally show that, for collective resolution, precision and recall for individual entities follow a geometric progression as neighbors at increasing distances are considered. Unfolding this progression leads naturally to a two stage expand and resolve query processing strategy. In this strategy, we first extract the related records for a query using two novel expansion operators, and then resolve the extracted records collectively. We then show how the same strategy can be adapted for query-time entity resolution by identifying and resolving only those database references that are the most helpful for processing the query. We validate our approach on two large real-world publication databases where we show the usefulness of collective resolution and at the same time demonstrate the need for adaptive strategies for query processing. We then show how the same queries can be answered in real-time using our adaptive approach while preserving the gains of collective resolution. In addition to experiments on real datasets, we use synthetically generated data to empirically demonstrate the validity of the performance trends predicted by our analysis of collective entity resolution over a wide range of structural characteristics in the data.


Structural Similarity and Distance in Learning

arXiv.org Machine Learning

The notion of distance, or more generally, similarity between observations, is at the root of most learning algorithms such as manifold learning, unsupervised clustering, semi-supervised learning, and anomaly detection. Most methods, at a basic level, are based on some function of Euclidean distance, such as the radial basis functions prevalent in supervised classification. Graphbased learning methods employ Euclidean distances to describe local neighborhoods for observations. K-nearest neighbors and ǫ-neighborhoods based on Euclidean distances are used in manifold learning (Isomap [1] and LLE [2]), spectral clustering [3], anomaly detection algorithms [4], [5] and in label propagation algorithms for semi-supervised learning [6]. In many cases, notably sparsely sampled sets of data, Euclidean neighborhoods are not sufficient to represent underlying structure. Consider data drawn from two independent structures. Ideally, a graph would capture the structure of the data with minimal connection between observations on separate structures.


Learning Probabilistic Behavior Models in Real-Time Strategy Games

AAAI Conferences

We study the problem of learning probabilistic models of high-level strategic behavior in the real-time strategy (RTS) game StarCraft. The models are automatically learned from sets of game logs and aim to capture the common strategic states and decision points that arise in those games. Unlike most work on behavior/strategy learning and prediction in RTS games, our data-centric approach is not biased by or limited to any set of preconceived strategic concepts. Further, since our behavior model is based on the well-developed and generic paradigm of hidden Markov models, it supports a variety of uses for the design of AI players and human assistants. For example, the learned models can be used to make probabilistic predictions of a player's future actions based on observations, to simulate possible future trajectories of a player, or to identify uncharacteristic or novel strategies in a game database. In addition, the learned qualitative structure of the model can be analyzed by humans in order to categorize common strategic elements. We demonstrate our approach by learning models from 331 expert-level games and provide both a qualitative and quantitative assessment of the learned model's utility.


A Discrete Event Calculus Implementation of the OCC Theory of Emotion

AAAI Conferences

Characters are a critical part of storytelling and emotion is a vital part of character. Readers generally credit characters with human emotions, and it is these emotions which bring meaning to stories. To computationally construct interesting and meaningful stories we need a model of emotion which allows us to predict characters’ reactions to events in the world. There are many different psychological theories of emotion; the most popular to date for computational applications is the OCC theory. This paper describes a Discrete Event Calculus implementation of the OCC Theory of Emotion. To evaluate our system, we apply it to a selection of Aesop’s fables, and compare the output to the emotions readers expect in the same situations based on a survey.


Learning Sentence-internal Temporal Relations

arXiv.org Artificial Intelligence

In this paper we propose a data intensive approach for inferring sentence-internal temporal relations. Temporal inference is relevant for practical NLP applications which either extract or synthesize temporal information (e.g., summarisation, question answering). Our method bypasses the need for manual coding by exploiting the presence of markers like after", which overtly signal a temporal relation. We first show that models trained on main and subordinate clauses connected with a temporal marker achieve good performance on a pseudo-disambiguation task simulating temporal inference (during testing the temporal marker is treated as unseen and the models must select the right marker from a set of possible candidates). Secondly, we assess whether the proposed approach holds promise for the semi-automatic creation of temporal annotations. Specifically, we use a model trained on noisy and approximate data (i.e., main and subordinate clauses) to predict intra-sentential relations present in TimeBank, a corpus annotated rich temporal information. Our experiments compare and contrast several probabilistic models differing in their feature space, linguistic assumptions and data requirements. We evaluate performance against gold standard corpora and also against human subjects.


Two Projection Pursuit Algorithms for Machine Learning under Non-Stationarity

arXiv.org Artificial Intelligence

This thesis derives, tests and applies two linear projection algorithms for machine learning under non-stationarity. The first finds a direction in a linear space upon which a data set is maximally non-stationary. The second aims to robustify two-way classification against non-stationarity. The algorithm is tested on a key application scenario, namely Brain Computer Interfacing.


Learning Discriminative Metrics via Generative Models and Kernel Learning

arXiv.org Machine Learning

Metrics specifying distances between data points can be learned in a discriminative manner or from generative models. In this paper, we show how to unify generative and discriminative learning of metrics via a kernel learning framework. Specifically, we learn local metrics optimized from parametric generative models. These are then used as base kernels to construct a global kernel that minimizes a discriminative training criterion. We consider both linear and nonlinear combinations of local metric kernels. Our empirical results show that these combinations significantly improve performance on classification tasks. The proposed learning algorithm is also very efficient, achieving order of magnitude speedup in training time compared to previous discriminative baseline methods.


Data-driven calibration of linear estimators with minimal penalties

arXiv.org Machine Learning

This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression, spline smoothing or locally weighted regression, and the choice of a kernel in multiple kernel learning. We propose a new algorithm which first estimates consistently the variance of the noise, based upon the concept of minimal penalty, which was previously introduced in the context of model selection. Then, plugging our variance estimate in Mallows' $C_L$ penalty is proved to lead to an algorithm satisfying an oracle inequality. Simulation experiments with kernel ridge regression and multiple kernel learning show that the proposed algorithm often improves significantly existing calibration procedures such as generalized cross-validation.