Statistical Learning
Gaussian Mixture Modeling with Gaussian Process Latent Variable Models
Nickisch, Hannes | Rasmussen, Carl Edward
Density modeling is notoriously difficult for high dimensional data. One approach to the problem is to search for a lower dimensional manifold which captures the main characteristics of the data. Recently, the Gaussian Process Latent Variable Model (GPLVM) has been used successfully to find low dimensional manifolds in a variety of complex data. The GPLVM consists of a set of points in a low dimensional latent space, and a stochastic map to the observed space. We show how it can be interpreted as a density model in the observed space. However, the GPLVM is not trained as a density model and therefore yields poor density estimates. We propose a new training strategy and obtain improved generalisation performance and better density estimates in comparative evaluations on several benchmark data sets.
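The density-model interpretation admits a compact sketch: each latent point maps through the GP to a Gaussian in the observed space, so the model can be read as a uniform mixture of those Gaussians. A minimal illustration, assuming the GP predictive means and variances at the latent points are already available (the arrays pred_means and pred_vars below are hypothetical inputs, not the authors' code):

```python
# A minimal sketch (our own illustration, not the authors' code): evaluate
# the observed-space density induced by a GPLVM as a uniform Gaussian mixture.
import numpy as np
from scipy.stats import multivariate_normal

def gplvm_density(y, pred_means, pred_vars):
    """Density of observed point y under the GPLVM-induced mixture.

    pred_means: (n, D) GP predictive means at the n latent points (assumed
    given); pred_vars: (n,) predictive variances, giving N(mu_i, s_i * I).
    """
    n, D = pred_means.shape
    total = sum(multivariate_normal.pdf(y, mean=mu, cov=s * np.eye(D))
                for mu, s in zip(pred_means, pred_vars))
    return total / n

# Toy usage: 5 latent points mapped into a 2-D observed space.
rng = np.random.default_rng(0)
means = rng.normal(size=(5, 2))     # hypothetical GP predictive means
variances = np.full(5, 0.1)         # hypothetical GP predictive variances
print(gplvm_density(np.zeros(2), means, variances))
```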
Maximum Causal Entropy Correlated Equilibria for Markov Games
Ziebart, Brian D. (Carnegie Mellon University) | Bagnell, Drew (Carnegie Mellon University) | Dey, Anind K. (Carnegie Mellon University)
In this work, we present maximum causal entropy correlated equilibria, a new solution concept that we apply to Markov games. This contribution extends the existing solution concept of maximum entropy correlated equilibria for normal-form games to settings with elements of dynamic interaction with a stochastic environment, by employing the recently developed principle of maximum causal entropy. The solution concept is justified for two purposes: as a mechanism for prescribing actions, it reveals as little additional information about the agents' motives as possible; and as a predictive estimator of actions for a group of agents assumed to behave according to an unknown correlated equilibrium, it makes the fewest additional assumptions and minimizes the worst-case log-loss of action prediction. Importantly, equilibria for this solution concept are guaranteed to be unique and Markovian, enabling efficient algorithms for finding them.
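As a hedged paraphrase in our own notation (not the authors' exact formulation, which handles deviation policies in the dynamic setting), the solution concept selects, among all correlated equilibria, the joint action distribution maximizing the causal entropy

\[
H(A_{1:T} \,\|\, S_{1:T}) = \mathbb{E}\Big[-\sum_{t=1}^{T} \log P(a_t \mid s_{1:t},\, a_{1:t-1})\Big],
\]

subject to, for every agent $i$, every recommended action $a_i$, and every deviation $a_i'$,

\[
\mathbb{E}\big[u_i(a_i, a_{-i}) \,\big|\, a_i\big] \;\ge\; \mathbb{E}\big[u_i(a_i', a_{-i}) \,\big|\, a_i\big].
\]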
Envisioning a Robust, Scalable Metacognitive Architecture Built on Dimensionality Reduction
Alonso, Jason Bernardino (Massachusetts Institute of Technology) | Arnold, Kenneth C. (Massachusetts Institute of Technology) | Havasi, Catherine (Massachusetts Institute of Technology)
One major challenge of implementing a metacognitive architecture lies in its scalability and flexibility. We postulate that the difference between a reasoner and a metareasoner need not extend beyond what inputs they take, and we envision a network made of many instances of a few types of simple but powerful reasoning units that serve both roles. In this paper, we present a vision and motivation for such a framework with reusable, robust, and scalable components. This framework, called Scruffy Metacognition, is built on a symbolic representation that lends itself to processing using dimensionality reduction and principal component analysis. We discuss the components of such a system and how they work together for metacognitive reasoning. Additionally, we discuss evaluative tasks for our system, focusing on social agent role-playing and object classification.
Learning to Extract Quality Discourse in Online Communities
Brennan, Michael Robert (Drexel University) | Wrazien, Stacy (Drexel University) | Greenstadt, Rachel (Drexel University)
Collaborative filtering systems have been developed to manage information overload and improve discourse in online communities. In such systems, users rate content provided by other users on its validity or usefulness within their particular context. The goal is that "good" content will rise to prominence and "bad" content will fade into obscurity. These filtering mechanisms are not well understood and have known weaknesses. For example, they depend on the presence of a large crowd to rate content, but such a crowd may not be present. Additionally, the community's decisions determine which voices will reach a large audience and which will be silenced, but it is not known whether these decisions represent "the wisdom of crowds" or a "censoring mob." Our approach uses statistical machine learning to predict community ratings. By extracting features that replicate the community's verdict, we can better understand collaborative filtering, improve the way the community uses the ratings of their members, and design agents that augment community decision-making. Slashdot is an example of such a community, where peers rate each other's comments based on their relevance to the post. This work extracts a wide variety of features from Slashdot metadata and posts' linguistic contents to identify features that can predict the community rating. We find that author reputation, use of pronouns, and author sentiment are salient. We achieve 76% accuracy predicting community ratings as good, neutral, or bad.
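A minimal sketch of the prediction task, using a generic text-classification pipeline rather than the paper's feature set (author reputation, pronoun use, and sentiment are not modeled here, and the toy data is invented):

```python
# Sketch: predict community ratings {good, neutral, bad} from comment text.
# TF-IDF features only; illustrative, not the paper's feature engineering.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = ["Insightful analysis of the kernel patch.",
            "First post!",
            "This is off topic but interesting."]
labels = ["good", "bad", "neutral"]          # community verdicts (toy data)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(comments, labels)
print(model.predict(["Another thoughtful comment on the patch."]))
```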
Reducing the Dimensionality of Data Streams using Common Sense
Havasi, Catherine (Massachusetts Institute of Technology) | Alonso, Jason (Massachusetts Institute of Technology) | Speer, Robert (Massachusetts Institute of Technology)
Increasingly, we need to computationally understand real-time streams of information in places such as news feeds, speech streams, and social networks. We present Streaming AnalogySpace, an efficient technique that discovers correlations in and makes predictions about sparse natural-language data that arrives in a real-time stream. AnalogySpace is a noise-resistant PCA-based inference technique designed for use with collaboratively collected common sense knowledge and semantic networks. Streaming AnalogySpace advances this work by computing it incrementally using CCIPCA (candid covariance-free incremental PCA), and by keeping a dense cache of recently-used features to efficiently represent a sparse and open domain. We show that Streaming AnalogySpace converges to the results of standard AnalogySpace, and verify this by evaluating its accuracy empirically on common-sense predictions against standard AnalogySpace.
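The incremental ingredient can be sketched with the basic CCIPCA update, omitting the amnesic weighting and the sparse-feature cache of the full system; this is an illustration of the technique, not the authors' implementation:

```python
import numpy as np

def ccipca_update(V, u, n):
    """Basic CCIPCA step: update k components with one mean-centered sample.

    V: (k, d) array; row i stores lambda_i * e_i (component scaled by its
    eigenvalue estimate). u: (d,) sample. n: samples seen so far (1-based).
    """
    u = u.copy()
    for i in range(len(V)):
        if n == i + 1:
            V[i] = u                         # initialize component i
        elif n > i + 1:
            e = V[i] / np.linalg.norm(V[i])
            V[i] = (n - 1) / n * V[i] + (1 / n) * (u @ e) * u
        # Deflate: remove this component's direction before the next one.
        e = V[i] / max(np.linalg.norm(V[i]), 1e-12)
        u -= (u @ e) * e
    return V

# Toy usage: stream 500 samples whose first axis has the largest variance.
rng = np.random.default_rng(1)
V = np.zeros((2, 5))
for n, x in enumerate(rng.normal(size=(500, 5)) * [3, 1, 1, 1, 1], start=1):
    V = ccipca_update(V, x, n)
print(V[0] / np.linalg.norm(V[0]))  # approx +/- the first basis vector
```

Each component is revised once per incoming sample and the sample is deflated before the next component is updated, which is what makes the procedure covariance-free.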
Application of Data Mining to Network Intrusion Detection: Classifier Selection Model
As network attacks have increased in number and severity over the past few years, intrusion detection systems (IDS) are increasingly becoming a critical component of secure networks. Due to the large volume of security audit data as well as the complex and dynamic properties of intrusion behaviors, optimizing the performance of IDS has become an important open problem that is receiving more and more attention from the research community. The question of whether certain algorithms perform better for certain attack classes constitutes the motivation for the work reported herein. In this paper, we evaluate the performance of a comprehensive set of classifier algorithms on the KDD99 dataset. Based on the evaluation results, the best algorithm for each attack category is chosen and two classifier-algorithm selection models are proposed. The simulation results indicate that noticeable performance improvement and real-time intrusion detection can be achieved by applying the proposed models to detect different kinds of network attacks.
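The selection idea can be sketched as a per-category model bake-off; the snippet below uses synthetic stand-in features and toy labels in place of KDD99, and the candidate classifiers are illustrative choices rather than the paper's evaluated set:

```python
# Sketch: evaluate several classifiers per attack category and keep the best.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))               # stand-in for KDD99 features
candidates = {"tree": DecisionTreeClassifier(),
              "nb": GaussianNB(),
              "knn": KNeighborsClassifier()}

selection = {}
for category in ["DoS", "Probe", "R2L", "U2R"]:   # KDD99 attack categories
    y = rng.integers(0, 2, size=400)              # attack vs. normal (toy)
    scores = {name: cross_val_score(clf, X, y, cv=5).mean()
              for name, clf in candidates.items()}
    selection[category] = max(scores, key=scores.get)
print(selection)                             # best classifier per category
```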
Clustering Stability: An Overview
A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters such that the corresponding clustering results are "most stable". In recent years, a series of papers has analyzed the behavior of this method from a theoretical point of view. However, the results are very technical and difficult to interpret for non-experts. In this paper we give a high-level overview of the existing literature on clustering stability. In addition to presenting the results in a slightly informal but accessible way, we relate them to each other and discuss their different implications.
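One concrete instance of the stability heuristic, as a sketch (the literature surveys several variants; this one fits k-means on two random subsamples, extends both clusterings to the full data, and scores their agreement with the adjusted Rand index):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# Three well-separated blobs, so k = 3 should be the most stable choice.
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0, 3, 6)])

def stability(X, k, reps=10):
    scores = []
    for _ in range(reps):
        i1 = rng.choice(len(X), size=len(X) // 2, replace=False)
        i2 = rng.choice(len(X), size=len(X) // 2, replace=False)
        km1 = KMeans(n_clusters=k, n_init=10).fit(X[i1])
        km2 = KMeans(n_clusters=k, n_init=10).fit(X[i2])
        # Adjusted Rand compares the labelings up to permutation of labels.
        scores.append(adjusted_rand_score(km1.predict(X), km2.predict(X)))
    return float(np.mean(scores))

print({k: round(stability(X, k), 2) for k in (2, 3, 4, 5)})
```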
Euclidean Distances, Soft and Spectral Clustering on Weighted Graphs
We define a class of Euclidean distances on weighted graphs, enabling thermodynamic soft graph clustering. The class can be constructed from the "raw coordinates" encountered in spectral clustering, and can be extended by means of higher-dimensional embeddings (Schoenberg transformations). Geographical flow data, properly conditioned, illustrate the procedure as well as visualization aspects.
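A sketch of the ingredients under one common construction (a normalized-Laplacian embedding; the authors' exact "raw coordinates" may be scaled differently): spectral coordinates of a small weighted graph, the Euclidean distances they induce, and a Schoenberg transformation $d \mapsto d^\alpha$ with $0 < \alpha \le 1$, which maps Euclidean distances to Euclidean distances:

```python
import numpy as np

W = np.array([[0, 2, 1, 0],
              [2, 0, 2, 0],
              [1, 2, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # symmetric edge weights
d = W.sum(axis=1)
L = np.diag(d) - W                          # combinatorial Laplacian
Lsym = L / np.sqrt(np.outer(d, d))          # normalized Laplacian
vals, vecs = np.linalg.eigh(Lsym)
coords = vecs[:, 1:] / np.sqrt(d)[:, None]  # drop the trivial eigenvector

# Euclidean distances between nodes in the embedding.
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
alpha = 0.5
schoenberg = dist ** alpha                  # transformed, still Euclidean
print(np.round(schoenberg, 3))
```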
Risk bounds in linear regression through PAC-Bayesian truncation
Audibert, Jean-Yves | Catoni, Olivier
We consider the problem of predicting as well as the best linear combination of d given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. When the input distribution is known, there already exists an algorithm having an expected excess risk of order d/n, where n is the size of the training data. Without this strong assumption, standard results often contain a multiplicative log n factor, and require some additional assumptions like uniform boundedness of the d-dimensional input representation and exponential moments of the output. This work provides new risk bounds for the ridge estimator and the ordinary least squares estimator, and their variants. It also provides shrinkage procedures with convergence rate d/n (i.e., without the logarithmic factor) in expectation and in deviations, under various assumptions. The surprising feature common to these results is the absence of an exponential moment condition on the output distribution, even though the bounds hold with exponential deviations. All risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. Finally, we show that some of these results are not particular to the least squares loss, but can be generalized to similar strongly convex loss functions.
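In our notation (a paraphrase of the setting, not a statement of any one theorem): with risk $R(f) = \mathbb{E}[(Y - f(X))^2]$ and $f^*$ the best linear combination of the $d$ given functions, the bounds discussed take the form

\[
R(\hat f) - R(f^*) \;\le\; C\,\frac{d}{n},
\]

holding in expectation and, with high probability, in deviations, without exponential moment assumptions on the output.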
Discovering Graphical Granger Causality Using the Truncating Lasso Penalty
Shojaie, Ali | Michailidis, George
Components of biological systems interact with each other in order to carry out vital cell functions, so discovering regulatory interactions among genes is an important problem in systems biology: such information can be used to improve estimation and inference, and to obtain better insights into the underlying cellular mechanisms. Whole-genome expression data over time provide an opportunity to determine how the expression levels of genes are affected by changes in the transcription levels of other genes, and can therefore be used to discover regulatory interactions among genes. In this paper, we propose a novel penalization method, called the truncating lasso, for the estimation of causal relationships from time-course gene expression data. The proposed penalty can correctly determine the order of the underlying time series, and improves the performance of lasso-type estimators. Moreover, the resulting estimate provides information on the time lag between the activation of transcription factors and their effects on regulated genes. We provide an efficient algorithm for estimating the model parameters, and show that the proposed method can consistently discover causal relationships in the large $p$, small $n$ setting. The performance of the proposed model compares favorably in simulated as well as real data examples. The proposed truncating lasso method is implemented in the R package grangerTlasso and is available at http://www.stat.lsa.umich.edu/~shojaie.
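A simplified sketch of lasso-based graphical Granger selection (this uses a plain lasso penalty; the paper's truncating penalty additionally adapts the weights across lags to estimate the order of the series):

```python
# Regress each gene on lagged expression of all genes and read edges off
# the nonzero coefficients. Toy data; plain lasso, not the truncating lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
T, p, lags = 100, 5, 2
X = rng.normal(size=(T, p))
X[1:, 0] += 0.8 * X[:-1, 1]                 # gene 1 drives gene 0 at lag 1

# Lagged design matrix: block l holds X at lag l+1, columns are genes.
Z = np.hstack([X[lags - l - 1:T - l - 1] for l in range(lags)])
edges = set()
for target in range(p):
    y = X[lags:, target]
    coef = Lasso(alpha=0.05).fit(Z, y).coef_.reshape(lags, p)
    for l, j in zip(*np.nonzero(coef)):
        edges.add((int(j), int(target), int(l + 1)))  # (source, target, lag)
print(sorted(edges))                        # should recover (1, 0, 1)
```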