Goto

Collaborating Authors

 Statistical Learning


An Easy to Use Repository for Comparing and Improving Machine Learning Algorithm Usage

arXiv.org Machine Learning

The results from most machine learning experiments are used for a specific purpose and then discarded. This results in a significant loss of information and requires rerunning experiments to compare learning algorithms. This also requires implementation of another algorithm for comparison, that may not always be correctly implemented. By storing the results from previous experiments, machine learning algorithms can be compared easily and the knowledge gained from them can be used to improve their performance. The purpose of this work is to provide easy access to previous experimental results for learning and comparison. These stored results are comprehensive -- storing the prediction for each test instance as well as the learning algorithm, hyperparameters, and training set that were used. Previous results are particularly important for meta-learning, which, in a broad sense, is the process of learning from previous machine learning results such that the learning process is improved. While other experiment databases do exist, one of our focuses is on easy access to the data. We provide meta-learning data sets that are ready to be downloaded for meta-learning experiments. In addition, queries to the underlying database can be made if specific information is desired. We also differ from previous experiment databases in that our databases is designed at the instance level, where an instance is an example in a data set. We store the predictions of a learning algorithm trained on a specific training set for each instance in the test set. Data set level information can then be obtained by aggregating the results from the instances. The instance level information can be used for many tasks such as determining the diversity of a classifier or algorithmically determining the optimal subset of training instances for a learning algorithm.


Consistent procedures for cluster tree estimation and pruning

arXiv.org Machine Learning

For a density $f$ on ${\mathbb R}^d$, a {\it high-density cluster} is any connected component of $\{x: f(x) \geq \lambda\}$, for some $\lambda > 0$. The set of all high-density clusters forms a hierarchy called the {\it cluster tree} of $f$. We present two procedures for estimating the cluster tree given samples from $f$. The first is a robust variant of the single linkage algorithm for hierarchical clustering. The second is based on the $k$-nearest neighbor graph of the samples. We give finite-sample convergence rates for these algorithms which also imply consistency, and we derive lower bounds on the sample complexity of cluster tree estimation. Finally, we study a tree pruning procedure that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient.


Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs

arXiv.org Machine Learning

Given a graph where vertices represent alternatives and arcs represent pairwise comparison data, the statistical ranking problem is to find a potential function, defined on the vertices, such that the gradient of the potential function agrees with the pairwise comparisons. Our goal in this paper is to develop a method for collecting data for which the least squares estimator for the ranking problem has maximal Fisher information. Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify data which maximizes the informativeness of the ranking. Under certain assumptions, the data collection problem decouples, reducing to a problem of finding multigraphs with large algebraic connectivity. This reduction of the data collection problem to graph-theoretic questions is one of the primary contributions of this work. As an application, we study the Yahoo! Movie user rating dataset and demonstrate that the addition of a small number of well-chosen pairwise comparisons can significantly increase the Fisher informativeness of the ranking. As another application, we study the 2011-12 NCAA football schedule and propose schedules with the same number of games which are significantly more informative. Using spectral clustering methods to identify highly-connected communities within the division, we argue that the NCAA could improve its notoriously poor rankings by simply scheduling more out-of-conference games.


Learning Latent Block Structure in Weighted Networks

arXiv.org Machine Learning

Networks are an increasingly important form of structured data consisting of interactions between pairs of individuals in large social and biological data sets. Unlike attribute data where each observation is associated with an individual, network data is represented by graphs, where individuals are vertices and interactions are edges. Because vertices are pairwise related, network data violates traditional assumptions of attribute data, such as independence. This intrinsic difference in structure prompts the development of new tools for handling network data. In social and biological networks, vertices often play distinct structural roles in generating the network's large-scale structure. To identify such latent structural roles, we aim to identify a network partition that groups together vertices with similar group-level connectivity patterns. We call these groups "communities," and their inference produces a compact description of the large-scale 1 (a) Assortative (b) Disassortative (c) Core-Periphery (d) Ordered Figure 1: Examples of structure that can be learned using the SBM. The first row shows the abstract connections between four groups (blue, red, green, and purple). The second row shows the'block' structure found in the adjacency matrix after sorting by group membership; black corresponds to edges and white corresponds to non-edges.


An Ordered Lasso and Sparse Time-Lagged Regression

arXiv.org Machine Learning

We consider regression scenarios where it is natural to impose an order constraint on the coefficients. We propose an order-constrained version of L1-regularized regression for this problem, and show how to solve it efficiently using the well-known Pool Adjacent Violators Algorithm as its proximal operator. The main application of this idea is time-lagged regression, where we predict an outcome at time t from features at the previous K time points. In this setting it is natural to assume that the coefficients decay as we move farther away from t, and hence the order constraint is reasonable. Potential applications include financial time series and prediction of dynamic patient out- comes based on clinical measurements. We illustrate this idea on real and simulated data.


Topological and Statistical Behavior Classifiers for Tracking Applications

arXiv.org Machine Learning

We introduce the first unified theory for target tracking using Multiple Hypothesis Tracking, Topological Data Analysis, and machine learning. Our string of innovations are 1) robust topological features are used to encode behavioral information, 2) statistical models are fitted to distributions over these topological features, and 3) the target type classification methods of Wigren and Bar Shalom et al. are employed to exploit the resulting likelihoods for topological features inside of the tracking procedure. To demonstrate the efficacy of our approach, we test our procedure on synthetic vehicular data generated by the Simulation of Urban Mobility package.


Inference of Sparse Networks with Unobserved Variables. Application to Gene Regulatory Networks

arXiv.org Machine Learning

Networks are a unifying framework for modeling complex systems and network inference problems are frequently encountered in many fields. Here, I develop and apply a generative approach to network inference (RCweb) for the case when the network is sparse and the latent (not observed) variables affect the observed ones. From all possible factor analysis (FA) decompositions explaining the variance in the data, RCweb selects the FA decomposition that is consistent with a sparse underlying network. The sparsity constraint is imposed by a novel method that significantly outperforms (in terms of accuracy, robustness to noise, complexity scaling, and computational efficiency) Bayesian methods and MLE methods using l1 norm relaxation such as K-SVD and l1--based sparse principle component analysis (PCA). Results from simulated models demonstrate that RCweb recovers exactly the model structures for sparsity as low (as non-sparse) as 50% and with ratio of unobserved to observed variables as high as 2. RCweb is robust to noise, with gradual decrease in the parameter ranges as the noise level increases.


Adaptive Reconfiguration Moves for Dirichlet Mixtures

arXiv.org Machine Learning

Bayesian mixture models are widely applied for unsupervised learning and exploratory data analysis. Markov chain Monte Carlo based on Gibbs sampling and split-merge moves are widely used for inference in these models. However, both methods are restricted to limited types of transitions and suffer from torpid mixing and low accept rates even for problems of modest size. We propose a method that considers a broader range of transitions that are close to equilibrium by exploiting multiple chains in parallel and using the past states adaptively to inform the proposal distribution. The method significantly improves on Gibbs and split-merge sampling as quantified using convergence diagnostics and acceptance rates. Adaptive MCMC methods which use past states to inform the proposal distribution has given rise to many ingenious sampling schemes for continuous problems and the present work can be seen as an important first step in bringing these benefits to partition-based problems.


Improved graph Laplacian via geometric self-consistency

arXiv.org Machine Learning

We address the problem of setting the kernel bandwidth used by Manifold Learning algorithms to construct the graph Laplacian. Exploiting the connection between manifold geometry, represented by the Riemannian metric, and the Laplace-Beltrami operator, we set the bandwidth by optimizing the Laplacian's ability to preserve the geometry of the data. Experiments show that this principled approach is effective and robust.


The Infinite Degree Corrected Stochastic Block Model

arXiv.org Machine Learning

In Stochastic blockmodels, which are among the most prominent statistical models for cluster analysis of complex networks, clusters are defined as groups of nodes with statistically similar link probabilities within and between groups. A recent extension by Karrer and Newman incorporates a node degree correction to model degree heterogeneity within each group. Although this demonstrably leads to better performance on several networks it is not obvious whether modelling node degree is always appropriate or necessary. We formulate the degree corrected stochastic blockmodel as a non-parametric Bayesian model, incorporating a parameter to control the amount of degree correction which can then be inferred from data. Additionally, our formulation yields principled ways of inferring the number of groups as well as predicting missing links in the network which can be used to quantify the model's predictive performance. On synthetic data we demonstrate that including the degree correction yields better performance both on recovering the true group structure and predicting missing links when degree heterogeneity is present, whereas performance is on par for data with no degree heterogeneity within clusters. On seven real networks (with no ground truth group structure available) we show that predictive performance is about equal whether or not degree correction is included; however, for some networks significantly fewer clusters are discovered when correcting for degree indicating that the data can be more compactly explained by clusters of heterogenous degree nodes.