Regression
Machine learning algorithms explained
Machine learning and deep learning have been widely embraced, and even more widely misunderstood. In this article, I'd like to step back and explain both machine learning and deep learning in basic terms, discuss some of the most common machine learning algorithms, and explain how those algorithms relate to the other pieces of the puzzle of creating predictive models from historical data. Recall that machine learning is a class of methods for automatically creating models from data. Machine learning algorithms are the engines of machine learning, meaning it is the algorithms that turn a data set into a model. Which kind of algorithm works best (supervised, unsupervised, classification, regression, etc.) depends on the kind of problem you're solving, the computing resources available, and the nature of the data.
On approximating dropout noise injection
This paper examines the assumptions of the derived equivalence between dropout noise injection and $L_2$ regularisation for logistic regression with negative log loss. We show that the approximation method is based on a divergent Taylor expansion, making, subsequent work using this approximation to compare the dropout trained logistic regression model with standard regularisers unfortunately ill-founded to date. Moreover, the approximation approach is shown to be invalid using any robust constraints. We show how this finding extends to general neural network topologies that use a cross-entropy prediction layer.
Active Learning for Binary Classification with Abstention
Shekhar, Shubhanshu, Ghavamzadeh, Mohammad, Javidi, Tara
We construct and analyze active learning algorithms for the problem of binary classification with abstention. We consider three abstention settings: \emph{fixed-cost} and two variants of \emph{bounded-rate} abstention, and for each of them propose an active learning algorithm. All the proposed algorithms can work in the most commonly used active learning models, i.e., \emph{membership-query}, \emph{pool-based}, and \emph{stream-based} sampling. We obtain upper-bounds on the excess risk of our algorithms in a general non-parametric framework and establish their minimax near-optimality by deriving matching lower-bounds. Since our algorithms rely on the knowledge of some smoothness parameters of the regression function, we then describe a new strategy to adapt to these unknown parameters in a data-driven manner. Since the worst case computational complexity of our proposed algorithms increases exponentially with the dimension of the input space, we conclude the paper with a computationally efficient variant of our algorithm whose computational complexity has a polynomial dependence over a smaller but rich class of learning problems.
Kernel Instrumental Variable Regression
Singh, Rahul, Sahani, Maneesh, Gretton, Arthur
Instrumental variable regression is a strategy for learning causal relationships in observational data. If measurements of input X and output Y are confounded, the causal relationship can nonetheless be identified if an instrumental variable Z is available that influences X directly, but is conditionally independent of Y given X. The classic two-stage least squares algorithm (2SLS) simplifies the estimation problem by modeling all relationships as linear functions. We propose kernel instrumental variable regression (KIV), a nonparametric generalization of 2SLS, modeling relations among X, Y, and Z as nonlinear functions in reproducing kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild assumptions, and derive conditions under which the convergence rate achieves the minimax optimal rate for unconfounded, one-stage RKHS regression. In doing so, we obtain an efficient ratio between training sample sizes used in the algorithm's first and second stages. In experiments, KIV outperforms state of the art alternatives for nonparametric instrumental variable regression.
Robust approximate linear regression without correspondence
Varol, Erdem, Nejatbakhsh, Amin
Estimating regression coefficients using unordered multisets of covariates and responses has been introduced as the regression without correspondence problem. Previous theoretical analysis of the problem has been done in a setting where the responses are a permutation of the regressed covariates. This paper expands the setting by analyzing the problem where they may be missing correspondences and outliers in addition to a permutation action. We term this problem robust regression without correspondence and provide several algorithms for exact and approximate recovery in a noiseless and noisy one-dimensional setting as well as an approximation algorithm for multiple dimensions. The theoretical guarantees of the algorithms are verified in simulation data. We also demonstrate a neuroscience application by obtaining robust point set matchings of the neurons of the model organism Caenorhabditis elegans.
Ordinal Regression as Structured Classification
Twomey, Niall, Poyiadzi, Rafael, Mann, Callum, Santos-Rodríguez, Raúl
This paper extends the class of ordinal regression models with a structured interpretation of the problem by applying a novel treatment of encoded labels. The net effect of this is to transform the underlying problem from an ordinal regression task to a (structured) classification task which we solve with conditional random fields, thereby achieving a coherent and probabilistic model in which all model parameters are jointly learnt. Importantly, we show that although we have cast ordinal regression to classification, our method still fall within the class of decomposition methods in the ordinal regression ontology. This is an important link since our experience is that many applications of machine learning to healthcare ignores completely the important nature of the label ordering, and hence these approaches should considered naive in this ontology. We also show that our model is flexible both in how it adapts to data manifolds and in terms of the operations that are available for practitioner to execute. Our empirical evaluation demonstrates that the proposed approach overwhelmingly produces superior and often statistically significant results over baseline approaches on forty popular ordinal regression models, and demonstrate that the proposed model significantly out-performs baselines on synthetic and real datasets. Our implementation, together with scripts to reproduce the results of this work, will be available on a public GitHub repository.
High Dimensional Classification via Empirical Risk Minimization: Improvements and Optimality
In this article, we investigate a family of classification algorithms defined by the principle of empirical risk minimization, in the high dimensional regime where the feature dimension $p$ and data number $n$ are both large and comparable. Based on recent advances in high dimensional statistics and random matrix theory, we provide under mixture data model a unified stochastic characterization of classifiers learned with different loss functions. Our results are instrumental to an in-depth understanding as well as practical improvements on this fundamental classification approach. As the main outcome, we demonstrate the existence of a universally optimal loss function which yields the best high dimensional performance at any given $n/p$ ratio.
RKHSMetaMod : An R package to estimate the Hoeffding decomposition of an unknown function by solving RKHS Ridge Group Sparse optimization problem
Kamari, Halaleh, Huet, Sylvie, Taupin, Marie-Luce
In the context of the Gaussian regression model, RKHSMetaMod estimates a meta model by solving the Ridge Group Sparse Optimization Problem based on the Reproducing Kernel Hilbert Spaces (RKHS). The estimated meta model is an additive model that satisfies the properties of the Hoeffding decomposition, and its terms estimate the terms in the Hoeffding decomposition of the unkwown regression function. This package provides an interface from R statistical computing environment to the C++ libraries Eigen and GSL. It uses the efficient C++ functions through RcppEigen and RcppGSL packages to speed up the execution time and the R environment in order to propose an user friendly package.
Commonsense Properties from Query Logs and Question Answering Forums
Romero, Julien, Razniewski, Simon, Pal, Koninika, Pan, Jeff Z., Sakhadeo, Archit, Weikum, Gerhard
Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.
Learning Networked Exponential Families with Network Lasso
The data arising in many important big-data applications, ranging from social networks to network medicine, consist of high-dimensional data points related by an intrinsic (complex) network structure. In order to jointly leverage the information conveyed in the network structure as well as the statistical power contained in high-dimensional data points, we propose networked exponential families. We apply the network Lasso to learn networked exponential families as a probabilistic model for heterogeneous datasets with intrinsic network structure. In order to allow for accurate learning from high-dimensional data, we borrow statistical strength, via the intrinsic network structure, across the dataset. The resulting method aims at regularized empirical risk minimization using the total variation of the model parameters as regularizer. This minimization problem is a non-smooth convex optimization problem which we solve using a primal-dual splitting method. This method is appealing for big data applications as it can be implemented as a highly scalable message passing algorithm.