Regression
Build Better Machine Learning Models in Less Time with Transfer Learning
Our control model was a well-established machine learning model using features that are known to work well. For text, the features are essentially normalized word counts (TF-IDF: term frequency-inverse document frequency vectors). For images, we use HOG features (histogram of oriented gradients). These features were fed into a logistic regression model for training and prediction. Our test model used custom collection; we fed in data, trained a model, and made a prediction, with transfer learning for text and image analysis running under the covers.
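As a rough illustration of what "normalized word counts" means for the text baseline, the sketch below computes TF-IDF vectors for a toy corpus in plain Python. The function name `tfidf_vectors`, the whitespace tokenizer, the smoothed IDF formula, and the L2 normalization are all assumptions made for illustration; the excerpt does not say which TF-IDF variant was used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalized TF-IDF vectors).

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        # Normalize to unit length; guard against an empty document.
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        vectors.append([v / norm for v in raw])
    return vocab, vectors


if __name__ == "__main__":
    vocab, vecs = tfidf_vectors(["the cat sat", "the dog sat", "the cat ran"])
    for term, weight in zip(vocab, vecs[0]):
        print(f"{term}: {weight:.3f}")
```

Terms that appear in every document (here "the") receive a lower weight than rarer terms, which is the property that makes these vectors useful inputs for a logistic regression classifier.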
How to Work Through a Regression Machine Learning Project in Weka Step-By-Step - Machine Learning Mastery
The fastest way to get good at applied machine learning is to practice on end-to-end projects. In this post you will discover how to work through a regression problem in Weka, end-to-end. This tutorial will walk you through the key steps required to complete a machine learning project in Weka. Weka is the best platform for beginners getting started in applied machine learning.
Maximum Likelihood Estimate and Logistic Regression simplified
Least squares regression can produce impossible estimates, such as probabilities that are less than zero or greater than 1. So, when the predicted value is a probability, use logistic regression. We use the log of the odds rather than the odds directly because an odds ratio cannot be a negative number, but its log can be negative. Notice that we have randomly initialized our coefficients for income and other predictors. These will be adjusted by Solver based on a likelihood function, covered later. Column H tells us the predicted probability of the borrower's actual behavior, whether that behavior is repayment or default, and not simply, as in Column G, the predicted probability of defaulting on the loan. One property of logarithms is that their sum equals the logarithm of the product of the numbers on which they're based. The logarithm of a probability is never positive, and the closer a probability is to 1.0, the closer its logarithm is to 0.0. I haven't covered cross-validation, which is commonly used to validate a logistic regression equation. If you don't always have a large number of cases to work with, a different approach is to use statistical inference.
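The spreadsheet steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the function names are hypothetical, plain gradient ascent stands in for the spreadsheet's Solver, and the coefficients start at zero rather than at random values.

```python
import math


def predicted_default(coefs, intercept, x):
    """Predicted probability of default (the role of "Column G")."""
    logit = intercept + sum(c * xi for c, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-logit))


def log_likelihood(coefs, intercept, rows, outcomes):
    """Sum of log probabilities of each borrower's actual behavior."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        p_default = predicted_default(coefs, intercept, x)
        # Probability of what actually happened (the role of "Column H").
        p_actual = p_default if y == 1 else 1.0 - p_default
        total += math.log(p_actual)
    return total


def fit_by_gradient_ascent(rows, outcomes, steps=500, lr=0.05):
    """Maximize the log-likelihood; a stand-in for Solver, not a replica."""
    coefs = [0.0] * len(rows[0])
    intercept = 0.0
    for _ in range(steps):
        grad_b = 0.0
        grad_w = [0.0] * len(coefs)
        for x, y in zip(rows, outcomes):
            err = y - predicted_default(coefs, intercept, x)
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        intercept += lr * grad_b
        coefs = [c + lr * g for c, g in zip(coefs, grad_w)]
    return coefs, intercept


if __name__ == "__main__":
    # Hypothetical data: one predictor (scaled income), 1 = default.
    rows = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
    outcomes = [1, 1, 0, 1, 0, 0]
    coefs, intercept = fit_by_gradient_ascent(rows, outcomes)
    print(coefs, intercept, log_likelihood(coefs, intercept, rows, outcomes))
```

Summing logarithms instead of multiplying probabilities gives the same ranking of candidate coefficients while avoiding very small products, which is why the log-likelihood is the quantity being maximized.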
Machine Learning Exercises In Python, Part 1
This post is part of a series covering the exercises from Andrew Ng's machine learning class on Coursera. The original code, exercise text, and data files for this post are available here. One of the pivotal moments in my professional development this year came when I discovered Coursera. I'd heard of the "MOOC" phenomenon but had not had the time to dive in and take a class. Earlier this year I finally pulled the trigger and signed up for Andrew Ng's Machine Learning class.
Dual Control for Approximate Bayesian Reinforcement Learning
Klenske, Edgar D., Hennig, Philipp
Control of non-episodic, finite-horizon dynamical systems with uncertain dynamics poses a tough and elementary case of the exploration-exploitation trade-off. Bayesian reinforcement learning, reasoning about the effect of actions and future observations, offers a principled solution, but is intractable. We review, then extend an old approximate approach from control theory---where the problem is known as dual control---in the context of modern regression methods, specifically generalized linear regression. Experiments on simulated systems show that this framework offers a useful approximation to the intractable aspects of Bayesian RL, producing structured exploration strategies that differ from standard RL approaches. We provide simple examples for the use of this framework in (approximate) Gaussian process regression and feedforward neural networks for the control of exploration.
Linear Regression with an Unknown Permutation: Statistical and Computational Limits
Pananjady, Ashwin, Wainwright, Martin J., Courtade, Thomas A.
Consider a noisy linear observation model with an unknown permutation, based on observing $y = \Pi^* A x^* + w$, where $x^* \in \mathbb{R}^d$ is an unknown vector, $\Pi^*$ is an unknown $n \times n$ permutation matrix, and $w \in \mathbb{R}^n$ is additive Gaussian noise. We analyze the problem of permutation recovery in a random design setting in which the entries of the matrix $A$ are drawn i.i.d. from a standard Gaussian distribution, and establish sharp conditions on the SNR, sample size $n$, and dimension $d$ under which $\Pi^*$ is exactly and approximately recoverable. On the computational front, we show that the maximum likelihood estimate of $\Pi^*$ is NP-hard to compute, while also providing a polynomial time algorithm when $d =1$.
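To give some intuition for why the $d = 1$ case is tractable, here is a minimal sketch that is not taken from the paper. When $A$ is a single column $a$ and $x$ is a scalar, the least-squares objective over both the permutation and $x$ reduces to maximizing $|\langle y, \Pi a \rangle|$, and by the rearrangement inequality the maximizer pairs the sorted entries of $y$ with the entries of $a$ sorted in either the same or the opposite order. The function name and the index convention `y[i] ≈ a[perm[i]] * x` are assumptions.

```python
def recover_permutation_1d(a, y):
    """Least-squares estimate of (perm, x) with y[i] ~ a[perm[i]] * x.

    Illustrative sketch for d = 1 only; runs in O(n log n) via sorting.
    """
    n = len(a)
    order_a = sorted(range(n), key=lambda i: a[i])
    order_y = sorted(range(n), key=lambda i: y[i])

    best_inner, best_perm = None, None
    # Try matching sorted y with a sorted ascending (x > 0) and
    # descending (x < 0); keep whichever has the larger |<y, Pi a>|.
    for candidate in (order_a, order_a[::-1]):
        perm = [0] * n
        for yi, ai in zip(order_y, candidate):
            perm[yi] = ai
        inner = sum(y[i] * a[perm[i]] for i in range(n))
        if best_inner is None or abs(inner) > abs(best_inner):
            best_inner, best_perm = inner, perm

    # For a fixed permutation, the optimal scalar is a 1-D least-squares fit.
    x_hat = best_inner / sum(v * v for v in a)
    return best_perm, x_hat


if __name__ == "__main__":
    a = [0.3, -1.2, 2.0, 0.7]
    true_perm, true_x = [2, 0, 3, 1], -1.5
    y = [a[j] * true_x for j in true_perm]  # noiseless observations
    print(recover_permutation_1d(a, y))
```

With noise, the same procedure still returns the least-squares pair, although the recovered permutation may then differ from the true one.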
Robust High-Dimensional Linear Regression
Liu, Chang, Li, Bo, Vorobeychik, Yevgeniy, Oprea, Alina
The effectiveness of supervised learning techniques has made them ubiquitous in research and practice. In high-dimensional settings, supervised learning commonly relies on dimensionality reduction to improve performance and identify the most important factors in predicting outcomes. However, the economic importance of learning has made it a natural target for adversarial manipulation of training data, which we term poisoning attacks. Prior approaches to dealing with robust supervised learning rely on strong assumptions about the nature of the feature matrix, such as feature independence and sub-Gaussian noise with low variance. We propose an integrated method for robust regression that relaxes these assumptions, assuming only that the feature matrix can be well approximated by a low-rank matrix. Our techniques integrate improved robust low-rank matrix approximation and robust principal component regression, and yield strong performance guarantees. Moreover, we experimentally show that our methods significantly outperform the state of the art in both running time and prediction error.
Classification with the pot-pot plot
Pokotylo, Oleksii, Mosler, Karl
We propose a procedure for supervised classification that is based on potential functions. The potential of a class is defined as a kernel density estimate multiplied by the class's prior probability. The method transforms the data to a potential-potential (pot-pot) plot, where each data point is mapped to a vector of potentials. Separation of the classes, as well as classification of new data points, is performed on this plot. For this, either the $\alpha$-procedure ($\alpha$-P) or $k$-nearest neighbors ($k$-NN) is employed. For data that are generated from continuous distributions, these classifiers prove to be strongly Bayes-consistent. The potentials depend on the kernel and its bandwidth used in the density estimate. We investigate several variants of bandwidth selection, including joint and separate pre-scaling and a bandwidth regression approach. The new method is applied to benchmark data from the literature, including simulated data sets as well as 50 sets of real data. It compares favorably to known classification methods such as LDA, QDA, max kernel density estimates, $k$-NN, and $DD$-plot classification using depth functions.
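The mapping to the pot-pot plot can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes an isotropic Gaussian kernel with one fixed bandwidth (no pre-scaling or bandwidth selection), estimates priors by class frequencies, includes each training point in its own density estimate, and uses plain $k$-NN on the plot instead of the $\alpha$-procedure. All function names are hypothetical.

```python
import math
from collections import Counter


def gaussian_kde(point, sample, bandwidth):
    """Kernel density estimate at `point` with an isotropic Gaussian kernel."""
    d = len(point)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    total = 0.0
    for s in sample:
        sq = sum((p - q) ** 2 for p, q in zip(point, s))
        total += math.exp(-sq / (2.0 * bandwidth ** 2))
    return total / (len(sample) * norm)


def potentials(point, classes, bandwidth):
    """Map a point to its vector of potentials: prior times class density."""
    n_total = sum(len(s) for s in classes)
    return [len(s) / n_total * gaussian_kde(point, s, bandwidth)
            for s in classes]


def knn_on_potpot(point, classes, bandwidth, k=3):
    """Classify `point` by k-NN among training points on the pot-pot plot."""
    query = potentials(point, classes, bandwidth)
    scored = []
    for label, sample in enumerate(classes):
        for s in sample:
            pot = potentials(s, classes, bandwidth)
            dist = math.sqrt(sum((u - v) ** 2 for u, v in zip(query, pot)))
            scored.append((dist, label))
    scored.sort()
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    class0 = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (0.1, -0.5)]
    class1 = [(5.0, 5.0), (5.4, 4.8), (4.7, 5.3), (5.1, 4.5)]
    print(knn_on_potpot((0.2, 0.1), [class0, class1], bandwidth=1.0))
```

Whatever the dimension of the original data, the plot has one coordinate per class, so the final separation step works in a low-dimensional space.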
Sampling Requirements and Accelerated Schemes for Sparse Linear Regression with Orthogonal Least-Squares
Hashemi, Abolfazl, Vikalo, Haris
The Orthogonal Least Squares (OLS) algorithm sequentially selects columns of the coefficient matrix to greedily find an approximate sparse solution to an underdetermined system of linear equations. Previous work on the analysis of OLS has been limited; in particular, there exist no guarantees on the performance of OLS for sparse linear regression from random measurements. In this paper, the problem of inferring a sparse vector from random linear combinations of its components using OLS is studied. For the noiseless scenario, it is shown that when the entries of a coefficient matrix are samples from a Gaussian or a Bernoulli distribution, OLS with high probability recovers a $k$-sparse $m$-dimensional vector using ${\cal O}\left(k\log m\right)$ measurements. A similar result is established for the bounded-noise scenario, where an additional condition on the smallest nonzero element of the unknown vector is required. Moreover, generalizations that reduce the computational complexity of OLS and thus extend its practical feasibility are proposed. The generalized OLS algorithm is empirically shown to outperform broadly used existing algorithms in terms of accuracy, running time, or both.
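The greedy column selection described above can be sketched in plain Python. This is the basic OLS selection rule only, not the accelerated or generalized variants proposed in the paper; the function name, the column-list input format, and the tolerance are assumptions. At each step the sketch picks the column whose component orthogonal to the already selected columns explains the most of the current residual.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_least_squares(columns, y, k, tol=1e-12):
    """Greedily select up to k columns; return (support, residual).

    `columns` is a list of column vectors of the coefficient matrix.
    """
    basis = []      # orthonormal basis of the span of selected columns
    support = []    # indices of selected columns, in selection order
    residual = list(y)
    for _ in range(k):
        best = None
        for j, col in enumerate(columns):
            if j in support:
                continue
            # Remove the part of the candidate already explained by the basis.
            u = list(col)
            for q in basis:
                c = dot(q, col)
                u = [ui - c * qi for ui, qi in zip(u, q)]
            norm_sq = dot(u, u)
            if norm_sq <= tol:
                continue  # candidate lies in the span of selected columns
            # Reduction in squared residual if this column were added.
            score = dot(u, residual) ** 2 / norm_sq
            if best is None or score > best[0]:
                best = (score, j, u, norm_sq)
        if best is None:
            break
        _, j, u, norm_sq = best
        q = [ui / math.sqrt(norm_sq) for ui in u]
        c = dot(q, residual)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
        basis.append(q)
        support.append(j)
    return support, residual


if __name__ == "__main__":
    cols = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
    y = [2, 0, -3]  # equals 2 * cols[0] - 3 * cols[2]
    print(orthogonal_least_squares(cols, y, k=2))
```

Projecting each candidate before scoring it is what separates this rule from orthogonal matching pursuit, which scores candidates by their raw correlation with the residual.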
Bayesian Kernel and Mutual $k$-Nearest Neighbor Regression
We propose Bayesian extensions of two nonparametric regression methods: kernel regression and mutual $k$-nearest neighbor regression. Derived from Gaussian process models for regression, the extensions provide distributions for target value estimates and a framework for selecting the hyperparameters. It is shown that the two proposed methods asymptotically converge to kernel and mutual $k$-nearest neighbor regression, respectively. Simulation results show that the proposed methods can select proper hyperparameters and are better than or comparable to the original methods on an artificial data set and a real-world data set.
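For reference, the classical (non-Bayesian) kernel regression that the abstract takes as a starting point can be written in a few lines. The sketch below is the standard Nadaraya-Watson estimator for one-dimensional inputs with a Gaussian kernel, not the proposed Bayesian extension; the function name and the fixed bandwidth are assumptions.

```python
import math


def kernel_regression(x_query, xs, ys, bandwidth):
    """Nadaraya-Watson estimate: kernel-weighted average of the targets."""
    weights = [math.exp(-((x_query - x) ** 2) / (2.0 * bandwidth ** 2))
               for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all training points")
    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [0.0, 1.0, 2.0, 3.0]
    print(kernel_regression(1.5, xs, ys, bandwidth=0.5))
```

The estimate is a single number with no attached uncertainty, and the bandwidth has to be chosen by hand or by cross-validation. These are the two gaps the abstract says the Bayesian extension addresses.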