Goto

Collaborating Authors

 Regression


Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling

arXiv.org Artificial Intelligence

Using logical clauses to represent patterns, Tsetlin machines (TMs) have recently obtained competitive performance in terms of accuracy, memory footprint, energy, and learning speed on several benchmarks. A team of Tsetlin automata (TAs) composes each clause, thus driving the entire learning process. These are rewarded/penalized according to three local rules that optimize global behaviour. Each clause votes for or against a particular class, with classification resolved using a majority vote. In the parallel and asynchronous architecture that we propose here, every clause runs in its own thread for massive parallelism. For each training example, we keep track of the class votes obtained from the clauses in local voting tallies. The local voting tallies allow us to detach the processing of each clause from the rest of the clauses, supporting decentralized learning. Thus, rather than processing training examples one-by-one as in the original TM, the clauses access the training examples simultaneously, updating themselves and the local voting tallies in parallel. There is no synchronization among the clause threads, apart from atomic adds to the local voting tallies. Operating asynchronously, each team of TA will most of the time operate on partially calculated or outdated voting tallies. However, across diverse learning tasks, it turns out that our decentralized TM learning algorithm copes well with working on outdated data, resulting in no significant loss in learning accuracy. Further, we show that the approach provides up to 50 times faster learning. Finally, learning time is almost constant for reasonable clause amounts. For sufficiently large clause numbers, computation time increases approximately proportionally. Our parallel and asynchronous architecture thus allows processing of more massive datasets and operating with more clauses for higher accuracy.


Machine Learning's Dropout Training is Distributionally Robust Optimal

arXiv.org Machine Learning

This paper shows that dropout training in Generalized Linear Models is the minimax solution of a two-player, zero-sum game where an adversarial nature corrupts a statistician's covariates using a multiplicative nonparametric errors-in-variables model. In this game---known as a Distributionally Robust Optimization problem---nature's least favorable distribution is dropout noise, where nature independently deletes entries of the covariate vector with some fixed probability $\delta$. Our decision-theoretic analysis shows that dropout training---the statistician's minimax strategy in the game---indeed provides out-of-sample expected loss guarantees for distributions that arise from multiplicative perturbations of in-sample data. This paper also provides a novel, parallelizable, Unbiased Multi-Level Monte Carlo algorithm to speed-up the implementation of dropout training. Our algorithm has a much smaller computational cost compared to the naive implementation of dropout, provided the number of data points is much smaller than the dimension of the covariate vector.


Random boosting and random^2 forests -- A random tree depth injection approach

arXiv.org Machine Learning

The induction of additional randomness in parallel and sequential ensemble methods has proven to be worthwhile in many aspects. In this manuscript, we propose and examine a novel random tree depth injection approach suitable for sequential and parallel tree-based approaches including Boosting and Random Forests. The resulting methods are called \emph{Random Boost} and \emph{Random$^2$ Forest}. Both approaches serve as valuable extensions to the existing literature on the gradient boosting framework and random forests. A Monte Carlo simulation, in which tree-shaped data sets with different numbers of final partitions are built, suggests that there are several scenarios where \emph{Random Boost} and \emph{Random$^2$ Forest} can improve the prediction performance of conventional hierarchical boosting and random forest approaches. The new algorithms appear to be especially successful in cases where there are merely a few high-order interactions in the generated data. In addition, our simulations suggest that our random tree depth injection approach can improve computation time by up to 40%, while at the same time the performance losses in terms of prediction accuracy turn out to be minor or even negligible in most cases.


AutoCP: Automated Pipelines for Accurate Prediction Intervals

arXiv.org Machine Learning

Successful application of machine learning models to real-world prediction problems, e.g. financial forecasting and personalized medicine, has proved to be challenging, because such settings require limiting and quantifying the uncertainty in the model predictions, i.e. providing valid and accurate prediction intervals. Conformal Prediction is a distribution-free approach to construct valid prediction intervals in finite samples. However, the prediction intervals constructed by Conformal Prediction are often (because of over-fitting, inappropriate measures of nonconformity, or other issues) overly conservative and hence inadequate for the application(s) at hand. This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP). Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate while optimizing the interval length to be accurate and less conservative. We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.


Machine Learning Regression Masterclass in Python

#artificialintelligence

Artificial Intelligence (AI) revolution is here! The technology is progressing at a massive scale and is being widely adopted in the Healthcare, defense, banking, gaming, transportation and robotics industries. Machine Learning is a subfield of Artificial Intelligence that enables machines to improve at a given task with experience. Machine Learning is an extremely hot topic; the demand for experienced machine learning engineers and data scientists has been steadily growing in the past 5 years. According to a report released by Research and Markets, the global AI and machine learning technology sectors are expected to grow from $1.4B to $8.8B by 2022 and it is predicted that AI tech sector will create around 2.3 million jobs by 2020.


Types of Regression in Statistics Along with Their Formulas - Statanalytica

#artificialintelligence

There are different types of regression in statistics, but before proceeding to the details of them. Let's get some information on what is a statistical regression? Regression is the branch of the statistical subject that plays an important role in predicting the analytical data. It is also used to calculate the connection between the dependent variables with single or more predictor variables. The main objective of the regression is to fit the given data in a way that they must exist in minimum outliers.


How to implement Linear Regression with TensorFlow

#artificialintelligence

Now, let's jump to the implementation. Firstly, we need to, obviously, import some libraries. We import tensorflow as it is the main thing we use for the implementation, matplotlib for visualizing our results, make_regression function, from sklearn, which we will be using to generate a regression dataset for using as an example, and the python's built-in math module. The first thing we do inside .fit() is to concatenate an extra column of 1's to our input matrix X. This is to simplify our math and treat the bias as the weight of an extra variable that's always 1.


Gaining an Intuition for Neural Networks

#artificialintelligence

Neural Networks are quite confusing, especially when you are programming them from scratch. This article will give you an intuition for the relevant concepts necessary for the neural network to work, by creating and training a univariate regression model, with one synapse, between the input and output. Imagine you are a doctor, and you want to find a quantitative relationship between height and weight: That is, you want to be able to reliably predict the weight of someone, given their height. So far, you have data on 5 people's height and weight. When we look at this graph, we can visualize a line that best explains the data, but we want to quantify this.


6 Common Mistakes in Data Science and How To Avoid Them - KDnuggets

#artificialintelligence

In data science or machine learning, we use data for descriptive analytics to draw out meaningful conclusions from the data, or we can use data for predictive purposes to build models that can make predictions on unseen data. The reliability of any model depends on the level of expertise of the data scientist. It is one thing to build a machine learning model. It is another thing to ensure the model is optimal and of the highest quality. This article will discuss six common mistakes that can adversely influence the quality or predictive power of a machine learning model with several case studies included.


Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation

arXiv.org Artificial Intelligence

A common technique for compressing a neural network is to compute the $k$-rank $\ell_2$ approximation $A_{k,2}$ of the matrix $A\in\mathbb{R}^{n\times d}$ that corresponds to a fully connected layer (or embedding layer). Here, $d$ is the number of the neurons in the layer, $n$ is the number in the next one, and $A_{k,2}$ can be stored in $O((n+d)k)$ memory instead of $O(nd)$. This $\ell_2$-approximation minimizes the sum over every entry to the power of $p=2$ in the matrix $A - A_{k,2}$, among every matrix $A_{k,2}\in\mathbb{R}^{n\times d}$ whose rank is $k$. While it can be computed efficiently via SVD, the $\ell_2$-approximation is known to be very sensitive to outliers ("far-away" rows). Hence, machine learning uses e.g. Lasso Regression, $\ell_1$-regularization, and $\ell_1$-SVM that use the $\ell_1$-norm. This paper suggests to replace the $k$-rank $\ell_2$ approximation by $\ell_p$, for $p\in [1,2]$. We then provide practical and provable approximation algorithms to compute it for any $p\geq1$, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage. For example, our approach achieves $28\%$ compression of RoBERTa's embedding layer with only $0.63\%$ additive drop in the accuracy (without fine-tuning) in average over all tasks in GLUE, compared to $11\%$ drop using the existing $\ell_2$-approximation. Open code is provided for reproducing and extending our results.