Goto

Collaborating Authors

 Regression


A Survey of Machine Learning Applied to Computer Architecture Design

arXiv.org Artificial Intelligence

Machine learning has enabled significant benefits in diverse fields, but, with a few exceptions, has had limited impact on computer architecture. Recent work, however, has explored broader applicability for design, optimization, and simulation. Notably, machine learning based strategies often surpass prior state-of-the-art analytical, heuristic, and human-expert approaches. This paper reviews machine learning applied system-wide to simulation and run-time optimization, and in many individual components, including memory systems, branch predictors, networks-on-chip, and GPUs. The paper further analyzes current practice to highlight useful design strategies and identify areas for future work, based on optimized implementation strategies, opportune extensions to existing work, and ambitious long term possibilities. Taken together, these strategies and techniques present a promising future for increasingly automated architectural design.


Estimating covariance and precision matrices along subspaces

arXiv.org Machine Learning

We study the accuracy of estimating the covariance and the precision matrix of a $D$-variate sub-Gaussian distribution along a prescribed subspace or direction using the finite sample covariance with $N \geq D$ samples. Our results show that the estimation accuracy depends almost exclusively only on the components of the distribution that correspond to desired subspaces or directions. This is relevant for problems where behavior of data along a lower-dimensional space is of specific interest, such as dimension reduction or structured regression problems. As a by-product of the analysis, we reduce the effect the matrix condition number has on the estimation of precision matrices. Two applications are presented: direction-sensitive eigenspace perturbation bounds, and estimation of the single-index model. For the latter, a new estimator, derived from the analysis, with strong theoretical guarantees and superior numerical performance is proposed.


Function Preserving Projection for Scalable Exploration of High-Dimensional Data

arXiv.org Machine Learning

We present function preserving projections (FPP), a scalable linear projection technique for discovering interpretable relationships in high-dimensional data. Conventional dimension reduction methods aim to maximally preserve the global and/or local geometric structure of a dataset. However, in practice one is often more interested in determining how one or multiple user-selected response function(s) can be explained by the data. To intuitively connect the responses to the data, FPP constructs 2D linear embeddings optimized to reveal interpretable yet potentially non-linear patterns of the response functions. More specifically, FPP is designed to (i) produce human-interpretable embeddings; (ii) capture non-linear relationships; (iii) allow the simultaneous use of multiple response functions; and (iv) scale to millions of samples. Using FPP on real-world datasets, one can obtain fundamentally new insights about high-dimensional relationships in large-scale data that could not be achieved using existing dimension reduction methods.


bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond)

arXiv.org Machine Learning

Over the last decades, the challenges in applied regression and in predictive modeling have been changing considerably: (1) More flexible model specifications are needed as big(ger) data become available, facilitated by more powerful computing infrastructure. (2) Full probabilistic modeling rather than predicting just means or expectations is crucial in many applications. (3) Interest in Bayesian inference has been increasing both as an appealing framework for regularizing or penalizing model estimation as well as a natural alternative to classical frequentist inference. However, while there has been a lot of research in all three areas, also leading to associated software packages, a modular software implementation that allows to easily combine all three aspects has not yet been available. For filling this gap, the R package bamlss is introduced for Bayesian additive models for location, scale, and shape (and beyond). At the core of the package are algorithms for highly-efficient Bayesian estimation and inference that can be applied to generalized additive models (GAMs) or generalized additive models for location, scale, and shape (GAMLSS), also known as distributional regression. However, its building blocks are designed as "Lego bricks" encompassing various distributions (exponential family, Cox, joint models, ...), regression terms (linear, splines, random effects, tensor products, spatial fields, ...), and estimators (MCMC, backfitting, gradient boosting, lasso, ...). It is demonstrated how these can be easily recombined to make classical models more flexible or create new custom models for specific modeling challenges.


Scheduling optimization of parallel linear algebra algorithms using Supervised Learning

arXiv.org Machine Learning

Linear algebra algorithms are used widely in a variety of domains, e.g machine learning, numerical physics and video games graphics. For all these applications, loop-level parallelism is required to achieve high performance. However, finding the optimal way to schedule the workload between threads is a non-trivial problem because it depends on the structure of the algorithm being parallelized and the hardware the executable is run on. In the realm of Asynchronous Many Task runtime systems, a key aspect of the scheduling problem is predicting the proper chunk-size, where the chunk-size is defined as the number of iterations of a for-loop assigned to a thread as one task. In this paper, we study the applications of supervised learning models to predict the chunk-size which yields maximum performance on multiple parallel linear algebra operations using the HPX backend of Blaze's linear algebra library. More precisely, we generate our training and tests sets by measuring performance of the application with different chunk-sizes for multiple linear algebra operations; vector-addition, matrix-vector-multiplication, matrix-matrix addition and matrix-matrix-multiplication. We compare the use of logistic regression, neural networks and decision trees with a newly developed decision tree based model in order to predict the optimal value for chunk-size. Our results show that classical decision trees and our custom decision tree model are able to forecast a chunk-size which results in good performance for the linear algebra operations.


Let's Learn Tensorflow Linear Regression

#artificialintelligence

This is Algorunner coming at you with my very first YouTube video! In this video, we'll be going over some very basic functionality in the Tensorflow frame work. We build a Linear Regression Model optimized by Tensorflow's Gradient Decent optimizer to minimize the Mean Square Error. In this channel, I'll be focusing a lot on Artificial Intelligence theory and applications in the world of Machine Learning and Deep Learning. This is my first video so if you have any constructive criticism, please feel free to comment.


The column measure and Gradient-Free Gradient Boosting

arXiv.org Machine Learning

Sparse model selection by structural risk minimization leads to a set of a few predictors, ideally a subset of the true predictors. This selection clearly depends on the underlying loss function $\tilde L$. For linear regression with square loss, the particular (functional) Gradient Boosting variant $L_2-$Boosting excels for its computational efficiency even for very large predictor sets, while still providing suitable estimation consistency. For more general loss functions, functional gradients are not always easily accessible or, like in the case of continuous ranking, need not even exist. To close this gap, starting from column selection frequencies obtained from $L_2-$Boosting, we introduce a loss-dependent ''column measure'' $\nu^{(\tilde L)}$ which mathematically describes variable selection. The fact that certain variables relevant for a particular loss $\tilde L$ never get selected by $L_2-$Boosting is reflected by a respective singular part of $\nu^{(\tilde L)}$ w.r.t. $\nu^{(L_2)}$. With this concept at hand, it amounts to a suitable change of measure (accounting for singular parts) to make $L_2-$Boosting select variables according to a different loss $\tilde L$. As a consequence, this opens the bridge to applications of simulational techniques such as various resampling techniques, or rejection sampling, to achieve this change of measure in an algorithmic way.


Explaining Logistic Regression as Generalized Linear Model (in use as a classifier)

#artificialintelligence

The explanation of Logistic Regression as a Generalized Linear Model and use as a classifier is often confusing. In this article, I try to explain this idea from first principles. This blog is part of my forthcoming book on the Mathematical foundations of Data Science. Machine learning involves creating a model of a process. To create a model of a process, we need to identify patterns in data.


Machine Learning Optimization Algorithms & Portfolio Allocation

arXiv.org Machine Learning

Portfolio optimization emerged with the seminal paper of Markowitz (1952). The original mean-variance framework is appealing because it is very efficient from a computational point of view. However, it also has one well-established failing since it can lead to portfolios that are not optimal from a financial point of view. Nevertheless, very few models have succeeded in providing a real alternative solution to the Markowitz model. The main reason lies in the fact that most academic portfolio optimization models are intractable in real life although they present solid theoretical properties. By intractable we mean that they can be implemented for an investment universe with a small number of assets using a lot of computational resources and skills, but they are unable to manage a universe with dozens or hundreds of assets. However, the emergence and the rapid development of robo-advisors means that we need to rethink portfolio optimization and go beyond the traditional mean-variance optimization approach. Another industry has faced similar issues concerning large-scale optimization problems. Machine learning has long been associated with linear and logistic regression models. Again, the reason was the inability of optimization algorithms to solve high-dimensional industrial problems. Nevertheless, the end of the 1990s marked an important turning point with the development and the rediscovery of several methods that have since produced impressive results. The goal of this paper is to show how portfolio allocation can benefit from the development of these large-scale optimization algorithms. Not all of these algorithms are useful in our case, but four of them are essential when solving complex portfolio optimization problems. These four algorithms are the coordinate descent, the alternating direction method of multipliers, the proximal gradient method and the Dykstra's algorithm.


Trimmed Constrained Mixed Effects Models: Formulations and Algorithms

arXiv.org Machine Learning

Mixed effects (ME) models inform a vast array of problems in the physical and social sciences, and are pervasive in meta-analysis. We consider ME models where the random effects component is linear. We then develop an efficient approach for a broad problem class that allows nonlinear measurements, priors, and constraints, and finds robust estimates in all of these cases using trimming in the associated marginal likelihood. We illustrate the efficacy of the approach on a range of applications for meta-analysis of global health data. Constraints and priors are used to impose monotonicity, convexity and other characteristics on dose-response relationships, while nonlinear observations enable new epidemiological analyses in place of approximations. Robust extensions ensure that spurious studies do not drive our understanding of between-study heterogeneity. The software accompanying this paper is disseminated as an open-source Python package called limeTR.