 Regression


The xyz algorithm for fast interaction search in high-dimensional data

arXiv.org Machine Learning

When performing regression on a dataset with $p$ variables, it is often of interest to go beyond using main linear effects and include interactions as products between individual variables. For small-scale problems, these interactions can be computed explicitly, but this leads to a computational complexity of at least $\mathcal{O}(p^2)$ if done naively. This cost can be prohibitive if $p$ is very large. We introduce a new randomised algorithm that is able to discover interactions with high probability and under mild conditions has a runtime that is subquadratic in $p$. We show that strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires $\mathcal{O}(p^\alpha)$ operations for $1 < \alpha < 2$ depending on their strength. The underlying idea is to transform interaction search into a closest-pair problem which can be solved efficiently in subquadratic time. The algorithm is called $\mathit{xyz}$ and is implemented in the language R. We demonstrate its efficiency for application to genome-wide association studies, where more than $10^{11}$ interactions can be screened in under $280$ seconds with a single-core $1.2$ GHz CPU.
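To make the quadratic cost concrete, the following is a minimal Python sketch of the naive baseline the abstract refers to; it is not the xyz algorithm and not the authors' R implementation. It assumes, purely for illustration, that all predictors and the response take values in {-1, +1}. Under that assumption, a pair (j, k) interacts strongly with the response exactly when the column z_j = y * x_j agrees with x_k in most rows, which is one way to see why interaction search can be viewed as a closest-pair problem. The function names and the toy data are hypothetical.

```python
import random


def brute_force_interactions(X, y):
    """Score every pair (j, k), j < k, by |sum_i y_i * x_ij * x_ik| / n.

    This is the naive O(n * p^2) baseline, not the xyz algorithm.
    """
    n, p = len(X), len(X[0])
    scores = {}
    for j in range(p):
        for k in range(j + 1, p):
            s = sum(y[i] * X[i][j] * X[i][k] for i in range(n))
            scores[(j, k)] = abs(s) / n
    return scores


def agreement(X, y, j, k):
    """Fraction of rows where z_ij = y_i * x_ij equals x_ik (values in {-1, +1})."""
    n = len(X)
    return sum(1 for i in range(n) if y[i] * X[i][j] == X[i][k]) / n


# Hypothetical toy data: n = 200 rows, p = 8 columns of random signs,
# with the response defined as the product of columns 2 and 5.
rng = random.Random(0)
n, p = 200, 8
X = [[rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
y = [row[2] * row[5] for row in X]

scores = brute_force_interactions(X, y)
best_pair = max(scores, key=scores.get)
```

For sign-valued data the interaction score of a pair equals |2 * agreement - 1|, so the planted pair (2, 5) has agreement 1 and therefore the maximal score. The double loop visits all p(p-1)/2 pairs, which is the cost a subquadratic method aims to avoid.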


Using TensorFlow for Predictive Analytics with Linear Regression

@machinelearnbot

Since its release in 2015 by the Google Brain team, TensorFlow has been a driving force in conversations centered on artificial intelligence, machine learning, and predictive analytics. With its flexible architecture, TensorFlow provides highly parallel numerical computation that appeals to both small and large businesses. Because TensorFlow is built on stateful dataflow graphs that can run across multiple systems, it allows for parallel processing, so data can be leveraged in a meaningful way without requiring petabytes of it. To demonstrate how you can take advantage of TensorFlow without having huge silos of data on hand, I'll explain how to use TensorFlow to build a linear regression model in this post. Linear modeling is a relatively simple mathematical method that, when used properly, can help predict modeled behavior.
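As a rough illustration of what fitting a linear regression model involves, here is a minimal sketch in plain Python rather than TensorFlow; it is not the code from the post. It fits y ≈ w * x + b by batch gradient descent on mean squared error, which is the kind of update a framework optimizer would automate. The function name, learning rate, epoch count, and toy data are all assumptions made for this example.

```python
def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 3x + 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 2 for x in xs]
w, b = fit_linear_regression(xs, ys)
```

On this small noise-free dataset the fitted slope and intercept converge to the generating values of 3 and 2; with real data the learning rate and number of epochs would need tuning.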


Comparison of Deepnet & Neuralnet

@machinelearnbot

In this article, I compare two available R packages for using neural networks to model data: neuralnet and deepnet. Through the comparisons I highlight various challenges in finding good hyperparameter values. I show that some of the required hyperparameters differ between the two packages, even with the same underlying algorithmic approach. Both packages can be obtained via the R CRAN repository (see links at the end). I will focus on a simple time series example with two predictors, and on how well each package predicts future data after being trained on past data using a simple 5-neuron network. Note that most of what you read about deep learning with neural networks concerns "classification" problems (more later); nonetheless such networks have promise for predicting continuous data, including time series. Briefly, a neural network (also called a multilayer perceptron, etc.) is a connected network of neurons. [Figure: an example neural network, generated using neuralnet.] Note that except for the input layer (where the predictor values are fed in), the inputs to a neuron have weights specific to that neuron, so the output of a neuron is "re-used" as input to all neurons in the next layer, with unique weights. Before moving on to a brief description of how neural networks compute predictions, it is worth reflecting on the number of independent parameters in neural network models as compared to, for example, linear regression.
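The parameter comparison mentioned above can be worked out with a short Python sketch. It assumes a fully connected network with one bias per neuron, and it reads the "5-neuron network" as a single hidden layer of 5 neurons feeding one output; both are assumptions, since the excerpt does not spell out the architecture. The function names are hypothetical.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights plus biases in a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, inputs first.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def linear_regression_parameter_count(n_predictors):
    """One coefficient per predictor plus an intercept."""
    return n_predictors + 1


# Assumed architecture: 2 predictors, one hidden layer of 5 neurons, 1 output.
nn_params = mlp_parameter_count([2, 5, 1])        # (2+1)*5 + (5+1)*1 = 21
lm_params = linear_regression_parameter_count(2)  # 2 + 1 = 3
```

Under these assumptions the small network has 21 free parameters against 3 for a linear regression on the same two predictors, which is one reason hyperparameter choices matter even for tiny networks.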


TensorFlow for Deep Learning: From Linear Regression to Reinforcement Learning: Bharath Ramsundar, Reza Bosagh Zadeh: 9781491980453: Amazon.com: Books

@machinelearnbot

Reza Bosagh Zadeh is Founder and CEO at Matroid and Adjunct Professor at Stanford University. His work focuses on Machine Learning, Distributed Computing, and Discrete Applied Mathematics. Reza received his PhD in Computational Mathematics from Stanford University under the supervision of Gunnar Carlsson. His awards include a KDD Best Paper Award and the Gene Golub Outstanding Thesis Award. He has served on the Technical Advisory Boards of Microsoft and Databricks.


Graph-Sparse Logistic Regression

arXiv.org Machine Learning

We introduce Graph-Sparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We validate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package.


Top-down Transformation Choice

arXiv.org Machine Learning

Simple models are preferred over complex models, but over-simplistic models could lead to erroneous interpretations. The classical approach is to start with a simple model, whose shortcomings are assessed in residual-based model diagnostics. Eventually, one increases the complexity of this initial overly simple model and obtains a better-fitting model. I illustrate how transformation analysis can be used as an alternative approach to model choice. Instead of adding complexity to simple models, step-wise complexity reduction is used to help identify simpler and better-interpretable models. As an example, body mass index distributions in Switzerland are modelled by means of transformation models to understand the impact of sex, age, smoking and other lifestyle factors on a person's body mass index. In this process, I searched for a compromise between model fit and model interpretability. Special emphasis is given to the understanding of the connections between transformation models of increasing complexity. The models used in this analysis ranged from evergreens, such as the normal linear regression model with constant variance, to novel models with extremely flexible conditional distribution functions, such as transformation trees and transformation forests.


Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling

arXiv.org Machine Learning

Satellite imagery and remote sensing provide explanatory variables at relatively high resolutions for modeling geospatial phenomena, yet regional summaries are often desirable for analysis and actionable insight. In this paper, we propose a novel method of inducing spatial aggregations as a component of the machine learning process, yielding regional model features whose construction is driven by model prediction performance rather than prior assumptions. Our results demonstrate that Genetic Programming is particularly well suited to this type of feature construction because it can automatically synthesize appropriate aggregations, as well as better incorporate them into predictive models compared to other regression methods we tested. In our experiments we consider a specific problem instance and real-world dataset relevant to predicting snow properties in high-mountain Asia.


Adaptive regularization for Lasso models in the context of non-stationary data streams

arXiv.org Machine Learning

Large-scale, streaming datasets are ubiquitous in modern machine learning. Streaming algorithms must be scalable, amenable to incremental training and robust to the presence of non-stationarity. In this work we consider the problem of learning $\ell_1$ regularized linear models in the context of streaming data. In particular, the focus of this work is how to select the regularization parameter when data arrives sequentially and the underlying distribution is non-stationary (implying the choice of optimal regularization parameter is itself time-varying). We propose a framework through which to infer an adaptive regularization parameter. Our approach employs an $\ell_1$ penalty constraint where the corresponding sparsity parameter is iteratively updated via stochastic gradient descent. This serves to reformulate the choice of regularization parameter in a principled framework for online learning. The proposed method is derived for linear regression and subsequently extended to generalized linear models. We validate our approach using simulated and real datasets and present an application to a neuroimaging dataset.


Beginners Guide to Regression Analysis and Plot Interpretations Tutorials & Notes Machine Learning HackerEarth

#artificialintelligence

"The road to machine learning starts with Regression. If you are aspiring to become a data scientist, regression is the first algorithm you need to learn and master. Not just to clear job interviews, but to solve real-world problems. To this day, a lot of consultancy firms continue to use regression techniques at a large scale to help their clients. No doubt, it's one of the easiest algorithms to learn, but it requires persistent effort to get to the master level.


Predicting Station-level Hourly Demands in a Large-scale Bike-sharing Network: A Graph Convolutional Neural Network Approach

arXiv.org Machine Learning

Bike sharing is a vital piece in a modern multi-modal transportation system. However, it suffers from the bike unbalancing problem due to fluctuating spatial and temporal demands. Accurate bike sharing demand predictions can help operators to make optimal routes and schedules for bike redistributions, and therefore enhance the system efficiency. In this study, we propose a novel Graph Convolutional Neural Network with Data-driven Graph Filter (GCNN-DDGF) model to predict station-level hourly demands in a large-scale bike-sharing network. With each station as a vertex in the network, the newly proposed GCNN-DDGF model is able to automatically learn the hidden correlations between stations, and thus overcomes a common issue reported in previous studies, i.e., that the quality and performance of GCNN models rely on the predefinition of the adjacency matrix. To show the performance of the proposed model, this study compares the GCNN-DDGF model with four GCNN models, whose adjacency matrices are taken from different bike sharing system matrices: the Spatial Distance matrix (SD), the Demand matrix (DE), the Average Trip Duration matrix (ATD) and the Demand Correlation matrix (DC), respectively. The five types of GCNN models and the classic Support Vector Regression model are built on a Citi Bike dataset from New York City which includes 272 stations and over 28 million transactions from 2013 to 2016. Results show that the GCNN-DDGF model has the lowest Root Mean Square Error, followed by the GCNN-DC model, and the GCNN-ATD model has the worst performance. Through a further examination, we find the learned DDGF captures some similar information embedded in the SD, DE and DC matrices, and it also uncovers more hidden heterogeneous pairwise correlations between stations that are not revealed by any of those matrices.