Goto

Collaborating Authors

 Regression


log-sum-exp for logistic regression • /r/MachineLearning

@machinelearnbot

However, if the argument to exp(wT x) is large enough to cause overflow, wouldn't that also be the case for standard binary logistic regression as well, since negative-log-likelihood in that case contains the sigmoid function, which also has exp(wT x)? However, I don't think log-sum-exp can be applied to binary logistic regression, right?


Bayesian linear regression with Student-t assumptions

arXiv.org Machine Learning

As an automatic method of determining model complexity using the training data alone, Bayesian linear regression provides us a principled way to select hyperparameters. But one often needs approximation inference if distribution assumption is beyond Gaussian distribution. In this paper, we propose a Bayesian linear regression model with Student-t assumptions (BLRS), which can be inferred exactly. In this framework, both conjugate prior and expectation maximization (EM) algorithm are generalized. Meanwhile, we prove that the maximum likelihood solution is equivalent to the standard Bayesian linear regression with Gaussian assumptions (BLRG). The $q$-EM algorithm for BLRS is nearly identical to the EM algorithm for BLRG. It is showed that $q$-EM for BLRS can converge faster than EM for BLRG for the task of predicting online news popularity.


Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

arXiv.org Machine Learning

This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation, nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus enabling to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allow quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.


Deep Learning Lesson 3: Simple Networks and Code

#artificialintelligence

Let's get started with lesson three of our Practicing Deep Learning Series. So far our focus has been on a very simple network comprised of a single neuron. Though we've discussed its parts, we have neglected to show it actually doing anything. The focus of part three is to start diving into some actual code to illustrate the simple network we've discussed. We will spend a fair amount of time on the single neuron network so that you can get familiar with Keras while gaining an understanding of the basics of a simple network. As soon as this is complete, we will be moving onto multilayer networks, which are much more powerful than the simple networks below.


Simple one-pass algorithm for penalized linear regression with cross-validation on MapReduce

arXiv.org Machine Learning

In this paper, we propose a one-pass algorithm on MapReduce for penalized linear regression \[f_\lambda(\alpha, \beta) = \|Y - \alpha\mathbf{1} - X\beta\|_2^2 + p_{\lambda}(\beta)\] where $\alpha$ is the intercept which can be omitted depending on application; $\beta$ is the coefficients and $p_{\lambda}$ is the penalized function with penalizing parameter $\lambda$. $f_\lambda(\alpha, \beta)$ includes interesting classes such as Lasso, Ridge regression and Elastic-net. Compared to latest iterative distributed algorithms requiring multiple MapReduce jobs, our algorithm achieves huge performance improvement; moreover, our algorithm is exact compared to the approximate algorithms such as parallel stochastic gradient decent. Moreover, what our algorithm distinguishes with others is that it trains the model with cross validation to choose optimal $\lambda$ instead of user specified one. Key words: penalized linear regression, lasso, elastic-net, ridge, MapReduce


A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python)

#artificialintelligence

Tree based learning algorithms are considered to be one of the best and mostly used supervised learning methods. Tree based methods empower predictive models with high accuracy, stability and ease of interpretation. Unlike linear models, they map non-linear relationships quite well. They are adaptable at solving any kind of problem at hand (classification or regression). Methods like decision trees, random forest, gradient boosting are being popularly used in all kinds of data science problems. Hence, for every analyst (fresher also), it's important to learn these algorithms and use them for modeling. This tutorial is meant to help beginners learn tree based modeling from scratch. After the successful completion of this tutorial, one is expected to become proficient at using tree based algorithms and build predictive models. Note: This tutorial requires no prior knowledge of machine learning.


Predicting 30-Day Risk and Cost of "All-Cause" Hospital Readmissions

AAAI Conferences

The hospital readmission rate of patients within 30 days after discharge is broadly accepted as a healthcare quality measure and cost driver in the United States. The ability to estimate hospitalization costs alongside 30 day risk-stratification for such readmissions provides additional benefit for accountable care, now a global issue and foundation for the U.S.~government mandate under the Affordable Care Act. Recent data mining efforts either predict healthcare costs or risk of hospital readmission, but not both. In this paper we present a dual predictive modeling effort that utilizes healthcare data to predict the risk and cost of any hospital readmission (``all-cause''). For this purpose, we explore machine learning algorithms to do accurate predictions of healthcare costs and risk of 30-day readmission.Results on risk prediction for ``all-cause'' readmission compared to the standardized readmission tool (LACE) are promising, and the proposed techniques for cost prediction consistently outperform baseline models and demonstrate substantially lower mean absolute error (MAE).


Analyzing NIH Funding Patterns over Time with Statistical Text Analysis

AAAI Conferences

In the past few years various government funding organizations such as the U.S. National Institutes of Health and the U.S.\ National Science Foundation have provided access to large publicly-available online databases documenting the grants that they have funded over the past few decades. These databases provide an excellent opportunity for the application of statistical text analysis techniques to infer useful quantitative information about how funding patterns have changed over time. In this paper we analyze data from the National Cancer Institute (part of National Institutes of Health) and show how text classification techniques provide a useful starting point for analyzing how funding for cancer research has evolved over the past 20 years in the United States.


How long could it take to run a regression

#artificialintelligence

This afternoon, while I was discussing with Montserrat (aka @mguillen_estany) we were wondering how long it might take to run a regression model. More specifically, how long it might take if we use a Bayesian approach. My guess was that the time should probably be linear in, the number of observations. But I thought I would be good to check. Here the regression is a subset of smaller size.


District Data Labs - An Introduction to Machine Learning with Python

#artificialintelligence

For the mind does not require filling like a bottle, but rather, like wood, it only requires kindling to create in it an impulse to think independently and an ardent desire for the truth. The impulse to ingest more data is our first and most powerful instinct. Born with billions of neurons, as babies we begin developing complex synaptic networks by taking in massive amounts of data - sounds, smells, tastes, textures, pictures. It's not always graceful, but it is an effective way to learn. As data scientists, the trick is to encode similar learning instincts into applications, banking more on the volume of data that will flow through the system than on the elegance of the solution (see also these discussions of the Netflix prize and the "unreasonable effectiveness of data").