 Regression


GLMs, CPUs, and GPUs: An introduction to machine learning through logistic regression, Python and…

#artificialintelligence

As a mentee in the ChiPy mentorship program I will be writing a few blog posts about my project -- which was to learn how to implement a couple of machine learning algorithms for execution on the graphics card. In this blog post, I'll introduce a few concepts fundamental to machine learning using logistic regression as an example, along with code for a simple implementation in Python and OpenCL interfaced with PyOpenCL. This post is intended for a broad audience; if you're completely new to machine learning it may be worthwhile to read the beginning and mess around with the code on your own. And if you have any feedback, good or bad, let me know! If you would like to see the code I'm using, you can take a look at my GitHub repository. There are many cases where it feels natural to model the outcome of an event as the probability that the event occurs, as we often record the outcome of some event as binary data.
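To make the idea of modeling a binary outcome as a probability concrete, here is a minimal sketch of one-feature logistic regression fitted by batch gradient descent. It is plain CPU Python, not the OpenCL/PyOpenCL implementation the post describes, and the function names, toy data, learning rate, and epoch count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a real number to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent.

    lr and epochs are illustrative defaults, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the negative log-likelihood for one example.
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: one feature, binary outcome, not perfectly separable.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 0.5 + b)   # predicted probability for a small x
p_high = sigmoid(w * 4.0 + b)  # predicted probability for a large x
print(f"w={w:.3f} b={b:.3f} p(0.5)={p_low:.3f} p(4.0)={p_high:.3f}")
```

The inner loop over examples is the part that a GPU version would spread across many work items, since each example's contribution to the gradient can be computed independently.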


Denoising Linear Models with Permuted Data

arXiv.org Machine Learning

The multivariate linear regression model with shuffled data and additive Gaussian noise arises in various correspondence estimation and matching problems. Focusing on the denoising aspect of this problem, we provide a characterization of the minimax error rate that is sharp up to logarithmic factors. We also analyze the performance of two versions of a computationally efficient estimator, and establish their consistency for a large range of input parameters. Finally, we provide an exact algorithm for the noiseless problem and demonstrate its performance on an image point-cloud matching task. Our analysis also extends to datasets with outliers.


On Prediction and Tolerance Intervals for Dynamic Treatment Regimes

arXiv.org Machine Learning

We develop and evaluate tolerance interval methods for dynamic treatment regimes (DTRs) that can provide more detailed prognostic information to patients who will follow an estimated optimal regime. Although the problem of constructing confidence intervals for DTRs has been extensively studied, prediction and tolerance intervals have received little attention. We begin by reviewing in detail different interval estimation and prediction methods and then adapt them to the DTR setting. We illustrate some of the challenges associated with tolerance interval estimation stemming from the fact that we do not typically have data that were generated from the estimated optimal regime. We give an extensive empirical evaluation of the methods and discuss several practical aspects of method choice, and we present an example application using data from a clinical trial. Finally, we discuss future directions within this important emerging area of DTR research.


Hybrid content-based and collaborative filtering recommendations with {ordinal} logistic regression (1): Feature engineering

@machinelearnbot

I will use {ordinal} clm() (and other cool R packages such as {text2vec} as well) here to develop a hybrid content-based, collaborative filtering, and (obviously) model-based approach to solve the recommendation problem on the MovieLens 100K dataset in R. All R code used in this project can be obtained from the respective GitHub repository; the chunks of code present in the body of the post illustrate the essential steps only. The MovieLens 100K dataset can be obtained from the GroupLens research laboratory of the Department of Computer Science and Engineering at the University of Minnesota. The first part of the study introduces the new approach and refers to the feature engineering steps that are performed by the OrdinalRecommenders_1.R script (found on GitHub). The second part, to be published soon, relies on the R code in OrdinalRecommenders_3.R and presents the model training, cross-validation, and analysis steps. The OrdinalRecommenders_2.R script encompasses some tireless for-looping in R (a bad habit indeed) across the dataset, only in order to place the information from the dataset in the format needed for the modeling phase.


The Building Blocks of AI – Hacker Noon

#artificialintelligence

A few weeks ago, I wrote about how and why I was learning Machine Learning, mainly through Andrew Ng's Coursera course. Machine Learning is built on prerequisites, so much so that learning by first principles seems overwhelming. Do you really need to spend a month learning linear algebra? You'll be okay if you have some math and programming experience. You really just have to be familiar with Sigma notation and be able to express it in a for loop. Sure, your assignments will take longer to complete and the first few times you see those giant equations your head will spin, but you can do this!
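As a small illustration of expressing Sigma notation in a for loop, consider the squared-error cost J = (1 / (2m)) * Σ_{i=1..m} (h_i - y_i)^2, where h_i is a prediction and y_i is the observed value. The sketch below is an assumption about what such a translation might look like; the function name and sample numbers are hypothetical and not taken from the article.

```python
def squared_error_cost(predictions, targets):
    """Compute J = (1 / (2m)) * sum over i of (h_i - y_i)^2 with an explicit loop."""
    m = len(targets)
    total = 0.0
    # The Sigma becomes a running total: the index i walks from the first
    # example to the last, and each term is added in turn.
    for i in range(m):
        total += (predictions[i] - targets[i]) ** 2
    return total / (2 * m)


# Hypothetical numbers: two exact predictions and one that is off by 2.
cost = squared_error_cost([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(cost)
```

The same pattern covers most of the "giant equations": find the index under the Sigma, loop over it, and accumulate the expression to its right.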


Feature selection algorithm based on Catastrophe model to improve the performance of regression analysis

arXiv.org Machine Learning

In this paper we introduce a new feature selection algorithm to remove the irrelevant or redundant features in the data sets. In this algorithm the importance of a feature is based on its fit to the Catastrophe model. The Akaike information criterion value is used for ranking the features in the data set. The proposed algorithm is compared with the well-known RELIEF feature selection algorithm. Breast Cancer, Parkinson Telemonitoring data and Slice locality data sets are used to evaluate the model.
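The general idea of ranking features by an Akaike information criterion value can be sketched as follows. This is not the paper's method: the sketch swaps in an ordinary one-feature least-squares line in place of the Catastrophe model, and uses the Gaussian-error form AIC = n ln(RSS / n) + 2k (constants dropped). The function names and the toy data are assumptions for illustration only.

```python
import math


def linear_fit_rss(x, y):
    """Residual sum of squares of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def gaussian_aic(rss, n, k):
    """AIC under Gaussian errors, up to an additive constant; lower is better."""
    # Guard against log(0) when a feature fits the target exactly.
    return n * math.log(max(rss, 1e-12) / n) + 2 * k


def rank_features_by_aic(features, y):
    """Return (name, aic) pairs sorted from best (lowest AIC) to worst."""
    n = len(y)
    scores = []
    for name, x in features.items():
        # k = 2 parameters per fit: intercept and slope.
        scores.append((name, gaussian_aic(linear_fit_rss(x, y), n, k=2)))
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical data: one feature tracks the target closely, the other does not.
target = [1.1, 2.9, 5.2, 6.8, 9.1]
features = {
    "informative": [0.0, 1.0, 2.0, 3.0, 4.0],
    "unrelated": [3.0, 1.0, 4.0, 1.0, 5.0],
}
ranking = rank_features_by_aic(features, target)
for name, aic in ranking:
    print(f"{name}: AIC = {aic:.2f}")
```

Features at the bottom of such a ranking would be the candidates for removal; where to cut off the list is a separate choice that this sketch does not address.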


Changing Business Requirements In Demand Forecasting – Affineblog

#artificialintelligence

Affine recently completed 6 years, and I have been a part of it for about 3 of those years. As an analytics firm, the most common business problem that we have come across is that of forecasting consumer demand. This is particularly true for Retail and CPG clients. Over the last few years we have dealt with simple forecasting problems for which we can use very simple time-series forecasting techniques like ARIMA and ARIMAX, or even linear regression. These are forecasts made at the level of an organization or of specific business divisions. But over the years we have seen a distinct shift in the focus of all our clients toward forecasts at a more granular level, sometimes even for specific items.


Voxelwise nonlinear regression toolbox for neuroimage analysis: Application to aging and neurodegenerative disease modeling

arXiv.org Machine Learning

This paper describes a new neuroimaging analysis toolbox that allows for the modeling of nonlinear effects at the voxel level, overcoming limitations of methods based on linear models like the GLM. We illustrate its features using a relevant example in which distinct nonlinear trajectories of Alzheimer's disease related brain atrophy patterns were found across the full biological spectrum of the disease. The open-source toolbox is available in GitHub: https://github.com/


A study of Classification Problems using Logistic Regression and an insight to the admissions…

#artificialintelligence

In our world, many of the commonly encountered problems are classification problems. We are often torn between definite values or rigid choices. In this article, we will discuss an algorithm used to solve simple classification problems effectively using Machine Learning. We will also analyze a hypothetical binary-class problem involving Grad-School outcomes based on the Entrance Exam Marks and the Undergrad Marks. Supervised Learning is a machine learning technique in which we associate our inputs with our targets in the given dataset. We already have a definite intuition regarding our final output.


Customer Churn – Logistic Regression with R

@machinelearnbot

In the customer management lifecycle, customer churn refers to a decision made by the customer about ending the business relationship. It is also referred to as loss of clients or customers. Customer loyalty and customer churn always add up to 100%. If a firm has a 60% loyalty rate, then its loss or churn rate of customers is 40%. As per the 80/20 customer profitability rule, 20% of customers generate 80% of revenue.