Regression
Heart Disease Prediction using Machine Learning
In this article, I will take you through how to train a model for heart disease prediction using Machine Learning. I will use the Logistic Regression algorithm to train the model. Predicting and diagnosing heart disease is a major challenge in the medical industry and relies on factors such as the physical examination, symptoms and signs of the patient. Factors that influence heart disease include cholesterol levels, smoking, obesity, family history of illness, blood pressure, and work environment. Machine learning algorithms can play an important role in predicting heart disease.
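As a rough illustration of the approach described above, the following is a minimal sketch of logistic regression trained by batch gradient descent in plain Python. The feature names and the eight data rows are synthetic placeholders invented for this example (not real patient data), and the learning rate and epoch count are arbitrary assumptions.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of the log loss w.r.t. the score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the positive class for one row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Synthetic, standardized toy rows (hypothetical columns):
# [cholesterol_z, blood_pressure_z, smoker]
X = [[-1.2, -0.8, 0], [-0.5, -1.0, 0], [-0.9, 0.1, 0], [0.2, -0.3, 0],
     [0.8, 0.9, 1], [1.1, 0.4, 1], [0.6, 1.3, 1], [1.5, 1.0, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(X, y)
high_risk = predict_proba(w, b, [1.0, 1.0, 1])
low_risk = predict_proba(w, b, [-1.0, -1.0, 0])
```

In practice one would more likely use an established library implementation and a real dataset with a held-out test split; the sketch only shows the mechanics of the model.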
Neuroimaging Feature Extraction using a Neural Network Classifier for Imaging Genetics
Beaulac, Cédric, Wu, Sidi, Gibson, Erin, Miranda, Michelle F., Cao, Jiguo, Rocha, Leno, Beg, Mirza Faisal, Nathoo, Farouk S.
A major issue in the association of genes to neuroimaging phenotypes is the high dimension of both genetic data and neuroimaging data. In this article, we tackle the latter problem with an eye toward developing solutions that are relevant for disease prediction. Supported by a vast literature on the predictive power of neural networks, our proposed solution uses neural networks to extract from neuroimaging data features that are relevant for predicting Alzheimer's Disease (AD) for subsequent relation to genetics. Our neuroimaging-genetic pipeline comprises image processing, neuroimaging feature extraction and genetic association steps. We propose a neural network classifier for extracting neuroimaging features that are related to disease and a multivariate Bayesian group sparse regression model for genetic association. We compare the predictive power of these features to expert-selected features and take a closer look at the SNPs identified with the new neuroimaging features.
A State Transition Model for Mobile Notifications via Survival Analysis
Yuan, Yiping, Zhang, Jing, Chatterjee, Shaunak, Yu, Shipeng, Rosales, Romer
Mobile notifications have become a major communication channel for social networking services to keep users informed and engaged. As more mobile applications push notifications to users, they constantly face decisions on what to send, when and how. A lack of research and methodology commonly leads to heuristic decision making. Many notifications arrive at an inappropriate moment or introduce too many interruptions, failing to provide value to users and spurring users' complaints. In this paper we explore unique features of interactions between mobile notifications and user engagement. We propose a state transition framework to quantitatively evaluate the effectiveness of notifications. Within this framework, we develop a survival model for badging notifications assuming a log-linear structure and a Weibull distribution. Our results show that this model offers greater flexibility for applications and better prediction accuracy than a logistic regression model. In particular, we provide an online use case on notification delivery time optimization to show how we make better decisions, drive more user engagement, and provide more value to users.
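To make the phrase "log-linear structure and a Weibull distribution" more concrete, here is a minimal sketch of one common parameterization of a Weibull survival model, in which the scale parameter is the exponential of a linear function of the covariates. This is an assumption chosen for illustration and is not necessarily the exact model in the paper; the coefficient values and covariates below are hypothetical.

```python
import math

def weibull_scale(beta, x):
    """Log-linear link: scale = exp(beta . x)."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

def survival(t, shape, scale):
    """S(t) = P(T > t) = exp(-(t / scale) ** shape), for t >= 0."""
    return math.exp(-((t / scale) ** shape))

def hazard(t, shape, scale):
    """h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def median_time(shape, scale):
    """Time at which the survival function equals 0.5."""
    return scale * math.log(2) ** (1 / shape)

# Hypothetical coefficients and covariates: [intercept, feature]
beta = [1.0, -0.5]
x = [1.0, 2.0]
shape = 1.5

scale = weibull_scale(beta, x)   # exp(1.0 - 1.0) = 1.0
median = median_time(shape, scale)
```

With a shape above 1 the hazard increases over time, and with a shape below 1 it decreases, which is part of what gives this family more flexibility than a single fixed-horizon classifier.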
Random Forest Classifier: Basic Principles and Applications
Predicting customer behavior, consumer demand or stock price fluctuations, identifying fraud, and diagnosing patients -- these are some of the popular applications of the random forest (RF) algorithm. Used for classification and regression tasks, it can significantly enhance the efficiency of business processes and scientific research. This blog post will cover the random forest algorithm, its operating principles, capabilities and limitations, and real-world applications. A random forest is a supervised machine learning algorithm in which the calculations of numerous decision trees are combined to produce one final result. It's popular because it is simple yet effective. Random forest is an ensemble method -- a technique where we take many base-level models and combine them to get improved results.
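The ensemble idea described above can be sketched in a few lines. The example below is a toy stand-in for a random forest, not a full implementation: each "tree" is a single-split decision stump trained on a bootstrap sample with one randomly chosen feature, and the forest predicts by majority vote. The function names and the tiny dataset are made up for illustration.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Pick the threshold on one feature that maximizes training accuracy."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [yi for row, yi in zip(X, y) if row[feature] <= threshold]
        right = [yi for row, yi in zip(X, y) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        correct = left.count(left_label) + right.count(right_label)
        if best is None or correct > best[0]:
            best = (correct, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority label
        label = Counter(y).most_common(1)[0][0]
        return (feature, float("inf"), label, label)
    return (feature, best[1], best[2], best[3])

def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each with one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        feature = rng.randrange(len(X[0]))        # random feature choice
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict(forest, x):
    """Combine the individual predictions by majority vote."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Made-up toy data with two features and two classes.
X = [[1, 5], [2, 4], [3, 6], [7, 1], [8, 2], [9, 0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
```

A real random forest grows deeper trees and resamples candidate features at every split; the sketch keeps only the two ingredients the text names, many base models and a combined result.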
10 Most Popular Types Of Machine Learning Algorithms
We live in quite an exciting time as we see technologies around us developing at a breakneck speed. We are also seeing data overtaking crude oil as the most valuable resource available for this generation's businesses. The transition of computing power from traditional on-premise mainframe data centres to easy-to-use and ever-scalable cloud computing has unlocked limitless possibilities to use data like never before. And we are just starting to harness the true power of data. Data Science is probably the most famous buzzword in the domain of technology and IT right now. We are seeing the democratization of several tools and techniques working in tandem with the boost in computing. Using the data on almost everything, we can enable computers to learn and replicate actions as humans do.
An Approximation Method for Fitted Random Forests
Random Forests (RF) is a popular machine learning method for classification and regression problems. It involves a bagging application to decision tree models. One of the primary advantages of the Random Forests model is the reduction in the variance of the forecast. In large-scale applications of the model with millions of data points and hundreds of features, the size of the fitted objects can get very large and reach the limits on the available space in production setups, depending on the number and depth of the trees. This could be especially challenging when trained models need to be downloaded on-demand to small devices with limited memory. There is a need to approximate the trained RF models to significantly reduce the model size without losing too much prediction accuracy. In this project we study methods that approximate each fitted tree in the Random Forests model using the multinomial allocation of the data points to the leaves. Specifically, we begin by studying whether fitting a multinomial logistic regression (and subsequently, a generalized additive model (GAM) extension) to the output of each tree helps reduce the size while preserving the prediction quality.
Conditional Distribution Function Estimation Using Neural Networks for Censored and Uncensored Data
Most work in neural networks focuses on estimating the conditional mean of a continuous response variable given a set of covariates. In this article, we consider estimating the conditional distribution function using neural networks for both censored and uncensored data. The algorithm is built upon the data structure particularly constructed for the Cox regression with time-dependent covariates. Without imposing any model assumption, we consider a loss function that is based on the full likelihood where the conditional hazard function is the only unknown nonparametric parameter, for which unconstrained optimization methods can be applied. Through simulation studies, we show the proposed method possesses desirable performance, whereas the partial likelihood method and the traditional neural networks with $L_2$ loss yield biased estimates when model assumptions are violated. We further illustrate the proposed method with several real-world data sets. The implementation of the proposed methods is made available at https://github.com/bingqing0729/NNCDE.
Modular Conformal Calibration
Marx, Charles, Zhao, Shengjia, Neiswanger, Willie, Ermon, Stefano
Uncertainty estimates must be calibrated (i.e., accurate) and sharp (i.e., informative) in order to be useful. This has motivated a variety of methods for recalibration, which use held-out data to turn an uncalibrated model into a calibrated model. However, the applicability of existing methods is limited due to their assumption that the original model is also a probabilistic model. We introduce a versatile class of algorithms for recalibration in regression that we call Modular Conformal Calibration (MCC). This framework allows one to transform any regression model into a calibrated probabilistic model. The modular design of MCC allows us to make simple adjustments to existing algorithms that enable well-behaved distribution predictions. We also provide finite-sample calibration guarantees for MCC algorithms. Our framework recovers isotonic recalibration, conformal calibration, and conformal interval prediction, implying that our theoretical results apply to those methods as well. Finally, we conduct an empirical study of MCC on 17 regression datasets. Our results show that new algorithms designed in our framework achieve near-perfect calibration and improve sharpness relative to existing methods.
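Of the methods the abstract says the framework recovers, conformal interval prediction is the simplest to show. The following is a minimal sketch of standard split conformal prediction with absolute residuals as the conformity score; it is not the MCC algorithm itself. The function names, the toy model `predict`, and the calibration data are all hypothetical.

```python
import math

def conformal_quantile(residuals, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest residual."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:  # too few calibration points for this coverage level
        return float("inf")
    return sorted(residuals)[rank - 1]

def conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Wrap any point predictor in a symmetric prediction interval."""
    residuals = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(residuals, alpha)
    center = predict(x_new)
    return center - q, center + q

# Hypothetical point predictor and held-out calibration data.
predict = lambda x: 2 * x
calib_x = list(range(10))
offsets = [1, -2, 3, -4, 5, -6, 7, -8, 9, -10]
calib_y = [2 * x + d for x, d in zip(calib_x, offsets)]

# With n = 10 and alpha = 0.2 the rank is ceil(11 * 0.8) = 9,
# so the interval half-width is the 9th smallest residual.
interval = conformal_interval(predict, calib_x, calib_y, x_new=5, alpha=0.2)
```

Under the usual exchangeability assumption, an interval built this way covers a new response with probability at least 1 - alpha, regardless of how good the underlying predictor is; a poor predictor simply produces wider intervals.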
Linear Machine Learning Algorithms: An Overview - KDnuggets
Linear machine learning algorithms assume a linear relationship between the features and the target variable. In this article, we'll discuss several linear algorithms and their concepts. Here's a glimpse into what you can expect to learn: You can use linear algorithms for classification and regression problems. Let's start by looking at different algorithms and what problems they solve. Linear regression is arguably one of the oldest and most popular algorithms.
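As a small worked example of the linear assumption, here is a sketch of simple (one-feature) linear regression fitted with the closed-form least-squares solution. The function name and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
```

For noisy data the same formulas return the line that minimizes the sum of squared vertical distances to the points.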