Regression
How Outliers Can Pose a Problem in Linear Regression.
Linear Regression is without a doubt one of the most widely used machine learning algorithms because of the simple mathematics behind it and the ease with which it can be implemented. In some of my previous articles, I have gone through in detail how to make sure its assumptions are met. In this article, I will go over how outliers can pose a serious problem for a Linear Regression model and how to detect them. Outliers are data points that fall far away from the major "cluster" of points. They can be legitimate data points carrying valuable information, or they can be erroneous values altogether.
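To make the effect concrete, here is a minimal sketch, not taken from the article itself, of how a single erroneous point can shift a least-squares line and how it might be flagged. The synthetic data, the helper names fit_line and iqr_outlier_mask, and the choice of Tukey's 1.5 x IQR rule applied to residuals are all assumptions made for illustration; other detection methods may suit real data better.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical data: a noisy line with true slope 2 and intercept 1.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Corrupt the last observation to imitate one erroneous reading.
y_out = y.copy()
y_out[-1] += 60.0

clean_slope, clean_intercept = fit_line(x, y)
out_slope, out_intercept = fit_line(x, y_out)

# Look for the outlier in the residuals of the contaminated fit.
residuals = y_out - (out_slope * x + out_intercept)
flagged = np.flatnonzero(iqr_outlier_mask(residuals))

print(f"slope without outlier: {clean_slope:.2f}")
print(f"slope with outlier:    {out_slope:.2f}")
print(f"flagged indices:       {flagged}")
```

Because the corrupted point sits at the edge of the x range, it has high leverage and pulls the slope noticeably upward even though it is only one observation out of fifty.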
A Blast From the Past: Personalizing Predictions of Video-Induced Emotions using Personal Memories as Context
Dudzik, Bernd, Broekens, Joost, Neerincx, Mark, Hung, Hayley
A key challenge in the accurate prediction of viewers' emotional responses to video stimuli in real-world applications is accounting for person- and situation-specific variation. An important contextual influence shaping individuals' subjective experience of a video is the personal memories that it triggers in them. Prior research has found that this memory influence explains more variation in video-induced emotions than other contextual variables commonly used for personalizing predictions, such as viewers' demographics or personality. In this article, we show that (1) automatic analysis of text describing their video-triggered memories can account for variation in viewers' emotional responses, and (2) that combining such an analysis with that of a video's audiovisual content enhances the accuracy of automatic predictions. We discuss the relevance of these findings for improving on state of the art approaches to automated affective video analysis in personalized contexts.
A Joint Network Optimization Framework to Predict Clinical Severity from Resting State Functional MRI Data
D'Souza, Niharika Shimona, Nebel, Mary Beth, Wymbs, Nicholas, Mostofsky, Stewart H., Venkataraman, Archana
We propose a novel optimization framework to predict clinical severity from resting state fMRI (rs-fMRI) data. Our model consists of two coupled terms. The first term decomposes the correlation matrices into a sparse set of representative subnetworks that define a network manifold. These subnetworks are modeled as rank-one outer-products which correspond to the elemental patterns of co-activation across the brain; the subnetworks are combined via patient-specific non-negative coefficients. The second term is a linear regression model that uses the patient-specific coefficients to predict a measure of clinical severity. We validate our framework on two separate datasets in a ten-fold cross-validation setting. The first is a cohort of fifty-eight patients diagnosed with Autism Spectrum Disorder (ASD). The second dataset consists of sixty-three patients from a publicly available ASD database. Our method outperforms standard semi-supervised frameworks, which employ conventional graph theoretic and statistical representation learning techniques to relate the rs-fMRI correlations to behavior. In contrast, our joint network optimization framework exploits the structure of the rs-fMRI correlation matrices to simultaneously capture group level effects and patient heterogeneity. Finally, we demonstrate that our proposed framework robustly identifies clinically relevant networks characteristic of ASD.
Balanced dynamic multiple travelling salesmen: algorithms and continuous approximations
Dynamic routing occurs when customers are not known in advance, e.g. for real-time routing. Two heuristics are proposed that solve the balanced dynamic multiple travelling salesmen problem (BD-mTSP). These heuristics represent operational (tactical) tools for dynamic (online, real-time) routing. Several types and scopes of dynamics are proposed. Particular attention is given to sequential dynamics. The balanced dynamic closest vehicle heuristic (BD-CVH) and the balanced dynamic assignment vehicle heuristic (BD-AVH) are applied to this type of dynamics. The algorithms are tested for instances in the Euclidean plane. Continuous approximation models for the BD-mTSP's are derived and serve as strategic tools for dynamic routing. The models express route lengths using vehicles, customers and dynamic scopes without the need of running an algorithm. A machine learning approach was used to obtain regression models. The mean-average-percentage error of two of these models is below 3%.
Interpretable Machine Learning I
In this blog post, I am covering Christoph Molnar's book Interpretable Machine Learning. During the reading process, I took copious notes, which are partly take-aways from the book for my personal Wiki and partly my personal notes on the topic. Using these notes as a template, this blog post turned into a rather strange hybrid between book review, remarks, and tutorial. It's by no means exhaustive, and I recommend reading the book itself if you want to learn about machine learning and interpretability. This has been going fairly well, having read 20 books in 2020 so far.
Logistic Regression -- An Overview with an Example
Known for being simple to understand, the Logistic Regression algorithm is very reliable and extremely useful, which is why it is any engineer's go-to choice for binary classification problems. Logistic Regression uses the sigmoid function to output a continuous probability between 0 and 1 for any value of its independent variables, and this probability is then compared against a threshold value of 0.5. Any value greater than 0.5 is classified in the "1 category class," and any value less than 0.5 is classified in the "0 category class," or the "class in which the particular event does not take place." A common question people ask is: if Logistic Regression is used for classification problems, why does it have the term "Regression" in it? And why can't we use Linear Regression instead of Logistic Regression for classification problems? The answer to the first question is that even though Logistic Regression is used for binary classification problems, the output from the sigmoid equation is still a continuous numerical value.
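The sigmoid-then-threshold step described above can be sketched as follows. This is an illustrative example only: the weights, bias, and inputs are made-up values rather than a fitted model, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, weights, bias):
    """Continuous output: estimated probability of the "1" class."""
    return sigmoid(X @ weights + bias)

def predict_class(X, weights, bias, threshold=0.5):
    """Discrete output: 1 if the probability exceeds the threshold, else 0."""
    return (predict_proba(X, weights, bias) > threshold).astype(int)

# Hypothetical one-feature model (values chosen by hand, not learned).
weights = np.array([1.5])
bias = -3.0
X = np.array([[0.0], [1.0], [4.0]])

probs = predict_proba(X, weights, bias)
labels = predict_class(X, weights, bias)

print(probs)   # continuous values in (0, 1)
print(labels)  # class labels after thresholding
```

The first function returns a continuous value, which is the "regression" part of the name; the class label only appears once the threshold is applied on top of it.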
Estimating Individual Treatment Effects with Time-Varying Confounders
Liu, Ruoqi, Yin, Changchang, Zhang, Ping
Estimating the individual treatment effect (ITE) from observational data is meaningful and practical in healthcare. Existing work mainly relies on the strong ignorability assumption that no hidden confounders exist, which may lead to bias in estimating causal effects. Some studies that do consider hidden confounders are designed for static environments and are not easily adaptable to a dynamic setting. In fact, most observational data (e.g., electronic medical records) is naturally dynamic and consists of sequential information. In this paper, we propose Deep Sequential Weighting (DSW) for estimating ITE with time-varying confounders. Specifically, DSW infers the hidden confounders by incorporating the current treatment assignments and historical information using a deep recurrent weighting neural network. The learned representations of hidden confounders combined with current observed data are leveraged for potential outcome and treatment predictions. We compute the time-varying inverse probabilities of treatment for re-weighting the population. We conduct comprehensive comparison experiments on fully-synthetic, semi-synthetic and real-world datasets to evaluate the performance of our model and baselines. Results demonstrate that our model can generate unbiased and accurate treatment effect estimates by conditioning on both time-varying observed and hidden confounders, paving the way for personalized medicine.
Traces of Class/Cross-Class Structure Pervade Deep Learning Spectra
Numerous researchers have recently applied empirical spectral analysis to the study of modern deep learning classifiers. We identify and discuss an important formal class/cross-class structure and show how it lies at the origin of the many visually striking features observed in deepnet spectra, some of which were reported in recent articles, while others are unveiled here for the first time. These include spectral outliers, "spikes", and small but distinct continuous distributions, "bumps", often seen beyond the edge of a "main bulk". The significance of the cross-class structure is illustrated in three ways: (i) we prove the ratio of outliers to bulk in the spectrum of the Fisher information matrix is predictive of misclassification, in the context of multinomial logistic regression; (ii) we demonstrate how, gradually with depth, a network is able to separate class-distinctive information from class variability, all while orthogonalizing the class-distinctive information; and (iii) we propose a correction to KFAC, a well-known second-order optimization algorithm for training deepnets.
A general kernel boosting framework integrating pathways for predictive modeling based on genomic data
Zeng, Li, Yu, Zhaolong, Zhang, Yiliang, Zhao, Hongyu
Predictive modeling based on genomic data has gained popularity in biomedical research and clinical practice by allowing researchers and clinicians to identify biomarkers and tailor treatment decisions more efficiently. Analysis incorporating pathway information can boost discovery power and better connect new findings with biological mechanisms. In this article, we propose a general framework, Pathway-based Kernel Boosting (PKB), which incorporates clinical information and prior knowledge about pathways for prediction of binary, continuous and survival outcomes. We introduce appropriate loss functions and optimization procedures for different outcome types. Our prediction algorithm incorporates pathway knowledge by constructing kernel function spaces from the pathways and using them as base learners in the boosting procedure. Through extensive simulations and case studies in drug response and cancer survival datasets, we demonstrate that PKB can substantially outperform other competing methods, better identify biological pathways related to drug response and patient survival, and provide novel insights into cancer pathogenesis and treatment response.
The Reflectron: Exploiting geometry for learning generalized linear models
Boffi, Nicholas M., Slotine, Jean-Jacques E.
Generalized linear models (GLMs) extend linear regression by generating the dependent variables through a nonlinear function of a predictor in a Reproducing Kernel Hilbert Space. Despite nonconvexity of the underlying optimization problem, the GLM-tron algorithm of Kakade et al. (2011) provably learns GLMs with guarantees of computational and statistical efficiency. We present an extension of the GLM-tron to a mirror descent or natural gradient-like setting, which we call the Reflectron. The Reflectron enjoys the same statistical guarantees as the GLM-tron for any choice of potential function $\psi$. We show that $\psi$ can be used to exploit the underlying optimization geometry and improve statistical guarantees, or to define an optimization geometry and thereby implicitly regularize the model. The implicit bias of the algorithm can be used to impose advantageous -- such as sparsity-promoting -- priors on the learned weights. Our results extend to the case of multiple outputs with or without weight sharing, and we further show that the Reflectron can be used for online learning of GLMs in the realizable or bounded noise settings. We primarily perform our analysis in continuous-time, leading to simple derivations. We subsequently prove matching guarantees for a discrete implementation. We supplement our theoretical analysis with simulations on real and synthetic datasets demonstrating the validity of our theoretical results.
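For readers unfamiliar with the GLM-tron that the abstract builds on, the following is a minimal sketch of its basic update in the plain finite-dimensional case: starting from zero weights, each step adds the average of (y_i - u(w . x_i)) x_i, where u is the known link function. This is not the Reflectron, it ignores the kernel (RKHS) setting, and it omits the selection of the best iterate on held-out data. The sigmoid link, the noiseless synthetic data, the iteration count, and the function names are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """A monotone, 1-Lipschitz link function used here as an example."""
    return 1.0 / (1.0 + np.exp(-z))

def glmtron(X, y, link, n_iters=2000):
    """Iterate w <- w + mean_i[(y_i - link(w . x_i)) * x_i], starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w + (y - link(X @ w)) @ X / X.shape[0]
    return w

# Hypothetical realizable problem: unit-norm inputs, noiseless responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
w_true = np.array([0.5, -1.0, 0.25])
y = sigmoid(X @ w_true)

w_hat = glmtron(X, y, sigmoid)
print(w_hat)  # should be close to w_true in this noiseless setting
```

Note that the update uses no step size and no derivative of the link, which is what distinguishes it from ordinary gradient descent on a squared loss.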