Regression
Penalized regression via the restricted bridge estimator
Yüzbaşı, Bahadır, Arashi, Mohammad, Akdeniz, Fikri
This article is concerned with the Bridge Regression, which is a special family in penalized regression with penalty function $\sum_{j=1}^{p}|\beta_j|^q$ with $q>0$, in a linear model with linear restrictions. The proposed restricted bridge (RBRIDGE) estimator simultaneously estimates parameters and selects important variables when a prior information about parameters are available in either low dimensional or high dimensional case. Using local quadratic approximation, the penalty term can be approximated around a local initial values vector and the RBRIDGE estimator enjoys a closed-form expression which can be solved when $q>0$. Special cases of our proposal are the restricted LASSO ($q=1$), restricted RIDGE ($q=2$), and restricted Elastic Net ($1< q < 2$) estimators. We provide some theoretical properties of the RBRIDGE estimator under for the low dimensional case, whereas the computational aspects are given for both low and high dimensional cases. An extensive Monte Carlo simulation study is conducted based on different prior pieces of information and the performance of the RBRIDGE estiamtor is compared with some competitive penalty estimators as well as the ORACLE. We also consider four real data examples analysis for comparison sake. The numerical results show that the suggested RBRIDGE estimator outperforms outstandingly when the prior is true or near exact
Analysis of an Automated Machine Learning Approach in Brain Predictive Modelling: A data-driven approach to Predict Brain Age from Cortical Anatomical Measures
Dafflon, Jessica, Pinaya, Walter H. L, Turkheimer, Federico, Cole, James H., Leech, Robert, Harris, Mathew A., Cox, Simon R., Whalley, Heather C., McIntosh, Andrew M., Hellyer, Peter J.
The use of machine learning (ML) algorithms has significantly increased in neuroscience. However, from the vast extent of possible ML algorithms, which one is the optimal model to predict the target variable? What are the hyperparameters for such a model? Given the plethora of possible answers to these questions, in the last years, automated machine learning (autoML) has been gaining attention. Here, we apply an autoML library called TPOT which uses a tree-based representation of machine learning pipelines and conducts a genetic-programming based approach to find the model and its hyperparameters that more closely predicts the subject's true age. To explore autoML and evaluate its efficacy within neuroimaging datasets, we chose a problem that has been the focus of previous extensive study: brain age prediction. Without any prior knowledge, TPOT was able to scan through the model space and create pipelines that outperformed the state-of-the-art accuracy for Freesurfer-based models using only thickness and volume information for anatomical structure. In particular, we compared the performance of TPOT (mean accuracy error (MAE): $4.612 \pm .124$ years) and a Relevance Vector Regression (MAE $5.474 \pm .140$ years). TPOT also suggested interesting combinations of models that do not match the current most used models for brain prediction but generalise well to unseen data. AutoML showed promising results as a data-driven approach to find optimal models for neuroimaging applications.
Deep Evidential Regression
Amini, Alexander, Schwarting, Wilko, Soleimany, Ava, Rus, Daniela
Deterministic neural networks (NNs) are increasingly being deployed in safety critical domains, where calibrated, robust and efficient measures of uncertainty are crucial. While it is possible to train regression networks to output the parameters of a probability distribution by maximizing a Gaussian likelihood function, the resulting model remains oblivious to the underlying confidence of its predictions. In this paper, we propose a novel method for training deterministic NNs to not only estimate the desired target but also the associated evidence in support of that target. We accomplish this by placing evidential priors over our original Gaussian likelihood function and training our NN to infer the hyperparameters of our evidential distribution. We impose priors during training such that the model is penalized when its predicted evidence is not aligned with the correct output. Thus the model estimates not only the probabilistic mean and variance of our target but also the underlying uncertainty associated with each of those parameters. We observe that our evidential regression method learns well-calibrated measures of uncertainty on various benchmarks, scales to complex computer vision tasks, and is robust to adversarial input perturbations.
How Do You Know You Have Enough Training Data?
There is some debate recently as to whether data is the new oil [1] or not [2]. Whatever the case, acquiring training data for our machine learning work can be expensive (in man-hours, licensing fees, equipment run time, etc.). Thus, a crucial issue in machine learning projects is to determine how much training data is needed to achieve a specific performance goal (i.e., classifier accuracy). In this post, we will do a quick but broad in scope review of empirical and research literature results, regarding training data size, in areas ranging from regression analysis to deep learning. The training data size issue is also known in the literature as sample complexity.
Logistic Regression from Scratch
During my journey as a Machine Learning (ML) practitioner, I found it has become ultimately easy for any human with limited knowledge on algorithms to take advantage of free python libraries such as scikit-learn to solve a ML problem. Truth be said, it's easy and sometimes no brainer to achieve this, as there are so many codes available in GitHub, Medium, Kaggle etc., You just need some amount of time looking at these codes to arrive at a solution to a problem of your choice. But, what if we learn every algorithm or procedures behind each machine learning pipeline that does all the heavy lifting for us inside these amazing libraries. In this blog post and the series of blog posts to come, I will be focusing on implementing machine learning algorithms from scratch using python and numpy. Sure you might argue with me for the first paragraph.
Mobile APP User Attribute Prediction by Heterogeneous Information Network Modeling
Zhang, Hekai, Gong, Jibing, Teng, Zhiyong, Wang, Dan, Wang, Hongfei, Du, Linfeng, Bhuiyan, Zakirul Alam
User-based attribute information, such as age and gender, is usually considered as user privacy information. It is difficult for enterprises to obtain user-based privacy attribute information. However, user-based privacy attribute information has a wide range of applications in personalized services, user behavior analysis and other aspects. this paper advances the HetPathMine model and puts forward TPathMine model. With applying the number of clicks of attributes under each node to express the user's emotional preference information, optimizations of the solution of meta-path weight are also presented. Based on meta-path in heterogeneous information networks, the new model integrates all relationships among objects into isomorphic relationships of classified objects. Matrix is used to realize the knowledge dissemination of category knowledge among isomorphic objects. The experimental results show that: (1) the prediction of user attributes based on heterogeneous information networks can achieve higher accuracy than traditional machine learning classification methods; (2) TPathMine model based on the number of clicks is more accurate in classifying users of different age groups, and the weight of each meta-path is consistent with human intuition or the real world situation.
Logistic Regressions and Rare Events
I previously worked on designing some problem sets for a PhD class. One of the assignments dealt with a simple classification problem using data that I took from a kaggle challenge trying to predict fraudulent credit card transactions. The goal of the problem is to predict the probability that a specific credit card transaction is fraudulent. One unforeseen issue with the data was that the unconditional probability that a single credit card transaction is fraudulent is very small. This type of data is known as rare events data, and is common in many areas such as disease detection, conflict prediction and, of course, fraud detection.
Differentially Private Survival Function Estimation
Survival function estimation is used in many disciplines, but it is most common in medical analytics in the form of the Kaplan-Meier estimator. Sensitive data (patient records) is used in the estimation without any explicit control on the information leakage, which is a significant privacy concern. We propose a first differentially private estimator of the survival function and show that it can be easily extended to provide differentially private confidence intervals and test statistics without spending any extra privacy budget. We further provide extensions for differentially private estimation of the competing risk cumulative incidence function. Using nine real-life clinical datasets, we provide empirical evidence that our proposed method provides good utility while simultaneously providing strong privacy guarantees.
Variable Selection with Random Survival Forest and Bayesian Additive Regression Tree for Survival Data
Saha, Satabdi, Ryu, Duchwan, Ebrahimi, Nader
In this paper we utilize a survival analysis methodology incorporating Bayesian additive regression trees to account for nonlinear and additive covariate effects. We compare the performance of Bayesian additive regression trees, Cox proportional hazards and random survival forests models for censored survival data, using simulation studies and survival analysis for breast cancer with U.S. SEER database for the year 2005. In simulation studies, we compare the three models across varying sample sizes and censoring rates on the basis of bias and prediction accuracy. In survival analysis for breast cancer, we retrospectively analyze a subset of 1500 patients having invasive ductal carcinoma that is a common form of breast cancer mostly affecting older woman. Predictive potential of the three models are then compared using some widely used performance assessment measures in survival literature.
On Tractable Computation of Expected Predictions
Khosravi, Pasha, Choi, YooJung, Liang, Yitao, Vergari, Antonio, Broeck, Guy Van den
Computing expected predictions has many interesting applications in areas such as fairness, handling missing values, and data analysis. Unfortunately, computing expectations of a discriminative model with respect to a probability distribution defined by an arbitrary generative model has been proven to be hard in general. In fact, the task is intractable even for simple models such as logistic regression and a naive Bayes distribution. In this paper, we identify a pair of generative and discriminative models that enables tractable computation of expectations of the latter with respect to the former, as well as moments of any order, in case of regression. Specifically, we consider expressive probabilistic circuits with certain structural constraints that support tractable probabilistic inference. Moreover, we exploit the tractable computation of high-order moments to derive an algorithm to approximate the expectations, for classification scenarios in which exact computations are intractable. We evaluate the effectiveness of our exact and approximate algorithms in handling missing data during prediction time where they prove to be competitive to standard imputation techniques on a variety of datasets. Finally, we illustrate how expected prediction framework can be used to reason about the behaviour of discriminative models.