Regression
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond
Hu, Yuzheng, Cai, Tianle, Shan, Jinyong, Tang, Shange, Cai, Chaochao, Song, Ethan, Li, Bo, Song, Dawn
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent -- a setting which has attracted growing interest in industry and has proven useful in a wide range of applications including finance and medical research. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks, where the protocols might differ from one another, yet a procedure for obtaining local gradients is implicitly shared. We first consider the honest-but-curious threat model, in which the detailed implementation of the protocol is neglected and only the shared procedure is assumed, which we abstract as an oracle. We find that even under this general setting, single-dimension feature and label can still be recovered from the other party under suitable constraints on batch size, thus demonstrating the potential vulnerability of all frameworks following the same philosophy. Then we look into a popular instantiation of the protocol based on Homomorphic Encryption (HE). We propose an active attack that significantly weakens the constraints on batch size in the previous analysis by generating and compressing auxiliary ciphertext. To address the privacy leakage within the HE-based protocol, we develop a simple-yet-effective countermeasure based on Differential Privacy (DP), and provide both utility and privacy guarantees for the updated algorithm. Finally, we empirically verify the effectiveness of our attack and defense on benchmark datasets. Altogether, our findings suggest that all vertical federated learning frameworks that solely depend on HE might contain severe privacy risks, and DP, which has already demonstrated its power in horizontal federated learning, can also play a crucial role in the vertical setting, especially when coupled with HE or secure multi-party computation (MPC) techniques.
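To make the "local gradients" in the abstract above concrete, the following is a minimal plaintext sketch of one mini-batch step of vertical logistic regression. It assumes 0/1 labels, the exact sigmoid, and two parties whose features are column slices of the same samples; the function and variable names are hypothetical, and no encryption or protocol detail from the paper is modelled.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_gradients(xa_batch, xb_batch, y_batch, wa, wb):
    """Return (grad_a, grad_b), each party's slice of the logistic-loss gradient."""
    n = len(y_batch)
    residuals = []
    for xa, xb, y in zip(xa_batch, xb_batch, y_batch):
        # Each party contributes a partial linear score; only their sum is needed.
        score = sum(w * x for w, x in zip(wa, xa)) + sum(w * x for w, x in zip(wb, xb))
        residuals.append(sigmoid(score) - y)
    # Given the shared residuals, each party needs only its own feature columns.
    grad_a = [sum(r * xa[j] for r, xa in zip(residuals, xa_batch)) / n
              for j in range(len(wa))]
    grad_b = [sum(r * xb[j] for r, xb in zip(residuals, xb_batch)) / n
              for j in range(len(wb))]
    return grad_a, grad_b

# Hypothetical toy batch: party A holds two features, party B holds one.
xa_batch = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.25]]
xb_batch = [[2.0], [-1.0], [0.5]]
y_batch = [1, 0, 1]
grad_a, grad_b = local_gradients(xa_batch, xb_batch, y_batch, wa=[0.1, -0.2], wb=[0.3])
```

Concatenating `grad_a` and `grad_b` gives the same vector as the gradient of an ordinary logistic regression on the joined feature matrix, which is why the residual term is the quantity the parties must exchange in some protected form.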
Heterogeneous Treatment Effect with Trained Kernels of the Nadaraya-Watson Regression
Konstantinov, Andrei V., Kirpichenko, Stanislav R., Utkin, Lev V.
Efficient treatment of a patient, given her/his clinical and other characteristics [1, 2], can be regarded as an important goal of real personalized medicine. The goal can be achieved by means of machine learning methods due to the increasing number of available electronic health records, which are a basis for developing accurate models. To estimate the treatment effect, patients are divided into two groups called treatment and control, and then patients from the different groups are compared. One of the popular measures of treatment efficiency used in machine learning models is the average treatment effect (ATE) [3], which is estimated on the basis of observed data about patients as the mean difference between outcomes of patients from the treatment and control groups. Due to the difference between the patients' characteristics and the difference between their responses to a particular treatment, the treatment effect is measured by the conditional average treatment effect (CATE) or the heterogeneous treatment effect (HTE), defined as the ATE conditional on a patient feature vector [4, 5, 6, 7]. Two main problems can be pointed out when CATE is estimated. The first one is that the control group is usually larger than the treatment group. As a result, we face the problem of a small training dataset, which does not allow us to directly apply many efficient machine learning methods.
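The ATE estimate described above, the mean difference between treatment-group and control-group outcomes, can be sketched in a few lines. The outcome values below are made up for illustration, and the function name is hypothetical. Note that this simple difference is only a trustworthy effect estimate when group assignment is unrelated to the patients' characteristics.

```python
from statistics import mean

def average_treatment_effect(outcomes, treated):
    """Difference in mean outcome between the treatment and control groups."""
    treatment = [y for y, t in zip(outcomes, treated) if t]
    control = [y for y, t in zip(outcomes, treated) if not t]
    return mean(treatment) - mean(control)

# Made-up data with a small treatment group and a larger control group.
outcomes = [3.0, 5.0, 4.0, 1.0, 2.0, 3.0, 2.0]
treated = [True, True, True, False, False, False, False]
ate = average_treatment_effect(outcomes, treated)
```

CATE replaces the two group means with means conditional on a feature vector, which is where a regression model (for example a kernel regression) enters.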
Error-in-variables modelling for operator learning
Patel, Ravi G., Manickam, Indu, Lee, Myoungkyu, Gulian, Mamikon
Deep operator learning has emerged as a promising tool for reduced-order modelling and PDE model discovery. Leveraging the expressive power of deep neural networks, especially in high dimensions, such methods learn the mapping between functional state variables. While proposed methods have assumed noise only in the dependent variables, experimental and numerical data for operator learning typically exhibit noise in the independent variables as well, since both variables represent signals that are subject to measurement error. In regression on scalar data, failure to account for noisy independent variables can lead to biased parameter estimates. With noisy independent variables, linear models fitted via ordinary least squares (OLS) will show attenuation bias, wherein the slope will be underestimated. In this work, we derive an analogue of attenuation bias for linear operator regression with white noise in both the independent and dependent variables. In the nonlinear setting, we computationally demonstrate underprediction of the action of the Burgers operator in the presence of noise in the independent variable. We propose error-in-variables (EiV) models for two operator regression methods, MOR-Physics and DeepONet, and demonstrate that these new models reduce bias in the presence of noisy independent variables for a variety of operator learning problems. Considering the Burgers operator in 1D and 2D, we demonstrate that EiV operator learning robustly recovers operators in high-noise regimes that defeat OLS operator learning. We also introduce an EiV model for time-evolving PDE discovery and show that OLS and EiV perform similarly in learning the Kuramoto-Sivashinsky evolution operator from corrupted data, suggesting that the effect of bias in OLS operator learning depends on the regularity of the target operator.
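The scalar attenuation bias mentioned in this abstract can be reproduced with a short simulation. For a linear model with slope β, independent-variable variance σ_x² and independent measurement-noise variance σ_u², the OLS slope fitted on the noisy inputs tends towards β·σ_x²/(σ_x² + σ_u²). The sketch below uses assumed values (β = 2, σ_x = σ_u = 1) and covers only the scalar case, not the operator-learning setting of the paper.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x (with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

rng = random.Random(0)  # fixed seed so the run is repeatable
true_slope = 2.0
n = 20000

x_clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [true_slope * x + rng.gauss(0.0, 0.1) for x in x_clean]
# Measurement error on the independent variable, with variance 1.
x_noisy = [x + rng.gauss(0.0, 1.0) for x in x_clean]

slope_clean = ols_slope(x_clean, y)
slope_noisy = ols_slope(x_noisy, y)
# Theoretical limit of the noisy-input slope: beta * 1 / (1 + 1).
expected_noisy = true_slope * 1.0 / (1.0 + 1.0)
```

With these settings `slope_clean` stays close to the true slope while `slope_noisy` shrinks towards `expected_noisy`, i.e. the slope is underestimated, as the abstract states.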
Machine Learning #5 -- Linear Classifiers, Logistic Regression, Regularization
In the third article of our series, we touched on the topic of Linear Regression. We then used this method to create a house price estimator. The concepts of Linear Regression and Logistic Regression should not be confused with each other. Logistic Regression is a Linear Classifier: a statistical method used to analyze a dataset with one or more independent variables that determine a class. The outcome is measured with a binary variable (there are only two possible outcomes).
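To illustrate why logistic regression counts as a linear classifier, here is a minimal sketch: a linear score is passed through the sigmoid to get a probability, and thresholding that probability at 0.5 is the same as checking the sign of the linear score. The weights, bias and function names are hypothetical and not taken from the article.

```python
import math

def predict_proba(weights, bias, features):
    """Probability of the positive class under a logistic regression model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def predict_class(weights, bias, features, threshold=0.5):
    """Binary outcome: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0

# Hypothetical model with two independent variables.
weights = [1.5, -2.0]
bias = 0.25
label_a = predict_class(weights, bias, [2.0, 0.5])   # linear score is positive
label_b = predict_class(weights, bias, [0.0, 1.0])   # linear score is negative
```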
Machine Learning Algorithms Explained in Less Than 1 Minute Each - KDnuggets
This article will explain some of the most well known machine learning algorithms in less than a minute - helping everyone to understand them! One of the simplest machine learning algorithms out there, Linear Regression is used to make predictions on continuous dependent variables with knowledge from independent variables. A dependent variable is the effect, whose value depends on changes in the independent variable. You may remember the line of best fit from school - this is what Linear Regression produces. A simple example is predicting one's weight depending on their height.
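The height-to-weight example can be sketched as a least-squares line of best fit. The five data points below are made up purely for illustration, and `fit_line` is a hypothetical helper, not code from the article.

```python
def fit_line(xs, ys):
    """Least-squares line of best fit; returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Made-up sample: height is the independent variable, weight the dependent one.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 57, 67, 73, 83]

slope, intercept = fit_line(heights_cm, weights_kg)
predicted_weight = slope * 175 + intercept  # prediction for a 175 cm person
```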
pGMM Kernel Regression and Comparisons with Boosted Trees
In this work, we demonstrate the advantage of the pGMM ("powered generalized min-max") kernel in the context of (ridge) regression. In recent prior studies, the pGMM kernel has been extensively evaluated for classification tasks, for logistic regression, support vector machines, as well as deep neural networks. In this paper, we provide an experimental study on ridge regression, to compare the pGMM kernel regression with the ordinary ridge linear regression as well as the RBF kernel ridge regression. Perhaps surprisingly, even without a tuning parameter (i.e., $p=1$ for the power parameter of the pGMM kernel), the pGMM kernel already performs well. Furthermore, by tuning the parameter $p$, this (deceptively simple) pGMM kernel even performs quite comparably to boosted trees. Boosting and boosted trees are very popular in machine learning practice. For regression tasks, typically, practitioners use $L_2$ boost, i.e., for minimizing the $L_2$ loss. Sometimes for the purpose of robustness, the $L_1$ boost might be a choice. In this study, we implement $L_p$ boost for $p\geq 1$ and include it in the package of "Fast ABC-Boost". Perhaps also surprisingly, the best performance (in terms of $L_2$ regression loss) is often attained at $p>2$, in some cases at $p\gg 2$. This phenomenon has already been demonstrated by Li et al. (UAI 2010) in the context of k-nearest neighbor classification using $L_p$ distances. In summary, the implementation of $L_p$ boost provides practitioners the additional flexibility of tuning boosting algorithms for potentially achieving better accuracy in regression applications.
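As a rough illustration of what changes when boosting minimizes an $L_p$ loss instead of the $L_2$ loss, the sketch below computes the loss $\sum_i |y_i - F_i|^p$ and its negative gradient with respect to the predictions, which is the quantity a plain gradient-boosting step would fit. This is an assumption-laden sketch with hypothetical function names, not the "Fast ABC-Boost" implementation, whose internals are not described in the abstract.

```python
def lp_loss(y_true, y_pred, p):
    """Sum of |y - F|^p over all samples."""
    return sum(abs(y - f) ** p for y, f in zip(y_true, y_pred))

def lp_negative_gradient(y_true, y_pred, p):
    """Negative gradient of the L_p loss w.r.t. each prediction F_i.

    For residual r = y - F this is p * |r|^(p-1) * sign(r).
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    grads = []
    for y, f in zip(y_true, y_pred):
        r = y - f
        sign = (r > 0) - (r < 0)
        grads.append(p * abs(r) ** (p - 1) * sign)
    return grads

y_true = [3.0, 1.0]
y_pred = [1.0, 2.0]
g1 = lp_negative_gradient(y_true, y_pred, p=1)  # only the sign of the residual
g2 = lp_negative_gradient(y_true, y_pred, p=2)  # twice the residual
g3 = lp_negative_gradient(y_true, y_pred, p=3)  # large residuals weighted more
```

Larger $p$ puts more weight on the samples with the largest residuals, while $p=1$ ignores residual magnitude entirely, which matches the robustness remark about $L_1$ boost.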
A Simple Test-Time Method for Out-of-Distribution Detection
Fan, Ke, Wang, Yikai, Yu, Qian, Li, Da, Fu, Yanwei
Neural networks are known to produce over-confident predictions on input images, even when these images are out-of-distribution (OOD) samples. This limits the applications of neural network models in real-world scenarios, where OOD samples exist. Many existing approaches identify the OOD instances by exploiting various cues, such as finding irregular patterns in the feature space, logits space, gradient space or the raw space of images. In contrast, this paper proposes a simple Test-time Linear Training (ETLT) method for OOD detection. Empirically, we find that the probabilities of input images being out-of-distribution are surprisingly linearly correlated to the features extracted by neural networks. To be specific, many state-of-the-art OOD algorithms, although designed to measure reliability in different ways, actually lead to OOD scores mostly linearly related to their image features. Thus, by simply learning a linear regression model trained from the paired image features and inferred OOD scores at test-time, we can make a more precise OOD prediction for the test instances. We further propose an online variant of the proposed method, which achieves promising performance and is more practical in real-world applications. Remarkably, we improve FPR95 from $51.37\%$ to $12.30\%$ on CIFAR-10 datasets with maximum softmax probability as the base OOD detector. Extensive experiments on several benchmark datasets show the efficacy of ETLT for the OOD detection task.
Joint Application of the Target Trial Causal Framework and Machine Learning Modeling to Optimize Antibiotic Therapy: Use Case on Acute Bacterial Skin and Skin Structure Infections due to Methicillin-resistant Staphylococcus aureus
Jun, Inyoung, Marini, Simone, Boucher, Christina A., Morris, J. Glenn, Bian, Jiang, Prosperi, Mattia
Bacterial infections are responsible for high mortality worldwide. Antimicrobial resistance underlying the infection and the patient's multifaceted clinical status can hamper the correct choice of antibiotic treatment. Randomized clinical trials provide average treatment effect estimates but are not ideal for risk stratification and optimization of therapeutic choice, i.e., individualized treatment effects (ITE). Here, we leverage large-scale electronic health record data, collected from Southern US academic clinics, to emulate a clinical trial, i.e., 'target trial', and develop a machine learning model of mortality prediction and ITE estimation for patients diagnosed with acute bacterial skin and skin structure infection (ABSSSI) due to methicillin-resistant Staphylococcus aureus (MRSA). ABSSSI-MRSA is a challenging condition with reduced treatment options - vancomycin is the preferred choice, but it has non-negligible side effects. First, we use propensity score matching to emulate the trial and create a treatment-randomized (vancomycin vs. other antibiotics) dataset. Next, we use this data to train various machine learning methods (including boosted/LASSO logistic regression, support vector machines, and random forest) and choose the best model in terms of area under the receiver operating characteristic curve (AUC) through bootstrap validation. Lastly, we use the models to calculate ITE and identify possible averted deaths by therapy change. The out-of-bag tests indicate that SVM and RF are the most accurate, with AUC of 81% and 78%, respectively, but BLR/LASSO is not far behind (76%). When the counterfactuals are calculated using the BLR/LASSO, vancomycin increases the risk of death, but the effect shows a large variation (odds ratio 1.2, 95% range 0.4-3.8) and the contribution to outcome probability is modest. Instead, the RF exhibits stronger changes in ITE, suggesting more complex treatment heterogeneity.
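The propensity score matching step can be pictured with a small sketch. It assumes the propensity scores (probability of receiving the treatment given covariates) have already been estimated, and it uses greedy 1:1 nearest-neighbour matching without replacement and a caliper of 0.1. These choices, the scores and the function name are illustrative assumptions; the abstract does not state which matching variant the authors used.

```python
def match_on_propensity(treated_scores, control_scores, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns a list of (treated_index, control_index) pairs. A treated unit
    stays unmatched if no remaining control lies within the caliper.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for ti, ts in enumerate(treated_scores):
        if not available:
            break
        # Closest remaining control; ties are broken by the lower index.
        best = min(available, key=lambda ci: (abs(control_scores[ci] - ts), ci))
        if abs(control_scores[best] - ts) <= caliper:
            pairs.append((ti, best))
            available.remove(best)
    return pairs

# Made-up propensity scores for three treated and five control patients.
treated_scores = [0.80, 0.55, 0.30]
control_scores = [0.78, 0.22, 0.52, 0.60, 0.95]
pairs = match_on_propensity(treated_scores, control_scores)
```

Outcome models are then trained on the matched pairs only, so the two arms have similar covariate distributions, which is the sense in which the matched dataset emulates randomization.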
Causal Effect Estimation using Variational Information Bottleneck
Lu, Zhenyu, Cheng, Yurong, Zhong, Mingjun, Stoian, George, Yuan, Ye, Wang, Guoren
Causal inference aims to estimate the causal effect in a causal relationship when an intervention is applied. Precisely, in a causal model with binary interventions, i.e., control and treatment, the causal effect is simply the difference between the factual and the counterfactual. The difficulty is that the counterfactual may never be obtained and has to be estimated, so the causal effect can only be an estimate. The key challenge in estimating the counterfactual is to identify confounders, which affect both outcomes and treatments. A typical approach is to formulate causal inference as a supervised learning problem so that the counterfactual can be predicted. Recent machine learning methods, including linear regression and deep learning models, have been adapted to causal inference. In this paper, we propose a method to estimate Causal Effect by using Variational Information Bottleneck (CEVIB). The promising point is that VIB is able to naturally distill confounding variables from the data, which enables estimating the causal effect from observational data. We have compared CEVIB to other methods by applying them to three data sets, showing that our approach achieved the best performance. We also experimentally showed the robustness of our method.
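Why confounders matter can be shown with a tiny made-up example. Below, a binary confounder `z` raises both the chance of treatment and the outcome. The data are constructed so that the true effect of treatment is 1.0 in every stratum. The sketch compares the naive difference in means with a stratified (adjusted) estimate. This is plain stratification on a known confounder, swapped in for illustration; it is not the CEVIB method, and all names and numbers are hypothetical.

```python
from statistics import mean

def naive_effect(records):
    """Difference in mean outcome between treated and control units."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return mean(treated) - mean(control)

def stratified_effect(records):
    """Average the within-stratum effects, weighted by stratum size."""
    total = len(records)
    effect = 0.0
    for level in sorted({z for z, _, _ in records}):
        stratum = [r for r in records if r[0] == level]
        effect += len(stratum) / total * naive_effect(stratum)
    return effect

# Records are (confounder z, treatment t, outcome y) with y = 1.0 * t + 5.0 * z.
records = [
    (0, 1, 1.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0), (1, 0, 5.0),
]
naive = naive_effect(records)
adjusted = stratified_effect(records)
```

The naive estimate is inflated because treated units mostly have `z = 1`, while the stratified estimate recovers the effect built into the data. Methods such as the one in the abstract aim to obtain this kind of adjustment without the confounder being handed over explicitly.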