Regression
Improving Causal Effect Estimation of Weighted Regression-Based Estimator using Neural Networks
Shaha, Plabon, Zadid, Talha Islam, Rahman, Ismat, Khan, Md. Mosaddek
Estimating causal effects from observational data informs us about which factors are important in an autonomous system, and enables us to make better decisions. This is important because it has applications in selecting a treatment in medical systems, making better strategies in industries, or making better policies for our government or even society. Unavailability of complete data, coupled with high cardinality of data, makes this estimation task computationally intractable.
The do-calculus is a set of inference directives that helps transform these interventions into more interpretable probabilistic sentences and, as such, enables a user to derive or confirm causal claims about interventions [14]. Results inferred from do-calculus are well understood on the whole, but its application is still questionable [10]. This is because do-calculus assumes that the distributions being used are error-free, but in practice we do not have sufficient samples to confirm that. In case of limited samples, a popular criterion, namely the back-door criterion, is employed to ...
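As a rough illustration of the back-door criterion mentioned above, the sketch below applies the standard adjustment formula P(y | do(x)) = sum over z of P(y | x, z) P(z) to a toy binary model. The graph (Z -> X, Z -> Y, X -> Y) and every probability in it are hypothetical values chosen for illustration; this is not the weighted regression-based estimator of the paper.

```python
# Hypothetical binary model with a single confounder Z: Z -> X, Z -> Y, X -> Y.
# All probabilities below are made up for illustration.
p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
p_y1_given_xz = {                           # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def p_y1_do_x(x):
    """Back-door adjustment: sum_z P(Y = 1 | x, z) * P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


def p_y1_given_x(x):
    """Plain conditional P(Y = 1 | X = x), which mixes in the confounder."""
    norm = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return sum(
        p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] / norm for z in p_z
    )


causal_effect = p_y1_do_x(1) - p_y1_do_x(0)
naive_difference = p_y1_given_x(1) - p_y1_given_x(0)
print(f"adjusted effect:  {causal_effect:.2f}")
print(f"naive difference: {naive_difference:.2f}")
```

In this toy model the naive conditional difference overstates the adjusted effect, because Z raises both the chance of treatment and the chance of the outcome.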
A Guide to Generalization and Regularization in Machine Learning
Generalization and regularization are two terms that play a significant role when you aim to build a robust machine learning model. The first refers to model behaviour, and the second is responsible for enhancing model performance. Put simply, regularization helps machine learning models generalize better. In this post, we will cover each aspect of these terms and try to understand how they are linked to each other. The major points to be discussed in this article are outlined below.
Iterative Teaching by Label Synthesis
Liu, Weiyang, Liu, Zhen, Wang, Hanchen, Paull, Liam, Schölkopf, Bernhard, Weller, Adrian
In this paper, we consider the problem of iterative machine teaching, where a teacher provides examples sequentially based on the current iterative learner. In contrast to previous methods that have to scan over the entire pool and select teaching examples from it in each iteration, we propose a label synthesis teaching framework where the teacher randomly selects input teaching examples (e.g., images) and then synthesizes suitable outputs (e.g., labels) for them. We show that this framework can avoid costly example selection while still provably achieving exponential teachability. We propose multiple novel teaching algorithms in this framework. Finally, we empirically demonstrate the value of our framework.
Fair Sequential Selection Using Supervised Learning Models
Khalili, Mohammad Mahdi, Zhang, Xueru, Abroshan, Mahed
We consider a selection problem where sequentially arriving applicants apply for a limited number of positions/jobs. At each time step, a decision maker accepts or rejects the given applicant using a pre-trained supervised learning model until all the vacant positions are filled. In this paper, we discuss whether the fairness notions (e.g., equal opportunity, statistical parity, etc.) that are commonly used in classification problems are suitable for sequential selection problems. In particular, we show that even with a pre-trained model that satisfies the common fairness notions, the selection outcomes may still be biased against certain demographic groups. This observation implies that the fairness notions used in classification problems are not suitable for a selection problem where the applicants compete for a limited number of positions. We introduce a new fairness notion, "Equal Selection (ES)," suitable for sequential selection problems and propose a post-processing approach to satisfy the ES fairness notion. We also consider a setting where the applicants have privacy concerns, and the decision maker only has access to a noisy version of the sensitive attributes. In this setting, we show that perfect ES fairness can still be attained under certain conditions.
Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning
Data Shapley has recently been proposed as a principled framework to quantify the contribution of an individual datum in machine learning. It can effectively identify helpful or harmful data points for a learning algorithm. In this paper, we propose Beta Shapley, which is a substantial generalization of Data Shapley. Beta Shapley arises naturally by relaxing the efficiency axiom of the Shapley value, which is not critical for machine learning settings. Beta Shapley unifies several popular data valuation methods and includes Data Shapley as a special case. Moreover, we prove that Beta Shapley has several desirable statistical properties and propose efficient algorithms to estimate it. We demonstrate that Beta Shapley outperforms state-of-the-art data valuation methods on several downstream ML tasks such as: 1) detecting mislabeled training data; 2) learning with subsamples; and 3) identifying points whose addition or removal has the largest positive or negative impact on the model.
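To make the efficiency axiom mentioned above concrete, the sketch below computes exact Shapley values for a four-point toy dataset and checks that they sum to U(N) - U(empty set). The utility function, the data points and the target are hypothetical, and this is the plain Shapley value; the Beta Shapley weighting itself is not implemented here.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, utility):
    """Exact Shapley value of each of n players for a set utility function."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Classical Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                total += weight * (utility(subset + (i,)) - utility(subset))
        values.append(total)
    return values


# Hypothetical toy task: estimate a target of 1.0 by the mean of the chosen
# points. The last point (5.0) plays the role of a harmful datum.
points = [1.0, 1.1, 0.9, 5.0]
target = 1.0


def utility(subset):
    """Negative squared error of the subset mean (empty subset predicts 0)."""
    if not subset:
        return -(target ** 2)
    mean = sum(points[j] for j in subset) / len(subset)
    return -((mean - target) ** 2)


values = shapley_values(len(points), utility)
full_set = tuple(range(len(points)))
efficiency_gap = sum(values) - (utility(full_set) - utility(()))

for point, value in zip(points, values):
    print(f"point={point:.1f}  value={value:+.4f}")
print(f"efficiency gap: {efficiency_gap:.2e}")
```

The outlying point receives the lowest value, which is the sense in which such valuations can flag harmful data. Exact enumeration is exponential in the number of points, so it only serves for toy cases like this one.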
Prediction-focused Mixture Models
Narayanan, Sanjana, Sharma, Abhishek, Zeng, Catherine, Doshi-Velez, Finale
In several applications, besides getting a generative model of the data, we also want the model to be useful for specific downstream tasks. Mixture models are useful for identifying discrete components in the data, but may not identify components useful for downstream tasks if misspecified; further, current inference techniques often fail to overcome misspecification even when a supervisory signal is provided. We introduce the prediction-focused mixture model, which selects and models input features relevant to predicting the targets. We demonstrate that our approach identifies relevant signal from inputs even when the model is highly misspecified.
Communication-Efficient Distributed Quantile Regression with Optimal Statistical Guarantees
Battey, Heather, Tan, Kean Ming, Zhou, Wen-Xin
We address the problem of how to achieve optimal inference in distributed quantile regression without stringent scaling conditions. This is challenging due to the non-smooth nature of the quantile regression loss function, which invalidates the use of existing methodology. The difficulties are resolved through a double-smoothing approach that is applied to the local (at each data source) and global objective functions. Despite the reliance on a delicate combination of local and global smoothing parameters, the quantile regression model is fully parametric, thereby facilitating interpretation. In the low-dimensional regime, we discuss and compare several alternative confidence set constructions, based on inversion of Wald and score-type tests and resampling techniques, detailing an improvement that is effective for more extreme quantile coefficients. In high dimensions, a sparse framework is adopted, where the proposed doubly-smoothed objective function is complemented with an $\ell_1$-penalty. A thorough simulation study further elucidates our findings. Finally, we provide estimation theory and numerical studies for sparse quantile regression in the high-dimensional setting.
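The non-smooth loss referred to above is the quantile (check) loss, rho_tau(u) = u * (tau - 1{u < 0}), which has a kink at zero. The sketch below is a minimal illustration on a single machine with simulated data (the sample, seed and tau are assumptions): minimizing the average check loss over a constant recovers a sample quantile. It does not implement the double-smoothing or distributed procedure of the paper.

```python
import numpy as np


def check_loss(residuals, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    residuals = np.asarray(residuals, dtype=float)
    return residuals * (tau - (residuals < 0.0))


rng = np.random.default_rng(0)
sample = rng.normal(size=501)   # hypothetical data
tau = 0.9

# The average check loss is piecewise linear in the constant c, so its
# minimum is attained at one of the sample points; search over those.
candidates = np.sort(sample)
avg_loss = np.array([check_loss(sample - c, tau).mean() for c in candidates])
best = candidates[np.argmin(avg_loss)]

print(f"check-loss minimizer: {best:.4f}")
print(f"np.quantile:          {np.quantile(sample, tau):.4f}")
```

Because the loss is piecewise linear, gradient-based and Newton-type methods cannot rely on a well-defined second derivative, which is one reason smoothing is attractive.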
Maximum Correntropy Criterion Regression models with tending-to-zero scale parameters
It is known that the classical least square regression models achieve the optimal efficiency when the noises are Gaussian; however, they always underperform if the data is contaminated by non-Gaussian noises or outliers. Some robust regression models have been well developed in the past decades, such as the median regression, the modal regression, the Huber regression and the least trimmed squares regression. Moreover, a new robust regression model named the maximum correntropy criterion regression (MCCR) has been theoretically studied within the frame of statistical learning in Feng et al. (2015). Correntropy is constructed based on a kernel function and it is a generalized similarity measure between two random variables (see Santamaría et al. (2006); Gunduz and Principe (2009); Liu et al. (2007); He et al. (2011); Chen and Príncipe (2012)).
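A small sketch may help show why a correntropy-based loss is robust. It assumes one common Gaussian-kernel form of the loss, sigma^2 * (1 - exp(-t^2 / sigma^2)), which is bounded by sigma^2 and approaches the squared loss t^2 as sigma grows. The data, the fixed sigma = 1 and the grid search are illustrative assumptions, not the estimator or the scale regime analysed in the paper.

```python
import numpy as np


def correntropy_loss(residuals, sigma):
    """Assumed Gaussian-kernel form: sigma^2 * (1 - exp(-t^2 / sigma^2))."""
    residuals = np.asarray(residuals, dtype=float)
    return sigma ** 2 * (1.0 - np.exp(-(residuals ** 2) / sigma ** 2))


rng = np.random.default_rng(0)

# Hypothetical location problem: 90 clean points around 0 plus 10 gross outliers.
clean = rng.normal(loc=0.0, scale=0.5, size=90)
outliers = np.full(10, 20.0)
data = np.concatenate([clean, outliers])

# The least squares location estimate is the sample mean, dragged by outliers.
ls_estimate = data.mean()

# Correntropy-based estimate via a simple grid search over candidate locations.
grid = np.linspace(-5.0, 25.0, 3001)
risk = np.array([correntropy_loss(data - c, sigma=1.0).mean() for c in grid])
mcc_estimate = grid[np.argmin(risk)]

print(f"least squares estimate: {ls_estimate:.3f}")
print(f"correntropy estimate:   {mcc_estimate:.3f}")
```

Each outlier contributes at most sigma^2 to the correntropy risk, so it cannot pull the estimate far from the bulk of the data the way it pulls the mean.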
Overfitting vs. Underfitting In Linear Regression
In the previous courses, we introduced linear and logistic regression to model a discrete or continuous variable Y from one or more variables Xi. In all the examples used to illustrate this technique, the modeling was relatively simple: Y was generally modeled by a line parameterized by the variables Xi. But this modeling cannot be applied every time; an adequate model must be chosen with respect to our data in order to have the best fit. In this course we will study the effect of this modeling choice. We will see two cases: the first when the model is too weak for our data, and the second when the model is over-parameterized and over-fits our data.
Let's take a simple example and see what different modeling choices produce in the fit of the data, using Python code to generate and visualize the data. The figure shows different fits for different modeling assumptions. The first shows the simplest choice, modeling our data by a straight line. Here the model is very weak and we do not end up with a good fit to our data. This is underfitting: the starting hypothesis is too weak for our data set. In the other case, the model is over-parameterized, which over-adjusts to our data without following a correct trajectory. At the edges there is a significant oscillation, which can mislead us if we want to predict the value of a new point near the edge. This is overfitting: our starting hypothesis is over-parameterized for our data.
To sum up, when modeling data we can face two problems. First, we can have a hypothesis that fails to model our data. Second, we can have a hypothesis that is over-parameterized and over-fits our data without the power to generalize to new examples. A trade-off must be made between the desired level of fit and the ability to generalize to new cases.
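The contrast between underfitting and overfitting described above can be sketched with polynomial fits of increasing degree. The original post's code and data are not available here, so the noisy sine data, the seed and the degrees 1, 3 and 12 below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: a noisy sine curve sampled at 15 points.
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Dense, noise-free grid used to measure how well each fit generalizes.
x_new = np.linspace(0.0, 1.0, 200)
y_new = np.sin(2.0 * np.pi * x_new)


def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    model = np.polynomial.Polynomial.fit(x, y, degree)
    train_mse = float(np.mean((model(x) - y) ** 2))
    test_mse = float(np.mean((model(x_new) - y_new) ** 2))
    return train_mse, test_mse


# Degree 1 is a straight line, degree 12 is heavily over-parameterized.
results = {degree: fit_and_score(degree) for degree in (1, 3, 12)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

The training error can only go down as the degree increases, so it cannot be used alone to choose the model. The error on new points is what exposes both the weak straight-line fit and, typically, the oscillating high-degree fit.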
Post-Regularization Confidence Bands for Ordinary Differential Equations
The ordinary differential equation (ODE) is an important tool to study the dynamics of a system of biological and physical processes. A central question in ODE modeling is to infer the significance of the individual regulatory effect of one signal variable on another. However, building a confidence band for an ODE with unknown regulatory relations is challenging, and it remains largely an open question. In this article, we construct a post-regularization confidence band for an individual regulatory function in an ODE with unknown functionals and noisy data observations. Our proposal is the first of its kind, and is built on two novel ingredients. The first is a new localized kernel learning approach that combines reproducing kernel learning with local Taylor approximation, and the second is a new de-biasing method that tackles infinite-dimensional functionals and additional measurement errors. We show that the constructed confidence band has the desired asymptotic coverage probability, and the recovered regulatory network approaches the truth with probability tending to one. We establish the theoretical properties when the number of variables in the system can be either smaller or larger than the number of sampling time points, and we study the regime-switching phenomenon. We demonstrate the efficacy of the proposed method through both simulations and illustrations with two data applications.