Goto

Collaborating Authors

 Regression


Regression Concept Vectors for Bidirectional Explanations in Histopathology

arXiv.org Machine Learning

Explanations for deep neural network predictions in terms of domain-related concepts can be valuable in medical applications, where justifications are important for confidence in the decision-making. In this work, we propose a methodology to exploit continuous concept measures as Regression Concept Vectors (RCVs) in the activation space of a layer. The directional derivative of the decision function along the RCVs represents the network sensitivity to increasing values of a given concept measure. When applied to breast cancer grading, nuclei texture emerges as a relevant concept in the detection of tumor tissue in breast lymph node samples. We evaluate score robustness and consistency by statistical analysis.


A unifying approach for doubly-robust $\ell_1$ regularized estimation of causal contrasts

arXiv.org Machine Learning

We consider inference about a scalar parameter under a non-parametric model based on a one-step estimator computed as a plug in estimator plus the empirical mean of an estimator of the parameter's influence function. We focus on a class of parameters that have influence function which depends on two infinite dimensional nuisance functions and such that the bias of the one-step estimator of the parameter of interest is the expectation of the product of the estimation errors of the two nuisance functions. Our class includes many important treatment effect contrasts of interest in causal inference and econometrics, such as ATE, ATT, an integrated causal contrast with a continuous treatment, and the mean of an outcome missing not at random. We propose estimators of the target parameter that entertain approximately sparse regression models for the nuisance functions allowing for the number of potential confounders to be even larger than the sample size. By employing sample splitting, cross-fitting and $\ell_1$-regularized regression estimators of the nuisance functions based on objective functions whose directional derivatives agree with those of the parameter's influence function, we obtain estimators of the target parameter with two desirable robustness properties: (1) they are rate doubly-robust in that they are root-n consistent and asymptotically normal when both nuisance functions follow approximately sparse models, even if one function has a very non-sparse regression coefficient, so long as the other has a sufficiently sparse regression coefficient, and (2) they are model doubly-robust in that they are root-n consistent and asymptotically normal even if one of the nuisance functions does not follow an approximately sparse model so long as the other nuisance function follows an approximately sparse model with a sufficiently sparse regression coefficient.


Supervised Discrete Hashing with Relaxation

arXiv.org Machine Learning

Data-dependent hashing has recently attracted attention due to being able to support efficient retrieval and storage of high-dimensional data such as documents, images, and videos. In this paper, we propose a novel learning-based hashing method called "Supervised Discrete Hashing with Relaxation" (SDHR) based on "Supervised Discrete Hashing" (SDH). SDH uses ordinary least squares regression and traditional zero-one matrix encoding of class label information as the regression target (code words), thus fixing the regression target. In SDHR, the regression target is instead optimized. The optimized regression target matrix satisfies a large margin constraint for correct classification of each example. Compared with SDH, which uses the traditional zero-one matrix, SDHR utilizes the learned regression target matrix and, therefore, more accurately measures the classification error of the regression model and is more flexible. As expected, SDHR generally outperforms SDH. Experimental results on two large-scale image datasets (CIFAR-10 and MNIST) and a large-scale and challenging face dataset (FRGC) demonstrate the effectiveness and efficiency of SDHR.


A proof of convergence of multi-class logistic regression network

arXiv.org Machine Learning

This paper revisits the special type of a neural network known under two names. In the statistics and machine learning community it is known as a multi-class logistic regression neural network. In the neural network community, it is simply the soft-max layer. The importance is underscored by its role in deep learning: as the last layer, whose autput is actually the classification of the input patterns, such as images. Our exposition focuses on mathematically rigorous derivation of the key equation expressing the gradient. The fringe benefit of our approach is a fully vectorized expression, which is a basis of an efficient implementation. The second result of this paper is the positivity of the second derivative of the cross-entropy loss function as function of the weights. This result proves that optimization methods based on convexity may be used to train this network. As a corollary, we demonstrate that no $L^2$-regularizer is needed to guarantee convergence of gradient descent.


Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection

arXiv.org Machine Learning

In this paper, we describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection. Our system relies on a variety of engineered features originally used to detect propaganda. This is based on the assumption that biased messages are propagandistic in the sense that they promote a particular political cause or viewpoint. We trained a logistic regression model with features ranging from simple bag-of-words to vocabulary richness and text readability features. Our system achieved 72.9% accuracy on the test data that is annotated manually and 60.8% on the test data that is annotated with distant supervision. Additional experiments showed that significant performance improvements can be achieved with better feature pre-processing.


Data Shapley: Equitable Valuation of Data for Machine Learning

arXiv.org Artificial Intelligence

As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions. For example, in healthcare and consumer markets, it has been suggested that individuals should be compensated for the data that they generate, but it is not clear what is an equitable valuation for individual data. In this work, we develop a principled framework to address data valuation in the context of supervised machine learning. Given a learning algorithm trained on $n$ data points to produce a predictor, we propose data Shapley as a metric to quantify the value of each training datum to the predictor performance. Data Shapley uniquely satisfies several natural properties of equitable data valuation. We develop Monte Carlo and gradient-based methods to efficiently estimate data Shapley values in practical settings where complex learning algorithms, including neural networks, are trained on large datasets. In addition to being equitable, extensive experiments across biomedical, image and synthetic data demonstrate that data Shapley has several other benefits: 1) it is more powerful than the popular leave-one-out or leverage score in providing insight on what data is more valuable for a given learning task; 2) low Shapley value data effectively capture outliers and corruptions; 3) high Shapley value data inform what type of new data to acquire to improve the predictor.


Logitron: Perceptron-augmented classification model based on an extended logistic loss function

arXiv.org Machine Learning

Classification is the most important process in data analysis. However, due to the inherent non-convex and non-smooth structure of the zero-one loss function of the classification model, various convex surrogate loss functions such as hinge loss, squared hinge loss, logistic loss, and exponential loss are introduced. These loss functions have been used for decades in diverse classification models, such as SVM (support vector machine) with hinge loss, logistic regression with logistic loss, and Adaboost with exponential loss and so on. In this work, we present a Perceptron-augmented convex classification framework, {\it Logitron}. The loss function of it is a smoothly stitched function of the extended logistic loss with the famous Perceptron loss function. The extended logistic loss function is a parameterized function established based on the extended logarithmic function and the extended exponential function. The main advantage of the proposed Logitron classification model is that it shows the connection between SVM and logistic regression via polynomial parameterization of the loss function. In more details, depending on the choice of parameters, we have the Hinge-Logitron which has the generalized $k$-th order hinge-loss with an additional $k$-th root stabilization function and the Logistic-Logitron which has a logistic-like loss function with relatively large $|k|$. Interestingly, even $k=-1$, Hinge-Logitron satisfies the classification-calibration condition and shows reasonable classification performance with low computational cost. The numerical experiment in the linear classifier framework demonstrates that Hinge-Logitron with $k=4$ (the fourth-order SVM with the fourth root stabilization function) outperforms logistic regression, SVM, and other Logitron models in terms of classification accuracy.


Artificial Neural Network Modeling for Path Loss Prediction in Urban Environments

arXiv.org Machine Learning

Although various linear log-distance path loss models have been developed, advanced models are requiring to more accurately and flexibly represent the path loss for complex environments such as the urban area. This letter proposes an artificial neural network (ANN) based multi-dimensional regression framework for path loss modeling in urban environments at 3 to 6 GHz frequency band. ANN is used to learn the path loss structure from the measured path loss data which is a function of distance and frequency. The effect of the network architecture parameter (activation function, the number of hidden layers and nodes) on the prediction accuracy are analyzed. We observe that the proposed model is more accurate and flexible compared to the conventional linear model.


Byzantine Fault Tolerant Distributed Linear Regression

arXiv.org Machine Learning

This paper considers the problem of Byzantine fault tolerance in distributed linear regression in a multi-agent system. However, the proposed algorithms are given for a more general class of distributed optimization problems, of which distributed linear regression is a special case. The system comprises of a server and multiple agents, where each agent is holding a certain number of data points and responses that satisfy a linear relationship (could be noisy). The objective of the server is to determine this relationship, given that some of the agents in the system (up to a known number) are Byzantine faulty (aka. actively adversarial). We show that the server can achieve this objective, in a deterministic manner, by robustifying the original distributed gradient descent method using norm based filters, namely 'norm filtering' and 'norm-cap filtering', incurring an additional log-linear computation cost in each iteration. The proposed algorithms improve upon the existing methods on three levels: i) no assumptions are required on the probability distribution of data points, ii) system can be partially asynchronous, and iii) the computational overhead (in order to handle Byzantine faulty agents) is log-linear in number of agents and linear in dimension of data points. The proposed algorithms differ from each other in the assumptions made for their correctness, and the gradient filter they use.


Smoothed Online Optimization for Regression and Control

arXiv.org Machine Learning

We consider Online Convex Optimization (OCO) in the setting where the costs are $m$-strongly convex and the online learner pays a switching cost for changing decisions between rounds. We show that the recently proposed Online Balanced Descent (OBD) algorithm is constant competitive in this setting, with competitive ratio $3 + O(1/m)$, irrespective of the ambient dimension. Additionally, we show that when the sequence of cost functions is $\epsilon$-smooth, OBD has near-optimal dynamic regret and maintains strong per-round accuracy. We demonstrate the generality of our approach by showing that the OBD framework can be used to construct competitive algorithms for a variety of online problems across learning and control, including online variants of ridge regression, logistic regression, maximum likelihood estimation, and LQR control.