Regression
A Differentially Private Wilcoxon Signed-Rank Test
Couch, Simon, Kazan, Zeki, Shi, Kaiyan, Bray, Andrew, Groce, Adam
Hypothesis tests are a crucial statistical tool for data mining and are the workhorse of scientific research in many fields. Here we present a differentially private analogue of the classic Wilcoxon signed-rank hypothesis test, which is used when comparing sets of paired (e.g., before-and-after) data values. We present not only a private estimate of the test statistic, but a method to accurately compute a p-value and assess statistical significance. We evaluate our test on both simulated and real data. Compared to the only existing private test for this situation, that of Task and Clifton, we find that our test requires less than half as much data to achieve the same statistical power.
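For orientation, the classic (non-private) signed-rank statistic that the abstract refers to can be sketched as follows. This is only the textbook computation, assuming zero differences are dropped and tied absolute differences get average ranks. The function name `signed_rank_statistic` is hypothetical, and the paper's private mechanism and p-value procedure are not described in the abstract, so they are not shown.

```python
import numpy as np

def signed_rank_statistic(before, after):
    """Return (W_plus, W_minus) for paired samples (classic, non-private version)."""
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    d = d[d != 0]                      # assumption: zero differences are discarded
    abs_d = np.abs(d)
    order = np.argsort(abs_d, kind="mergesort")
    sorted_abs = abs_d[order]
    ranks = np.empty(len(d))
    i = 0
    while i < len(d):
        # Find the run of tied absolute differences starting at position i
        j = i
        while j + 1 < len(d) and sorted_abs[j + 1] == sorted_abs[i]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j+1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1
        i = j + 1
    w_plus = ranks[d > 0].sum()        # rank sum of positive differences
    w_minus = ranks[d < 0].sum()       # rank sum of negative differences
    return w_plus, w_minus
```

A large imbalance between the two rank sums suggests a systematic shift between the paired measurements.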
An Analysis of Hierarchical Text Classification Using Word Embeddings
Stein, Roger A., Jaques, Patricia A., Valiati, Joao F.
Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms to this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations (fastText, XGBoost, SVM, and Keras' CNN) and notable word embedding generation methods (GloVe, word2vec, and fastText) on publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an ${}_{LCA}F_1$ of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and their variants is a very promising approach for HTC.
Kapil Sharma
I previously wrote a post about Kernel Smoothing and how it can be used to fit a non-linear function non-parametrically. In this post, I will extend that idea and try to mitigate the disadvantages of kernel smoothing using Local Linear Regression. I generated some data in my previous post and I will reuse the same data for this post. The data was generated from the function $y = f(x) = \sin(4x) + 2$ with some Gaussian noise. As I mentioned in the previous article, in kernel smoothing, out-of-sample predictions at the edges and in sparse regions can have significant errors and bias. In Local Linear Regression, we try to reduce this bias to first order by fitting straight lines instead of local constants.
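The idea of fitting a straight line around each query point can be sketched as a kernel-weighted least-squares problem. This is a minimal illustration, not the post's own code: the Gaussian kernel, the `bandwidth` default, the function name `local_linear_predict`, and the synthetic data (an assumed stand-in of the form sin(4x) + 2 plus Gaussian noise) are all assumptions.

```python
import numpy as np

def local_linear_predict(x_train, y_train, x_query, bandwidth=0.05):
    """Predict at each query point from a kernel-weighted straight-line fit."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    preds = np.empty_like(x_query)
    for i, x0 in enumerate(x_query):
        # Gaussian kernel weights centred on the query point (assumed kernel)
        w = np.exp(-0.5 * ((x_train - x0) / bandwidth) ** 2)
        # Design matrix [1, x - x0]: the fitted intercept is the prediction at x0
        X = np.column_stack([np.ones_like(x_train), x_train - x0])
        sw = np.sqrt(w)
        # Weighted least squares via ordinary least squares on sqrt-weighted rows
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y_train * sw, rcond=None)
        preds[i] = beta[0]
    return preds

# Synthetic stand-in data (assumption): sin(4x) + 2 with Gaussian noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(4 * x) + 2 + rng.normal(0.0, 0.1, x.size)
fitted = local_linear_predict(x, y, np.array([0.0, 0.5, 1.0]))
```

Because each local fit includes a slope term, exactly linear data is reproduced even at the boundaries, which is where a local-constant (kernel smoothing) fit tends to be biased.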
Boosting Binary Optimization via Binary Classification: A Case Study of Job Shop Scheduling
Many optimization techniques evaluate solutions consecutively, where the next candidate for evaluation is determined by the results of previous evaluations. Examples include iterative methods, "black box" optimization algorithms, simulated annealing, evolutionary algorithms and tabu search, to name a few. When solving an optimization problem, these algorithms evaluate a large number of solutions, which raises the following question: Is it possible to learn something about the optimum using these solutions? In this paper, we define this "learning" question in terms of a logistic regression model and explore its predictive accuracy computationally. The proposed model uses a collection of solutions to predict the components of the optimal solutions. To illustrate the utility of such predictions, we embed the logistic regression model into the tabu search algorithm for the job shop scheduling problem. The resulting framework is simple to implement, yet provides a significant boost to the performance of the standard tabu search.
How to Make Your Machine Learning Models Robust to Outliers
"So unexpected was the hole that for several years computers analyzing ozone data had systematically thrown out the readings that should have pointed to its growth." According to Wikipedia, an outlier is an observation point that is distant from other observations. This definition is vague because it doesn't quantify the word "distant". In this blog, we'll try to understand the different interpretations of this "distant" notion. We will also look into the outlier detection and treatment techniques while seeing their impact on different types of machine learning models.
Accelerated proximal boosting
Fouillen, Erwan, Boyer, Claire, Sangnier, Maxime
Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics a gradient descent on a functional variable. This paper proposes to build upon the proximal point algorithm when the empirical risk to minimize is not differentiable. In addition, the novel boosting approach, called accelerated proximal boosting, benefits from Nesterov's acceleration in the same way as gradient boosting [Biau et al., 2018]. Advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and real-world data. In particular, we exhibit a favorable comparison over gradient boosting regarding convergence rate and prediction accuracy.
Building a Robust Text Classifier on a Test-Time Budget
Parvez, Md Rizwan, Bolukbasi, Tolga, Chang, Kai-Wei, Saligrama, Venkatesh
In this paper, we study a generic learning framework for building a robust text classification model that achieves accuracy comparable to standard full models under test-time budget constraints. Our approach learns a selector to identify words that are relevant to the prediction tasks and only passes these words to the classifier for processing. The selector is trained jointly with the classifier and directly learns to cooperate with it. We further propose a data aggregation scheme to improve the robustness of the classifier. Our learning framework is general and can be combined with any type of text classification model. On real-world data, we show that the proposed approach improves the performance of a given classifier and speeds up the model with only a small loss in accuracy.
Realizing quantum linear regression with auxiliary qumodes
Zhang, Dan-Bo, Xue, Zheng-Yuan, Zhu, Shi-Liang, Wang, Z. D.
In order to exploit quantum advantages, quantum algorithms are indispensable for operating machine learning with quantum computers. We here propose an intriguing hybrid approach of quantum information processing for quantum linear regression, which utilizes both discrete and continuous quantum variables, in contrast to existing approaches based solely upon discrete qubits. In our framework, data information is encoded in a qubit system, while information processing is tackled using auxiliary continuous qumodes via qubit-qumode interactions. Moreover, we show that finite squeezing is quite helpful for efficiently running the quantum algorithms in a realistic setup. Compared with an all-qubit approach, the present hybrid approach is more efficient and feasible for implementing quantum algorithms, while still retaining exponential quantum speed-up.
Detecting Outliers in Data with Correlated Measures
Kuo, Yu-Hsuan, Li, Zhenhui, Kifer, Daniel
Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operation faults. In order to utilize such data for real-world applications, it is critical to detect outliers so that models built from these datasets will not be skewed by them. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects outliers simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our outlier detection method achieves better performance, demonstrating the robustness and generality of our method. Last, we report interesting case studies on some outliers that result from atypical events.
DNN: A Two-Scale Distributional Tale of Heterogeneous Treatment Effect Inference
Fan, Yingying, Lv, Jinchi, Wang, Jingbo
Heterogeneous treatment effects are the center of gravity in many modern causal inference applications. In this paper, we investigate the estimation and inference of heterogeneous treatment effects with precision in a general nonparametric setting. To this end, we enhance the classical $k$-nearest neighbor method with a simple algorithm, extend it to a distributional setting, and suggest the two-scale distributional nearest neighbors (DNN) estimator with reduced finite-sample bias. Our recipe is first to subsample the data and average the 1-nearest neighbor estimators from each subsample. With appropriately chosen subsampling scale, the resulting DNN estimator is proved to be asymptotically unbiased and normal under mild regularity conditions. We then proceed with combining DNN estimators with different subsampling scales to further reduce bias. Our theoretical results on the advantages of the new two-scale DNN framework are well supported by several Monte Carlo simulations. The newly suggested method is also applied to a real-life data set to study the heterogeneity of treatment effects of smoking on children's birth weights across mothers' ages.
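The single-scale recipe in the abstract (subsample, take the 1-nearest neighbor in each subsample, then average) can be sketched as below. This is a Monte Carlo approximation over randomly drawn subsamples, not the authors' implementation: the function name `dnn_estimate`, the Euclidean distance, and the parameters `n_subsamples` and `seed` are assumptions. The two-scale combination is omitted because the abstract does not give its weights.

```python
import numpy as np

def dnn_estimate(X, y, x0, subsample_size, n_subsamples=500, seed=0):
    """Average the 1-nearest-neighbor response over random subsamples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat 1-D input as a single feature
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    dist = np.linalg.norm(X - x0, axis=1)   # distance of every point to the query
    n = len(y)
    total = 0.0
    for _ in range(n_subsamples):
        # Draw a subsample without replacement and keep its nearest point
        idx = rng.choice(n, size=subsample_size, replace=False)
        nearest = idx[np.argmin(dist[idx])]
        total += y[nearest]
    return total / n_subsamples
```

A smaller `subsample_size` averages over more distinct neighbors (lower variance, more smoothing), while `subsample_size` equal to the sample size reduces to the plain 1-nearest-neighbor estimate.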