 Statistical Learning




Machine Learning for Variance Reduction in Online Experiments

Neural Information Processing Systems

We consider the problem of variance reduction in randomized controlled trials, through the use of covariates correlated with the outcome but independent of the treatment. We propose a machine learning regression-adjusted treatment effect estimator, which we call MLRATE. MLRATE uses machine learning predictors of the outcome to reduce estimator variance. It employs cross-fitting to avoid overfitting biases, and we prove consistency and asymptotic normality under general conditions. MLRATE is robust to poor predictions from the machine learning step: if the predictions are uncorrelated with the outcomes, the estimator performs asymptotically no worse than the standard difference-in-means estimator, while if predictions are highly correlated with outcomes, the efficiency gains are large. In A/A tests, for a set of 48 outcome metrics commonly monitored in Facebook experiments, the estimator has over 70% lower variance than the simple difference-in-means estimator, and about 19% lower variance than the common univariate procedure, which adjusts only for pre-experiment values of the outcome.
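The two ingredients named in the abstract, cross-fitted outcome predictions and a regression adjustment on those predictions, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not spell out the regression, so the sketch assumes a linear adjustment with a treatment interaction on the centered prediction, and it uses ordinary least squares as a stand-in for the machine learning predictor. All function names and the simulated data are hypothetical.

```python
import numpy as np


def cross_fit_predictions(X, y, n_folds=2, seed=0):
    """Out-of-fold predictions: each row is predicted by a model fit on the other folds."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random, roughly equal-sized folds
    preds = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Stand-in predictor (assumption): least squares with an intercept.
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(test.sum()), X[test]])
        preds[test] = A_test @ coef
    return preds


def regression_adjusted_effect(y, t, g):
    """Coefficient on t in an OLS of y on [1, t, g, t * (g - mean(g))] (assumed form)."""
    g_centered = g - g.mean()
    design = np.column_stack([np.ones(len(y)), t, g, t * g_centered])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


def difference_in_means(y, t):
    """Unadjusted baseline: mean outcome of treated minus mean outcome of control."""
    return y[t == 1].mean() - y[t == 0].mean()


def simulate(seed, n=2000, tau=0.5):
    """Hypothetical experiment: covariates drive the outcome, treatment is randomized."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    t = rng.integers(0, 2, size=n)
    y = X @ np.array([1.0, -2.0, 0.5]) + tau * t + rng.normal(size=n)
    return X, y, t
```

Calling `simulate` over many seeds and comparing the spread of `regression_adjusted_effect(y, t, cross_fit_predictions(X, y))` with that of `difference_in_means(y, t)` gives a rough sense of the variance reduction when the predictions track the outcome well.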





simple-saddle-camera-version

Neural Information Processing Systems

Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function $f\colon \mathbb{R}^n \to \mathbb{R}$, it outputs an $\epsilon$-approximate second-order stationary point in $O(\log n/\epsilon^{1.75})$ iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with $O(\log^4 n/\epsilon^{2})$ or $O(\log^6 n/\epsilon^{1.75})$ iterations, our algorithm is polynomially better in terms of $\log n$ and matches their complexities in terms of $1/\epsilon$.




Active clustering for labeling training data

Neural Information Processing Systems

[...] has a high impact on the performance of the learned function. [...] most practical cases rely on humans-in-the-loop to label the data. The process of determining the correct labels is much more expensive than comparing two items to see whether they belong to the same class. Thus motivated, we propose a setting for training data gathering where the human experts perform the comparatively cheap task of answering pairwise queries, and the computer groups the items into classes (which can be labeled cheaply at the very end of the process). Given the items, we consider two random models for the classes: one where the set partition they form is drawn uniformly, the other one where each item chooses its class independently following a fixed distribution. In the first model, we characterize the algorithms that minimize the average number of queries required to cluster the items and analyze their complexity. In the second model, we analyze a specific algorithm family, propose as a conjecture that they reach the minimum average [...] We also [...]
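The pairwise-query setting can be made concrete with a small sketch. It assumes an error-free oracle `same_class(i, j)` standing in for the human expert, and it uses a simple greedy strategy that compares each new item with one representative of every cluster found so far. This is only an illustration of the setting, not necessarily one of the algorithms analyzed in the paper; the function and variable names are hypothetical.

```python
from typing import Callable, List, Tuple


def cluster_with_pairwise_queries(
    n_items: int, same_class: Callable[[int, int], bool]
) -> Tuple[List[List[int]], int]:
    """Group items 0..n_items-1 into classes using only same-class queries.

    Returns the clusters and the number of queries asked.
    """
    clusters: List[List[int]] = []
    n_queries = 0
    for item in range(n_items):
        for cluster in clusters:
            n_queries += 1
            # One query against the cluster's first member is enough,
            # because "same class" is an equivalence relation.
            if same_class(item, cluster[0]):
                cluster.append(item)
                break
        else:
            # No existing cluster matched: the item starts a new class.
            clusters.append([item])
    return clusters, n_queries


# Hypothetical hidden labels; the algorithm only sees the oracle's answers.
hidden_labels = [0, 1, 0, 2, 1, 0]
clusters, n_queries = cluster_with_pairwise_queries(
    len(hidden_labels), lambda i, j: hidden_labels[i] == hidden_labels[j]
)
```

The number of queries depends on the order in which clusters are tried, which is the kind of choice the two random models in the abstract are meant to reason about.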