Collaborating Authors

nonparametric estimation

Nonparametric estimation of continuous DPPs with kernel methods Machine Learning

Determinantal Point Process (DPPs) are statistical models for repulsive point patterns. Both sampling and inference are tractable for DPPs, a rare feature among models with negative dependence that explains their popularity in machine learning and spatial statistics. Parametric and nonparametric inference methods have been proposed in the finite case, i.e. when the point patterns live in a finite ground set. In the continuous case, only parametric methods have been investigated, while nonparametric maximum likelihood for DPPs -- an optimization problem over trace-class operators -- has remained an open question. In this paper, we show that a restricted version of this maximum likelihood (MLE) problem falls within the scope of a recent representer theorem for nonnegative functions in an RKHS. This leads to a finite-dimensional problem, with strong statistical ties to the original MLE. Moreover, we propose, analyze, and demonstrate a fixed point algorithm to solve this finite-dimensional problem. Finally, we also provide a controlled estimate of the correlation kernel of the DPP, thus providing more interpretability.

Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms Machine Learning

The need to evaluate treatment effectiveness is ubiquitous in most of empirical science, and interest in flexibly investigating effect heterogeneity is growing rapidly. To do so, a multitude of model-agnostic, nonparametric meta-learners have been proposed in recent years. Such learners decompose the treatment effect estimation problem into separate sub-problems, each solvable using standard supervised learning methods. Choosing between different meta-learners in a data-driven manner is difficult, as it requires access to counterfactual information. Therefore, with the ultimate goal of building better understanding of the conditions under which some learners can be expected to perform better than others a priori, we theoretically analyze four broad meta-learning strategies which rely on plug-in estimation and pseudo-outcome regression. We highlight how this theoretical reasoning can be used to guide principled algorithm design and translate our analyses into practice by considering a variety of neural network architectures as base-learners for the discussed meta-learning strategies. In a simulation study, we showcase the relative strengths of the learners under different data-generating processes.

Nonparametric Estimation in the Dynamic Bradley-Terry Model Machine Learning

We propose a time-varying generalization of the Bradley-Terry model that allows for nonparametric modeling of dynamic global rankings of distinct teams. We develop a novel estimator that relies on kernel smoothing to pre-process the pairwise comparisons over time and is applicable in sparse settings where the Bradley-Terry may not be fit. We obtain necessary and sufficient conditions for the existence and uniqueness of our estimator. We also derive time-varying oracle bounds for both the estimation error and the excess risk in the model-agnostic setting where the Bradley-Terry model is not necessarily the true data generating process. We thoroughly test the practical effectiveness of our model using both simulated and real world data and suggest an efficient data-driven approach for bandwidth tuning.

Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards Machine Learning

Multi-armed bandits were first introduced in the landmark paper by Robbins (1952). The development of multi-armed bandit methodology has been partly motivated by clinical trials with the aim of balancing two competing goals, 1) to effectively identify the best treatment (exploration) and 2) to treat patients as effectively as possible during the trial (exploitation). The classic formulation of the multi-armed bandit problem in the context of clinical practice is as follows: there are l treatments (arms) to treat a disease. The doctor (decision maker) has to choose for each patient, one of the l available treatments, which result in a reward (response) of improvement in the condition of the patient. The goal is to maximize the cumulated rewardsas much as possible.

Robust Nonparametric Regression under Huber's $\epsilon$-contamination Model Machine Learning

We consider the non-parametric regression problem under Huber's $\epsilon$-contamination model, in which an $\epsilon$ fraction of observations are subject to arbitrary adversarial noise. We first show that a simple local binning median step can effectively remove the adversary noise and this median estimator is minimax optimal up to absolute constants over the H\"{o}lder function class with smoothness parameters smaller than or equal to 1. Furthermore, when the underlying function has higher smoothness, we show that using local binning median as pre-preprocessing step to remove the adversarial noise, then we can apply any non-parametric estimator on top of the medians. In particular we show local median binning followed by kernel smoothing and local polynomial regression achieve minimaxity over H\"{o}lder and Sobolev classes with arbitrary smoothness parameters. Our main proof technique is a decoupled analysis of adversary noise and stochastic noise, which can be potentially applied to other robust estimation problems. We also provide numerical results to verify the effectiveness of our proposed methods.

Nonparametric Estimation of Low Rank Matrix Valued Function Machine Learning

Let $A:[0,1]\rightarrow\mathbb{H}_m$ (the space of Hermitian matrices) be a matrix valued function which is low rank with entries in H\"{o}lder class $\Sigma(\beta,L)$. The goal of this paper is to study statistical estimation of $A$ based on the regression model $\mathbb{E}(Y_j|\tau_j,X_j) = \langle A(\tau_j), X_j \rangle,$ where $\tau_j$ are i.i.d. uniformly distributed in $[0,1]$, $X_j$ are i.i.d. matrix completion sampling matrices, $Y_j$ are independent bounded responses. We propose an innovative nuclear norm penalized local polynomial estimator and establish an upper bound on its point-wise risk measured by Frobenius norm. Then we extend this estimator globally and prove an upper bound on its integrated risk measured by $L_2$-norm. We also propose another new estimator based on bias-reducing kernels to study the case when $A$ is not necessarily low rank and establish an upper bound on its risk measured by $L_{\infty}$-norm. We show that the obtained rates are all optimal up to some logarithmic factor in minimax sense. Finally, we propose an adaptive estimation procedure based on Lepski's method and the penalized data splitting technique which is computationally efficient and can be easily implemented and parallelized.

Distributed Nonparametric Regression under Communication Constraints Machine Learning

This paper studies the problem of nonparametric estimation of a smooth function with data distributed across multiple machines. We assume an independent sample from a white noise model is collected at each machine, and an estimator of the underlying true function needs to be constructed at a central machine. We place limits on the number of bits that each machine can use to transmit information to the central machine. Our results give both asymptotic lower bounds and matching upper bounds on the statistical risk under various settings. We identify three regimes, depending on the relationship among the number of machines, the size of the data available at each machine, and the communication budget. When the communication budget is small, the statistical risk depends solely on this communication bottleneck, regardless of the sample size. In the regime where the communication budget is large, the classic minimax risk in the non-distributed estimation setting is recovered. In an intermediate regime, the statistical risk depends on both the sample size and the communication budget.

Nonparametric Hawkes Processes: Online Estimation and Generalization Bounds Machine Learning

In this paper, we design a nonparametric online algorithm for estimating the triggering functions of multivariate Hawkes processes. Unlike parametric estimation, where evolutionary dynamics can be exploited for fast computation of the gradient, and unlike typical function learning, where representer theorem is readily applicable upon proper regularization of the objective function, nonparametric estimation faces the challenges of (i) inefficient evaluation of the gradient, (ii) lack of representer theorem, and (iii) computationally expensive projection necessary to guarantee positivity of the triggering functions. In this paper, we offer solutions to the above challenges, and design an online estimation algorithm named NPOLE-MHP that outputs estimations with a $\mathcal{O}(1/T)$ regret, and a $\mathcal{O}(1/T)$ stability. Furthermore, we design an algorithm, NPOLE-MMHP, for estimation of multivariate marked Hawkes processes. We test the performance of NPOLE-MHP on various synthetic and real datasets, and demonstrate, under different evaluation metrics, that NPOLE-MHP performs as good as the optimal maximum likelihood estimation (MLE), while having a run time as little as parametric online algorithms.

Quantized Nonparametric Estimation over Sobolev Ellipsoids Machine Learning

We formulate the notion of minimax estimation under storage or communication constraints, and prove an extension to Pinsker's theorem for nonparametric estimation over Sobolev ellipsoids. Placing limits on the number of bits used to encode any estimator, we give tight lower and upper bounds on the excess risk due to quantization in terms of the number of bits, the signal size, and the noise level. This establishes the Pareto optimal tradeoff between storage and risk under quantization constraints for Sobolev spaces. Our results and proof techniques combine elements of rate distortion theory and minimax analysis. The proposed quantized estimation scheme, which shows achievability of the lower bounds, is adaptive in the usual statistical sense, achieving the optimal quantized minimax rate without knowledge of the smoothness parameter of the Sobolev space. It is also adaptive in a computational sense, as it constructs the code only after observing the data, to dynamically allocate more codewords to blocks where the estimated signal size is large. Simulations are included that illustrate the effect of quantization on statistical risk.

Quantized Estimation of Gaussian Sequence Models in Euclidean Balls

Neural Information Processing Systems

A central result in statistical theory is Pinsker's theorem, which characterizes the minimax rate in the normal means model of nonparametric estimation. In this paper, we present an extension to Pinsker's theorem where estimation is carried out under storage or communication constraints. In particular, we place limits on the number of bits used to encode an estimator, and analyze the excess risk in terms of this constraint, the signal size, and the noise level. We give sharp upper and lower bounds for the case of a Euclidean ball, which establishes the Pareto-optimal minimax tradeoff between storage and risk in this setting.