 function estimation


Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions

Neural Information Processing Systems

Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or other functions of interest, such as density ratios). While recent works on marginalized importance sampling (MIS) show that the former can enjoy provable guarantees under realizable function approximation, the latter is only known to be feasible under much stronger assumptions such as prohibitively expressive discriminators. In this work, we provide guarantees for off-policy function estimation under only realizability, by imposing proper regularization on the MIS objectives. Compared to commonly used regularization in MIS, our regularizer is much more flexible and can account for an arbitrary user-specified distribution, under which the learned function will be close to the ground truth. We provide an exact characterization of the optimal dual solution that needs to be realized by the discriminator class, which determines the data-coverage assumption in the case of value-function learning. As another surprising observation, the regularizer can be altered to relax the data-coverage requirement, and completely eliminate it in the ideal case with strong side information.


A Theory of Nonparametric Covariance Function Estimation for Discretely Observed Data

arXiv.org Machine Learning

We study nonparametric covariance function estimation for functional data observed with noise at discrete locations on a $d$-dimensional domain. Estimating the covariance function from discretely observed data is a challenging nonparametric problem, particularly in multidimensional settings, since the covariance function is defined on a product domain and thus suffers from the curse of dimensionality. This motivates the use of adaptive estimators, such as deep learning estimators. However, existing theoretical results are largely limited to estimators with explicit analytic representations, and the properties of general learning-based estimators remain poorly understood. We establish an oracle inequality for a broad class of learning-based estimators that applies to both sparse and dense observation regimes in a unified manner, and derive convergence rates for deep learning estimators over several classes of covariance functions. The resulting rates suggest that structural adaptation can mitigate the curse of dimensionality, similarly to classical nonparametric regression. We further compare the convergence rates of learning-based estimators with several existing procedures. For a one-dimensional smoothness class, deep learning estimators are suboptimal, whereas local linear smoothing estimators achieve a faster rate. For a structured function class, however, deep learning estimators attain the minimax rate up to polylogarithmic factors, whereas local linear smoothing estimators are suboptimal. These results reveal a distinctive adaptivity-variance trade-off in covariance function estimation.


Data Generation without Function Estimation

arXiv.org Machine Learning

Estimating the score function (or other population-density-dependent functions) is a fundamental component of most generative models. However, such function estimation is computationally and statistically challenging. Can we avoid function estimation for data generation? We propose an estimation-free generative method: a set of points whose locations are deterministically updated with (inverse) gradient descent can transport a uniform distribution to an arbitrary data distribution, in the mean field regime, without function estimation, training neural networks, or even noise injection. The proposed method is built upon recent advances in the physics of interacting particles. We show, both theoretically and experimentally, that these advances can be leveraged to develop novel generative methods.


Neural Inverse Source Problems

arXiv.org Artificial Intelligence

Reconstructing unknown external source functions is an important perception capability for a large range of robotics domains including manipulation, aerial, and underwater robotics. In this work, we propose a Physics-Informed Neural Network (PINN [1]) based approach for solving the inverse source problems in robotics, jointly identifying unknown source functions and the complete state of a system given partial and noisy observations. Our approach demonstrates several advantages over prior works (Finite Element Methods (FEM) and data-driven approaches): it offers flexibility in integrating diverse constraints and boundary conditions; eliminates the need for complex discretizations (e.g., meshing); easily accommodates gradients from real measurements; and does not limit performance based on the diversity and quality of training data. We validate our method across three simulation and real-world scenarios involving up to 4th order partial differential equations (PDEs), constraints such as Signorini and Dirichlet, and various regression losses including Chamfer distance and L2 norm.


Reviews: Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces

Neural Information Processing Systems

Strengths 1. Considering dynamic programming problems in continuous time such that the methodologies and tools of dynamical systems and stochastic differential equations is interesting, and the authors do a good job of motivating the generalities of the problem context. The parameterizations considered of the value functions at the end of the day belong to discrete time, due to the need to discretize the SDEs and sample the state-action-reward triples. Given this discrete implementation, and the fact that experimentally the authors run into the conventional difficulties of discrete time algorithms with continuous state-action function approximation, I am a little bewildered as to what the actual benefit is of this problem formulation, especially since it requires a redefinition of the value function as one that is compatible with SDEs (eqn. That is, the intrinsic theoretical benefits of this perspective are not clear, especially since the main theorem is expressed in terms of RKHS only. However, these methods are fundamentally limited by their sample complexity bottleneck, i.e., the quadratic complexity in the sample size.


Advancing Causal Inference: A Nonparametric Approach to ATE and CATE Estimation with Continuous Treatments

arXiv.org Machine Learning

This paper introduces a generalized ps-BART model for the estimation of Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE) with continuous treatments, addressing limitations of the Bayesian Causal Forest (BCF) model. The ps-BART model's nonparametric nature allows for flexibility in capturing nonlinear relationships between treatment and outcome variables. Across three distinct sets of Data Generating Processes (DGPs), the ps-BART model consistently outperforms the BCF model, particularly in highly nonlinear settings. The ps-BART model's robustness in uncertainty estimation and accuracy in both point-wise and probabilistic estimation demonstrate its utility for real-world applications. This research fills a crucial gap in the causal inference literature, providing a tool better suited for nonlinear treatment-outcome relationships and opening avenues for further exploration in the domain of continuous treatment effect estimation.


Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions

arXiv.org Artificial Intelligence

Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or other functions of interest, such as density ratios). While recent works on marginalized importance sampling (MIS) show that the former can enjoy provable guarantees under realizable function approximation, the latter is only known to be feasible under much stronger assumptions such as prohibitively expressive discriminators. In this work, we provide guarantees for off-policy function estimation under only realizability, by imposing proper regularization on the MIS objectives. Compared to commonly used regularization in MIS, our regularizer is much more flexible and can account for an arbitrary user-specified distribution, under which the learned function will be close to the ground truth. We provide an exact characterization of the optimal dual solution that needs to be realized by the discriminator class, which determines the data-coverage assumption in the case of value-function learning. As another surprising observation, the regularizer can be altered to relax the data-coverage requirement, and completely eliminate it in the ideal case with strong side information.


Spherical Poisson Point Process Intensity Function Modeling and Estimation with Measure Transport

arXiv.org Machine Learning

Recent years have seen an increased interest in the application of methods and techniques commonly associated with machine learning and artificial intelligence to spatial statistics. Here, in a celebration of the ten-year anniversary of the journal Spatial Statistics, we bring together normalizing flows, commonly used for density function estimation in machine learning, and spherical point processes, a topic of particular interest to the journal's readership, to present a new approach for modeling non-homogeneous Poisson process intensity functions on the sphere. The central idea of this framework is to build, and estimate, a flexible bijective map that transforms the underlying intensity function of interest on the sphere into a simpler, reference, intensity function, also on the sphere. Map estimation can be done efficiently using automatic differentiation and stochastic gradient descent, and uncertainty quantification can be done straightforwardly via nonparametric bootstrap. We investigate the viability of the proposed method in a simulation study, and illustrate its use in a proof-of-concept study where we model the intensity of cyclone events in the North Pacific Ocean. Our experiments reveal that normalizing flows present a flexible and straightforward way to model intensity functions on spheres, but that their potential to yield a good fit depends on the architecture of the bijective map, which can be difficult to establish in practice.


Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes

arXiv.org Machine Learning

Value function approximation is a crucial module for policy evaluation in reinforcement learning when the state space is large or continuous. The present paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning, where a Gaussian process (GP) prior is presumed on the sought value function, and instantaneous rewards are probabilistically generated based on value function evaluations at two consecutive states. Capitalizing on a random feature-based approximant of the GP prior, an online scalable (OS) approach, termed OS-GPTD, is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs. To benchmark the performance of OS-GPTD even in an adversarial setting, where the modeling assumptions are violated, complementary worst-case analyses are performed by upper-bounding the cumulative Bellman error as well as the long-term reward prediction error, relative to their counterparts from a fixed value function estimator with the entire state-reward trajectory in hindsight. Moreover, to alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme, termed OS-EGPTD, that can jointly infer the value function, and select interactively the EGP kernel on-the-fly. Finally, performances of the novel OS-(E)GPTD schemes are evaluated on two benchmark problems.
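
As a rough illustration of the ingredients named in this abstract (random features standing in for a kernel, and online updates from a stream of state-reward pairs), the sketch below runs plain linear TD(0) on random Fourier features of an RBF kernel. It is not the OS-GPTD algorithm, which is Bayesian and is not specified here; the function names `make_rff` and `td0_rff`, the RBF kernel choice, the step size, and the discount factor are all assumptions made for this example.

```python
import numpy as np

def make_rff(dim, num_features, lengthscale, rng):
    """Random Fourier feature map approximating an RBF kernel (assumed kernel choice)."""
    W = rng.normal(scale=1.0 / lengthscale, size=(num_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(s):
        # Map a state (scalar or vector) to a num_features-dimensional vector.
        return np.sqrt(2.0 / num_features) * np.cos(W @ np.atleast_1d(s) + b)

    return phi

def td0_rff(states, rewards, phi, gamma=0.9, step=0.1):
    """Online linear TD(0): states has length T + 1, rewards has length T."""
    theta = np.zeros_like(phi(states[0]))
    for s, r, s_next in zip(states[:-1], rewards, states[1:]):
        f, f_next = phi(s), phi(s_next)
        # Temporal-difference error for the current linear value estimate.
        td_error = r + gamma * (theta @ f_next) - theta @ f
        theta += step * td_error * f
    return theta

# Hypothetical usage on a synthetic one-dimensional trajectory.
rng = np.random.default_rng(0)
phi = make_rff(dim=1, num_features=50, lengthscale=0.5, rng=rng)
states = rng.uniform(0.0, 1.0, size=201)
rewards = np.sin(states[:-1])
theta = td0_rff(states, rewards, phi)
value_at_half = theta @ phi(0.5)  # point estimate of the value at state 0.5
```

The estimated value at a state `s` is `theta @ phi(s)`. A GP-based scheme such as the one described would additionally maintain a posterior over the weights rather than a single point estimate.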


Variance function estimation in regression model via aggregation procedures

arXiv.org Machine Learning

In the regression problem, we consider the problem of estimating the variance function by means of aggregation methods. We focus on two particular aggregation settings: Model Selection aggregation (MS) and Convex aggregation (C), where the goal is to select the best candidate and to build the best convex combination of candidates, respectively, among a collection of candidates. In both cases, the construction of the estimator relies on a two-step procedure and requires two independent samples. The first step exploits the first sample to build the candidate estimators for the variance function by the residual-based method, and then the second dataset is used to perform the aggregation step. We show the consistency of the proposed method with respect to the $L_2$ error both for MS and C aggregations. We evaluate the performance of these two methods in the heteroscedastic model and illustrate their interest in the regression problem with reject option.
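
The two-step procedure described in this abstract can be sketched as follows for the model-selection (MS) case. This is a minimal illustration under assumptions, not the authors' estimator: polynomial least squares is used for both the mean and the candidate variance functions, candidates are clipped at zero, and the name `ms_aggregate_variance` and its parameters are invented for this example.

```python
import numpy as np

def fit_poly(x, y, degree):
    """Least-squares polynomial fit; returns a callable predictor."""
    coeffs = np.polyfit(x, y, degree)
    return lambda t: np.polyval(coeffs, t)

def ms_aggregate_variance(x, y, degrees=(0, 1, 2, 3), seed=0):
    """Residual-based variance candidates, then selection on a held-out sample."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    half = len(x) // 2
    first, second = idx[:half], idx[half:]

    # Step 1 (first sample): estimate the mean, then regress the squared
    # residuals on x to obtain candidate variance functions.
    mean_hat = fit_poly(x[first], y[first], 3)
    sq_res_first = (y[first] - mean_hat(x[first])) ** 2
    fits = [fit_poly(x[first], sq_res_first, d) for d in degrees]
    # Clip at zero so every candidate is a valid (non-negative) variance.
    candidates = [lambda t, f=f: np.maximum(f(t), 0.0) for f in fits]

    # Step 2 (second sample): select the candidate with the smallest
    # empirical squared error against held-out squared residuals.
    sq_res_second = (y[second] - mean_hat(x[second])) ** 2
    risks = [float(np.mean((sq_res_second - c(x[second])) ** 2)) for c in candidates]
    best = int(np.argmin(risks))
    return degrees[best], candidates[best], risks

# Hypothetical usage on synthetic heteroscedastic data.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * x) + (0.5 + x) * rng.normal(size=2000)
degree, sigma2_hat, risks = ms_aggregate_variance(x, y)
```

Convex aggregation would replace the `argmin` in step 2 with a search over convex weights on the candidates, for instance by constrained least squares on the second sample.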