Energy-Based Sliced Wasserstein Distance

Neural Information Processing Systems

The sliced Wasserstein (SW) distance has been widely recognized as a statistically effective and computationally efficient metric between two probability measures. A key component of the SW distance is the slicing distribution. There are two existing approaches for choosing this distribution. The first approach is using a fixed prior distribution. The second approach is optimizing, within a parametric family of distributions, for the distribution that maximizes the expected distance.
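
For context, here is a minimal NumPy sketch of the standard SW-2 estimator under the first approach, a fixed uniform slicing distribution on the unit sphere; the energy-based slicing distribution proposed in the paper is not reproduced here, and the sample sizes and number of projections are illustrative assumptions.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=128, seed=None):
    # Monte Carlo estimate of SW-2 between two equal-size empirical
    # measures, with the fixed uniform slicing distribution.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit sphere
    # Project both samples onto every direction, then compare order
    # statistics: in 1-D, W-2 between equal-size empirical measures is
    # the root-mean-square difference of the sorted projections.
    X_sorted = np.sort(X @ theta.T, axis=0)
    Y_sorted = np.sort(Y @ theta.T, axis=0)
    return np.sqrt(np.mean((X_sorted - Y_sorted) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
Y = rng.normal(loc=1.0, size=(500, 10))
print(sliced_wasserstein(X, Y, seed=0))
```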


Numerically Solving Parametric Families of High-Dimensional Kolmogorov Partial Differential Equations via Deep Learning

Neural Information Processing Systems

We present a deep learning algorithm for the numerical solution of parametric families of high-dimensional linear Kolmogorov partial differential equations (PDEs). Our method is based on reformulating the numerical approximation of a whole family of Kolmogorov PDEs as a single statistical learning problem using the Feynman-Kac formula. Successful numerical experiments are presented, which empirically confirm the functionality and efficiency of our proposed algorithm in the case of heat equations and Black-Scholes option pricing models parametrized by affine-linear coefficient functions. We show that a single deep neural network trained on simulated data is capable of learning the solution functions of an entire family of PDEs on a full space-time region. Most notably, our numerical observations and theoretical results also demonstrate that the proposed method does not suffer from the curse of dimensionality, distinguishing it from almost all standard numerical methods for PDEs.
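
To make the Feynman-Kac reformulation concrete, below is a hypothetical PyTorch sketch, not the authors' implementation, for the heat equation with a paraboloid initial condition: L2 regression of a network f(t, x) onto single-sample payoffs phi(x + sqrt(2t) Z) recovers the conditional expectation, i.e. the PDE solution, over the whole space-time region. The network size, sampling domain, and training schedule are illustrative assumptions.

```python
import torch

# Heat equation u_t = Laplacian(u) on R^d with initial condition phi.
# Feynman-Kac: u(t, x) = E[phi(x + sqrt(2 t) Z)], Z ~ N(0, I), so L2
# regression onto single-sample payoffs recovers the solution.
d = 10
phi = lambda x: (x ** 2).sum(dim=1, keepdim=True)  # paraboloid

net = torch.nn.Sequential(
    torch.nn.Linear(d + 1, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    t = torch.rand(256, 1)              # times in [0, 1]
    x = 4 * torch.rand(256, d) - 2      # points in [-2, 2]^d
    z = torch.randn(256, d)
    y = phi(x + torch.sqrt(2 * t) * z)  # one-sample Feynman-Kac payoff
    loss = ((net(torch.cat([t, x], dim=1)) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Closed form for this phi: u(t, x) = |x|^2 + 2 * d * t.
t0, x0 = torch.full((1, 1), 0.5), torch.zeros(1, d)
print(net(torch.cat([t0, x0], dim=1)).item(), "vs exact", 2 * d * 0.5)
```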



The Bias-Variance Tradeoff in Data-Driven Optimization: A Local Misspecification Perspective

Lan, Haixiang, Liao, Luofeng, Elmachtoub, Adam N., Kroer, Christian, Lam, Henry, Zhang, Haofeng

arXiv.org Machine Learning

Data-driven stochastic optimization is ubiquitous in machine learning and operational decision-making problems. Sample average approximation (SAA) and model-based approaches such as estimate-then-optimize (ETO) or integrated estimation-optimization (IEO) are all popular, with model-based approaches being able to circumvent some of the issues with SAA in complex context-dependent problems. Yet the relative performance of these methods is poorly understood, with most results confined to the dichotomous cases of the model-based approach being either well-specified or misspecified. We develop the first results that allow for a more granular analysis of the relative performance of these methods under a local misspecification setting, which models the scenario where the model-based approach is nearly well-specified. By leveraging tools from contiguity theory in statistics, we show that there is a bias-variance tradeoff between SAA, IEO, and ETO under local misspecification, and that the relative importance of the bias and the variance depends on the degree of local misspecification. Moreover, we derive explicit expressions for the decision bias, which allows us to characterize (un)impactful misspecification directions, and provide further geometric understanding of the variance.
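
The tradeoff the abstract describes can be seen in a toy newsvendor simulation (hypothetical, not from the paper): SAA uses the empirical quantile and is unbiased but noisy, while ETO fits a slightly misspecified exponential model, reducing the decision variance at the cost of a bias. All distributions and costs below are illustrative.

```python
import numpy as np

# Newsvendor: choose order q to minimize E[co*(q-D)+ + cu*(D-q)+];
# the optimal q* is the cu/(cu+co) quantile of the demand D.
rng = np.random.default_rng(0)
co, cu = 1.0, 3.0
alpha = cu / (cu + co)

def true_demand(n):
    # True demand is gamma with shape 1.2: "nearly" exponential, so the
    # exponential model below is locally misspecified.
    return rng.gamma(shape=1.2, scale=1.0, size=n)

def cost(q, d):
    return np.mean(co * np.maximum(q - d, 0) + cu * np.maximum(d - q, 0))

d_eval = true_demand(200_000)  # held-out demand for out-of-sample cost
n, reps = 30, 500
q_saa, q_eto = np.empty(reps), np.empty(reps)
for r in range(reps):
    d = true_demand(n)
    q_saa[r] = np.quantile(d, alpha)            # SAA: empirical quantile
    q_eto[r] = -np.mean(d) * np.log(1 - alpha)  # ETO: exponential-MLE quantile

for name, q in [("SAA", q_saa), ("ETO", q_eto)]:
    avg_cost = np.mean([cost(qi, d_eval) for qi in q])
    print(f"{name}: mean cost {avg_cost:.4f}, decision std {q.std():.3f}")
```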


Reviews: Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Neural Information Processing Systems

POST-REBUTTAL I thank the authors for their detailed response. My main concern was the level of experimental detail provided in the submission, and I'm pleased that the authors have committed to including in the paper itself more of the details implicitly contained within the code. My overall recommendation remains the same; I think the paper should be published, and the strong Atari results will be of fairly wide interest. However, there were a few parts of the response I wasn't convinced by: (1) "(D) Inefficient Hyperparameter": I don't agree with the authors' claim that e.g. QR-DQN requires more hyperparameters than FQF (it seems to me that both algorithmically require the number of quantiles, and the standard hyperparameters associated with network architecture and training beyond that).


Review for NeurIPS paper: Numerically Solving Parametric Families of High-Dimensional Kolmogorov Partial Differential Equations via Deep Learning

Neural Information Processing Systems

Weaknesses: The authors restrict to affine-linear parameters, but this restriction is not a requirement of the methodology, and it is unclear from the text why the restriction was put in place. Is the purpose solely to narrow the scope of the analysis, or is there some performance loss for more general coefficients? It would be useful for the authors to comment in the paper on why this restriction is imposed. Similarly, the theoretical results proven apply only to the example from Section 3.3 with a specific initial condition. A more general result would be preferable, particularly since the highest-dimensional application the authors have provided is actually for a different initial condition (though, of course, the input dimension for the paraboloid initial condition is still rather high). However, a more general result may not be straightforward to derive.


Review for NeurIPS paper: Numerically Solving Parametric Families of High-Dimensional Kolmogorov Partial Differential Equations via Deep Learning

Neural Information Processing Systems

The three reviewers, who hail from different sub-communities that all overlap with the paper's content, agree that this is a very well presented work that applies rarely used techniques (such as the Feynman-Kac formula) to interesting ML use cases. It should thus be accepted. The reviewers also raised some concerns about the presentation of the experiments. Please make sure to address these for the camera-ready version.


A General Bayesian Framework for Informative Input Design in System Identification

Tzikas, Alexandros E., Kochenderfer, Mykel J.

arXiv.org Artificial Intelligence

We tackle the problem of informative input design for system identification, where we select inputs, observe the corresponding outputs from the true system, and optimize the parameters of our model to best fit the data. We propose a methodology that is compatible with any system and any parametric family of models. Our approach requires only input-output data from the system and first-order information from the model with respect to its parameters. Our algorithm consists of two modules. First, we formulate the problem of system identification from a Bayesian perspective and propose an approximate iterative method to optimize the model's parameters. Based on this Bayesian formulation, we then define a Gaussian-based uncertainty measure for the model parameters, which we minimize with respect to the next selected input. Our method outperforms model-free baselines on systems with various linear and nonlinear dynamics.
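
A loose sketch of the kind of loop the abstract describes (hypothetical, not the authors' code): maintain a Gaussian approximation of the parameter posterior using only first-order model information, and select each input to minimize a Gaussian-based uncertainty measure, here the trace of the would-be posterior covariance. The toy system, candidate grid, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([1.5, -0.7])
noise = 0.1

def model(u, theta):
    # Toy nonlinear system; replace with any parametric model.
    return theta[0] * np.sin(u) + theta[1] * u ** 2

def jac(u, theta):
    # First-order information: d(model)/d(theta).
    return np.array([np.sin(u), u ** 2])

def ekf_update(mu, Sigma, u, y=None):
    # Linearized (EKF-style) Gaussian posterior update; with y=None it
    # returns only the covariance the update *would* produce.
    g = jac(u, mu)[:, None]
    S = (g.T @ Sigma @ g).item() + noise ** 2
    K = Sigma @ g / S
    Sigma_new = Sigma - K @ g.T @ Sigma
    if y is None:
        return mu, Sigma_new
    mu_new = mu + (K * (y - model(u, mu))).ravel()
    return mu_new, Sigma_new

mu, Sigma = np.zeros(2), np.eye(2)  # Gaussian prior over parameters
candidates = np.linspace(-2, 2, 81)

for step in range(15):
    # Pick the input minimizing the trace of the would-be posterior
    # covariance (an A-optimal, Gaussian-based uncertainty measure).
    u = min(candidates, key=lambda c: np.trace(ekf_update(mu, Sigma, c)[1]))
    y = model(u, theta_true) + noise * rng.normal()  # query the true system
    mu, Sigma = ekf_update(mu, Sigma, u, y)

print("estimate:", mu, "true:", theta_true)
```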


Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers

Lee, Seunghyun, Gu, Yuqi

arXiv.org Machine Learning

In the era of generative AI, deep generative models (DGMs) with latent representations have gained tremendous popularity. Despite their impressive empirical performance, the statistical properties of these models remain underexplored. DGMs are often overparametrized, non-identifiable, and uninterpretable black boxes, raising serious concerns when deploying them in high-stakes applications. Motivated by this, we propose an interpretable deep generative modeling framework for rich data types with discrete latent layers, called Deep Discrete Encoders (DDEs). A DDE is a directed graphical model with multiple binary latent layers. Theoretically, we propose transparent identifiability conditions for DDEs, which imply progressively smaller sizes of the latent layers as they go deeper. Identifiability ensures consistent parameter estimation and inspires an interpretable design of the deep architecture. Computationally, we propose a scalable estimation pipeline consisting of a layerwise nonlinear spectral initialization followed by a penalized stochastic approximation EM algorithm. This procedure can efficiently estimate models with exponentially many latent components. Extensive simulation studies validate our theoretical results and demonstrate the proposed algorithms' excellent performance. We apply DDEs to three diverse real datasets, covering hierarchical topic modeling, image representation learning, and response time modeling in educational testing, and obtain interpretable findings.
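
For intuition, here is a hypothetical sketch of the generative direction of such a model: a directed graphical model with multiple binary latent layers whose sizes shrink with depth, consistent with the identifiability conditions mentioned in the abstract. The layer sizes, random weights, and Bernoulli observed layer are all illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Latent layer sizes shrink with depth: deepest layer first, observed last.
layer_sizes = [4, 8, 16, 100]
weights = [rng.normal(scale=1.5, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(scale=0.5, size=n) for n in layer_sizes[1:]]

def sample(n_samples):
    # Deepest binary latent layer: independent Bernoulli(0.5).
    h = rng.binomial(1, 0.5, size=(n_samples, layer_sizes[0]))
    # Each shallower binary layer is Bernoulli given its parent layer.
    for W, b in zip(weights[:-1], biases[:-1]):
        h = rng.binomial(1, sigmoid(h @ W + b))
    # Observed layer is Bernoulli here; the DDE framework itself covers
    # richer data types (counts, continuous responses, response times).
    return rng.binomial(1, sigmoid(h @ weights[-1] + biases[-1]))

X = sample(1000)
print(X.shape, X.mean())
```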


Towards An Unsupervised Learning Scheme for Efficiently Solving Parameterized Mixed-Integer Programs

Qu, Shiyuan, Dong, Fenglian, Wei, Zhiwei, Shang, Chao

arXiv.org Artificial Intelligence

In this paper, we describe a novel unsupervised learning scheme for accelerating the solution of a family of mixed-integer programming (MIP) problems. Differing substantially from existing learning-to-optimize methods, our proposal trains an autoencoder (AE) for the binary variables in an unsupervised fashion, using data of optimal solutions to historical instances of a parametric family of MIPs. Through a deliberate design of the AE architecture and exploitation of its statistical implications, we present a simple strategy for constructing a class of cutting-plane constraints from the decoder parameters of an offline-trained AE. These constraints reliably enclose the optimal binary solutions of new problem instances, thanks to the representation strength of the AE. More importantly, their integration into the primal MIP problem yields a tightened MIP with a reduced feasible region, which can be solved at decision time by off-the-shelf solvers with much higher efficiency. Our method is applied to a benchmark batch process scheduling problem formulated as a mixed-integer linear programming (MILP) problem. Comprehensive results demonstrate that our approach significantly reduces the computational cost of off-the-shelf MILP solvers while retaining high solution quality. The code for this work is open-sourced at https://github.com/qushiyuan/AE4BV.
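
The paper's actual cutting planes are derived from the decoder parameters of the trained AE; that construction is not reproduced here. As a loose stand-in that shows only the workflow, the sketch below trains a tiny autoencoder on historical optimal binary solutions and turns saturated reconstructions into simple variable-fixing cuts before a new instance would be handed to an off-the-shelf solver. All sizes, probabilities, and thresholds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Optimal binary solutions of historical instances (one per row); the
# per-variable probabilities here are purely illustrative.
S = rng.binomial(1, np.tile([0.98, 0.02, 0.5, 0.95, 0.05], (200, 1)))

n, k = S.shape[1], 2  # k-dimensional latent code
W_e = rng.normal(scale=0.1, size=(n, k))
W_d = rng.normal(scale=0.1, size=(k, n))

for _ in range(3000):  # plain gradient descent on the cross-entropy
    Z = sigmoid(S @ W_e)   # encoder
    P = sigmoid(Z @ W_d)   # decoder (reconstruction probabilities)
    G = (P - S) / len(S)   # gradient of mean cross-entropy w.r.t. logits
    gW_d = Z.T @ G
    gW_e = S.T @ (G @ W_d.T * Z * (1 - Z))
    W_d -= 0.5 * gW_d
    W_e -= 0.5 * gW_e

# Crude stand-in for the decoder-derived cutting planes: variables whose
# average reconstruction saturates across the historical data get fixed,
# tightening the MIP passed to the solver.
P = sigmoid(sigmoid(S @ W_e) @ W_d)
print("fix to 1:", np.where(P.mean(axis=0) > 0.9)[0])
print("fix to 0:", np.where(P.mean(axis=0) < 0.1)[0])
```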