Collaborating Authors

 Eckstein, Stephan


Time-Causal VAE: Robust Financial Time Series Generator

arXiv.org Artificial Intelligence

For financial time series, the shortage of samples makes it statistically hard for empirical processes to achieve an acceptable confidence level in describing the underlying market distribution. In practice, it is widely recognized among financial engineers that back-testing exclusively on empirical market data results in significant over-fitting, which leads to unpredictably high risks in decisions based on these tests [Bai+16]. Synthetic data are therefore generated to augment scarce market data and are used for back-testing, stress-testing, exploring new scenarios, and training deep learning models in financial applications; see the overview in [Ass+20a]. For these purposes, the generated data should look like plausible samples from the underlying market distribution, for example by reproducing stylized facts observed in the market. In particular, we want the distribution of the generated data to be close to the underlying market distribution in terms of performance on decision-making problems such as pricing and hedging, optimal stopping, and utility maximization. Notably, these problems are not continuous with respect to widely used distances such as the Maximum Mean Discrepancy (MMD) and the Wasserstein distances (W-distances). They are, however, Lipschitz-continuous with respect to stronger metrics, the adapted Wasserstein distances (AW-distances) [Bac+20; PP14].
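To make the last point concrete, the following LaTeX snippet sketches one common definition of an adapted Wasserstein distance on discrete-time path spaces; the notation (paths $x=(x_1,\dots,x_T)$, the set of bicausal couplings $\Pi_{\mathrm{bc}}$) is ours and not necessarily the paper's.

```latex
% Adapted Wasserstein distance of order p between laws \mu, \nu of
% discrete-time paths x = (x_1,\dots,x_T) and y = (y_1,\dots,y_T).
% A coupling \pi is called causal if, under \pi, the conditional law of
% y_{1:t} given the whole path x_{1:T} depends only on x_{1:t};
% it is bicausal if the same holds with the roles of x and y swapped.
\[
  \mathcal{AW}_p(\mu,\nu)
  \;=\;
  \inf_{\pi \in \Pi_{\mathrm{bc}}(\mu,\nu)}
  \Big( \int d(x,y)^p \, \pi(\mathrm{d}x,\mathrm{d}y) \Big)^{1/p}.
\]
% Restricting to bicausal couplings is what makes the values of stopping,
% hedging and utility-maximization problems Lipschitz in \mathcal{AW}_p,
% whereas they can be discontinuous in the ordinary distance W_p.
```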


Dimensionality Reduction and Wasserstein Stability for Kernel Regression

arXiv.org Machine Learning

In a high-dimensional regression framework, we study consequences of the naive two-step procedure where first the dimension of the input variables is reduced and second, the reduced input variables are used to predict the output variable with kernel regression. In order to analyze the resulting regression errors, a novel stability result for kernel regression with respect to the Wasserstein distance is derived. This allows us to bound errors that occur when perturbed input data is used to fit the regression function. We apply the general stability result to principal component analysis (PCA). Exploiting known estimates from the literature on both principal component analysis and kernel regression, we deduce convergence rates for the two-step procedure. The latter turns out to be particularly useful in a semi-supervised setting.
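As a rough illustration of the naive two-step procedure (not the paper's exact estimator, rates, or tuning), one might first project the inputs with PCA and then fit a kernel ridge regression on the reduced coordinates; the dimensions, kernel, and regularization below are arbitrary placeholder choices.

```python
# Minimal sketch of the two-step procedure: PCA for dimension reduction,
# then kernel (ridge) regression on the reduced inputs.
# Hyperparameters are illustrative placeholders only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, d, k = 500, 50, 5                              # samples, ambient dim, reduced dim
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)    # toy target

# In a semi-supervised setting, PCA could additionally be fit on
# unlabeled inputs before the supervised regression step.
model = make_pipeline(
    PCA(n_components=k),
    KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5),
)
model.fit(X, y)
print("train MSE:", np.mean((model.predict(X) - y) ** 2))
```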


Hilbert's projective metric for functions of bounded growth and exponential convergence of Sinkhorn's algorithm

arXiv.org Machine Learning

We study versions of Hilbert's projective metric for spaces of integrable functions of bounded growth. These metrics originate from cones which are relaxations of the cone of all non-negative functions, in the sense that they include all functions having non-negative integral values when multiplied with certain test functions. We show that kernel integral operators are contractions with respect to suitable specifications of such metrics even for kernels which are not bounded away from zero, provided that the decay to zero of the kernel is controlled. As an application to entropic optimal transport, we show exponential convergence of Sinkhorn's algorithm in settings where the marginal distributions have sufficiently light tails compared to the growth of the cost function.
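For orientation, here is the standard discrete Sinkhorn iteration for entropic optimal transport. The paper's contraction analysis concerns continuous settings and kernels that may decay to zero; the finite, strictly positive Gibbs kernel below is only a toy stand-in.

```python
# Toy discrete Sinkhorn iteration for entropic optimal transport.
# a, b are marginal weight vectors, C is the cost matrix, eps > 0 the
# regularization.  The Gibbs kernel K = exp(-C/eps) is strictly positive
# here, unlike the bounded-growth kernels analyzed in the paper.
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # alternate scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]    # approximate optimal coupling

rng = np.random.default_rng(0)
x, y = rng.normal(size=10), rng.normal(size=12)
a, b = np.full(10, 1 / 10), np.full(12, 1 / 12)
C = (x[:, None] - y[None, :]) ** 2
P = sinkhorn(a, b, C)
print(P.sum(axis=1)[:3], P.sum(axis=0)[:3])       # marginals approx. a and b
```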


Estimating the Rate-Distortion Function by Wasserstein Gradient Descent

arXiv.org Machine Learning

In the theory of lossy compression, the rate-distortion (R-D) function $R(D)$ describes how much a data source can be compressed (in bit-rate) at any given level of fidelity (distortion). Obtaining $R(D)$ for a given data source establishes the fundamental performance limit for all compression algorithms. We propose a new method to estimate $R(D)$ from the perspective of optimal transport. Unlike the classic Blahut--Arimoto algorithm, which fixes the support of the reproduction distribution in advance, our Wasserstein gradient descent algorithm learns the support of the optimal reproduction distribution by moving particles. We prove its local convergence and analyze the sample complexity of our R-D estimator based on a connection to entropic optimal transport. Experimentally, we obtain comparable or tighter bounds than state-of-the-art neural network methods on low-rate sources while requiring considerably less tuning and computational effort. We also highlight a connection to maximum-likelihood deconvolution and introduce a new class of sources that can be used as test cases with known solutions to the R-D problem.
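A minimal numpy sketch of the particle idea, assuming a quadratic distortion $d(x,y)=(x-y)^2$ and the standard Lagrangian relaxation of the R-D problem; the multiplier lam, step size, particle count, and toy Gaussian source are placeholders, and the conversion of the optimized particles into an actual (rate, distortion) point is omitted.

```python
# Particle sketch for the Lagrangian form of the R-D problem:
#   F(y_1..y_n) = -(1/m) * sum_j log( (1/n) * sum_i exp(-lam * d(x_j, y_i)) )
# with quadratic distortion d(x, y) = (x - y)^2.  The n reproduction
# particles y_i carry equal weight and are moved by gradient descent on F,
# instead of fixing the reproduction support as in Blahut--Arimoto.
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, step = 2000, 50, 4.0, 0.5
x = rng.normal(size=m)                  # toy Gaussian source samples
y = rng.choice(x, size=n).copy()        # initialize particles on the data

for _ in range(300):
    d = (x[:, None] - y[None, :]) ** 2               # (m, n) distortions
    logits = -lam * d
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # responsibilities w_ji
    grad = (2.0 * lam / m) * (w * (y[None, :] - x[:, None])).sum(axis=0)
    y -= step * grad                                 # move the particles

obj = -np.mean(np.log(np.mean(np.exp(-lam * (x[:, None] - y[None, :]) ** 2), axis=1)))
print("final objective:", obj)
```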


Convergence Rates for Regularized Optimal Transport via Quantization

arXiv.org Machine Learning

We study the convergence of divergence-regularized optimal transport as the regularization parameter vanishes. Sharp rates for general divergences, including relative entropy and $L^{p}$ regularization, general transport costs and multi-marginal problems are obtained. A novel methodology using quantization and martingale couplings is suitable for non-compact marginals and achieves, in particular, the sharp leading-order term of the entropically regularized 2-Wasserstein distance for all marginals with finite $(2+\delta)$-moment.
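For reference, the entropic special case of the divergence-regularized problem can be written as follows; the notation is ours and the general setting of the paper allows other divergences and multi-marginal costs.

```latex
% Divergence-regularized optimal transport, entropic special case:
% marginals \mu, \nu, cost c, regularization parameter \varepsilon > 0.
\[
  \mathrm{OT}_{\varepsilon}(\mu,\nu)
  \;=\;
  \inf_{\pi \in \Pi(\mu,\nu)}
  \int c \,\mathrm{d}\pi
  + \varepsilon \, D_{\mathrm{KL}}\!\big(\pi \,\|\, \mu \otimes \nu\big).
\]
% The paper quantifies how fast OT_\varepsilon approaches the unregularized
% value as \varepsilon \to 0, with the KL term replaced by more general
% divergences such as L^p regularization.
```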


MinMax Methods for Optimal Transport and Beyond: Regularization, Approximation and Numerics

arXiv.org Machine Learning

We study MinMax solution methods for a general class of optimization problems related to (and including) optimal transport. Theoretically, the focus is on fitting a large class of problems into a single MinMax framework and generalizing regularization techniques known from classical optimal transport. We show that regularization techniques justify the utilization of neural networks to solve such problems by proving approximation theorems and illustrating fundamental issues if no regularization is used. We further study the relation to the literature on generative adversarial nets, and analyze which algorithmic techniques used therein are particularly well suited to the class of problems studied in this paper. Several numerical experiments showcase the generality of the setting and highlight which theoretical insights are most beneficial in practice.
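As one concrete instance of such a minmax formulation (not necessarily the exact framework of the paper), the Kantorovich problem can be written with Lagrange multipliers for the marginal constraints:

```latex
% Optimal transport as a minmax problem: the inner supremum over (f, g)
% enforces that \pi has marginals \mu and \nu; \mathcal{M}_+ denotes the
% non-negative measures on the product space.
\[
  \inf_{\pi \in \Pi(\mu,\nu)} \int c \,\mathrm{d}\pi
  \;=\;
  \inf_{\pi \in \mathcal{M}_+} \; \sup_{f,\,g}
  \Big\{ \int c \,\mathrm{d}\pi
       + \int f \,\mathrm{d}\mu + \int g \,\mathrm{d}\nu
       - \int \big(f(x)+g(y)\big)\, \pi(\mathrm{d}x,\mathrm{d}y) \Big\}.
\]
% Regularization replaces the hard inner supremum by a smoothed penalty,
% which is what makes parametrizing f and g by neural networks tractable.
```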


Lipschitz neural networks are dense in the set of all Lipschitz functions

arXiv.org Machine Learning

This note shows, under mild assumptions on the activation function, that the addition of a Lipschitz constraint does not inhibit the expressiveness of neural networks. The main result, Theorem 1, is stated for activation functions ϕ that are either once continuously differentiable and not polynomial, or equal to the ReLU.
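As a purely illustrative aside (this is not the approximation construction used in the note), one practical way to build a network with a controlled Lipschitz constant is to bound each layer's operator norm, for instance via spectral normalization combined with 1-Lipschitz activations; the architecture and constants below are placeholders.

```python
# Illustrative only: spectral normalization keeps each linear layer
# approximately 1-Lipschitz, ReLU is 1-Lipschitz, and the final scaling
# by L gives an overall Lipschitz constant of (at most) about L.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

class Scale(nn.Module):
    """Multiply the output by a fixed constant L."""
    def __init__(self, L):
        super().__init__()
        self.L = L
    def forward(self, x):
        return self.L * x

def lipschitz_mlp(d_in, width=64, L=1.0):
    return nn.Sequential(
        spectral_norm(nn.Linear(d_in, width)), nn.ReLU(),
        spectral_norm(nn.Linear(width, width)), nn.ReLU(),
        spectral_norm(nn.Linear(width, 1)), Scale(L),
    )

net = lipschitz_mlp(d_in=3, width=64, L=2.0)
x = torch.randn(8, 3)
print(net(x).shape)   # torch.Size([8, 1])
```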


Computation of optimal transport and related hedging problems via penalization and neural networks

arXiv.org Machine Learning

This paper presents a widely applicable approach to solving (multi-marginal, martingale) optimal transport and related problems via neural networks. The core idea is to penalize the optimization problem in its dual formulation and reduce it to a finite-dimensional one, which corresponds to optimizing a neural network with a smooth objective function. We present numerical examples from optimal transport, martingale optimal transport, portfolio optimization under uncertainty, and generative adversarial networks that showcase the generality and effectiveness of the approach.
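A minimal sketch of the penalization idea for plain two-marginal optimal transport, assuming a quadratic penalty and the product of the empirical marginals as reference measure; the network sizes, the cost c(x, y) = |x - y|^2, the Gaussian samples, and all hyperparameters are illustrative placeholders rather than the paper's setup.

```python
# Sketch of the penalized dual of optimal transport with neural networks:
# maximize  E_mu[f] + E_nu[g] - gamma * E_theta[ relu(f(x) + g(y) - c(x, y))^2 ]
# where theta = mu (x) nu and c(x, y) = |x - y|^2.
import torch
import torch.nn as nn

def mlp():
    return nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, 1))

f, g = mlp(), mlp()                      # dual potentials as neural networks
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
gamma = 100.0                            # penalty strength (placeholder)

for step in range(2000):
    x = torch.randn(256, 1)              # samples from mu
    y = torch.randn(256, 1) + 1.0        # samples from nu
    cost = (x - y.T) ** 2                # pairwise costs on product samples
    viol = torch.relu(f(x) + g(y).T - cost)    # violation of f + g <= c
    dual_value = f(x).mean() + g(y).mean() - gamma * (viol ** 2).mean()
    loss = -dual_value                   # gradient ascent on penalized dual
    opt.zero_grad()
    loss.backward()
    opt.step()

print("approximate OT value:", dual_value.item())
```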