 Mathematical & Statistical Methods


From Smooth Wasserstein Distance to Dual Sobolev Norm: Empirical Approximation and Statistical Applications

arXiv.org Machine Learning

Statistical distances, i.e., discrepancy measures between probability distributions, are ubiquitous in probability theory, statistics and machine learning. To combat the curse of dimensionality when estimating these distances from data, recent work has proposed smoothing out local irregularities in the measured distributions via convolution with a Gaussian kernel. Motivated by the scalability of the smooth framework to high dimensions, we conduct an in-depth study of the structural and statistical behavior of the Gaussian-smoothed $p$-Wasserstein distance $\mathsf{W}_p^{(\sigma)}$, for arbitrary $p\geq 1$. We start by showing that $\mathsf{W}_p^{(\sigma)}$ admits a metric structure that is topologically equivalent to classic $\mathsf{W}_p$ and is stable with respect to perturbations in $\sigma$. Moving to statistical questions, we explore the asymptotic properties of $\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n,\mu)$, where $\hat{\mu}_n$ is the empirical distribution of $n$ i.i.d. samples from $\mu$. To that end, we prove that $\mathsf{W}_p^{(\sigma)}$ is controlled by a $p$th order smooth dual Sobolev norm $\mathsf{d}_p^{(\sigma)}$. Since $\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ coincides with the supremum of an empirical process indexed by Gaussian-smoothed Sobolev functions, it lends itself well to analysis via empirical process theory. We derive the limit distribution of $\sqrt{n}\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ in all dimensions $d$, when $\mu$ is sub-Gaussian. Through the aforementioned bound, this implies a parametric empirical convergence rate of $n^{-1/2}$ for $\mathsf{W}_p^{(\sigma)}$, contrasting the $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_p$ when $d \geq 3$. As applications, we provide asymptotic guarantees for two-sample testing and minimum distance estimation. When $p=2$, we further show that $\mathsf{d}_2^{(\sigma)}$ can be expressed as a maximum mean discrepancy.
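The smoothing step described above can be illustrated in one dimension, where the $p$-Wasserstein distance between two equal-size empirical measures reduces to matching sorted samples. The following is a minimal Monte Carlo sketch and not the authors' estimator: the function names (`empirical_wp_1d`, `sample_smoothed`, `smooth_wp_1d`), the resampling size `m`, and the fixed seed are all assumptions made for illustration. It approximates $\mathsf{W}_p^{(\sigma)}$ between two empirical measures by drawing points from each measure convolved with $\mathcal{N}(0,\sigma^2)$.

```python
import random


def empirical_wp_1d(xs, ys, p=1):
    """W_p between two equal-size 1D empirical measures (sorted matching)."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)) ** (1 / p)


def sample_smoothed(xs, sigma, m, rng):
    """Draw m points from the empirical measure of xs convolved with N(0, sigma^2)."""
    return [rng.choice(xs) + rng.gauss(0.0, sigma) for _ in range(m)]


def smooth_wp_1d(xs, ys, sigma, p=1, m=2000, seed=0):
    """Monte Carlo approximation of the Gaussian-smoothed W_p (illustrative only)."""
    rng = random.Random(seed)
    return empirical_wp_1d(
        sample_smoothed(xs, sigma, m, rng),
        sample_smoothed(ys, sigma, m, rng),
        p,
    )


xs = [0.0, 1.0, 2.0]
ys = [x + 5.0 for x in xs]          # the same sample shifted by 5
same = smooth_wp_1d(xs, xs, sigma=1.0)     # close to 0, up to Monte Carlo error
shifted = smooth_wp_1d(xs, ys, sigma=1.0)  # close to the shift of 5
```

Because both measures are smoothed with the same kernel, a pure shift is preserved, so `shifted` stays near 5 while `same` stays near 0.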


The Gaussian Neural Process

arXiv.org Machine Learning

Neural Processes (NPs; Garnelo et al., 2018a,b) are a rich class of models for meta-learning that map data sets directly to predictive stochastic processes. We provide a rigorous analysis of the standard maximum-likelihood objective used to train conditional NPs. Moreover, we propose a new member of the Neural Process family called the Gaussian Neural Process (GNP), which models predictive correlations, incorporates translation equivariance, provides universal approximation guarantees, and demonstrates encouraging performance.


A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

arXiv.org Artificial Intelligence

The query optimizer is at the heart of a database system. The cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer uses a plan enumeration algorithm to find a (sub)plan, uses a cost model to obtain the cost of that plan, and selects the plan with the lowest cost. In the cost model, cardinality, the number of tuples passing through an operator, plays a crucial role. Due to inaccuracy in cardinality estimation, errors in the cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time. In this paper, we first study in depth the causes behind these limitations. Next, we review the techniques used to improve the quality of the three key components of the cost-based optimizer: cardinality estimation, cost model, and plan enumeration. We also provide our insights on future directions for each of these aspects.
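The interplay of the three components can be sketched in a few lines. This is a toy illustration under stated assumptions, not any real system's optimizer: the table sizes in `CARD`, the join selectivities in `SEL`, the independence assumption in `estimate_cardinality`, and the cost model (sum of estimated intermediate result sizes of a left-deep plan) are all hypothetical choices made for the example.

```python
from itertools import permutations

# Hypothetical base-table cardinalities and join-predicate selectivities.
CARD = {"A": 1000, "B": 100, "C": 10}
SEL = {frozenset("AB"): 0.01, frozenset("BC"): 0.1}  # no entry = cross product


def estimate_cardinality(tables):
    """Estimated tuples after joining `tables`, assuming independent predicates."""
    card = 1.0
    for t in tables:
        card *= CARD[t]
    ts = list(tables)
    for i in range(len(ts)):
        for j in range(i + 1, len(ts)):
            card *= SEL.get(frozenset((ts[i], ts[j])), 1.0)
    return card


def plan_cost(order):
    """Cost model: sum of estimated intermediate result sizes of a left-deep plan."""
    return sum(estimate_cardinality(order[:k]) for k in range(2, len(order) + 1))


def best_plan(tables):
    """Plan enumeration: exhaustively try every left-deep join order."""
    return min(permutations(tables), key=plan_cost)


plan = best_plan(["A", "B", "C"])
cost = plan_cost(plan)
```

With these numbers, joining B and C first keeps the intermediate result small, whereas starting with A and C forces a cross product. Exhaustive enumeration grows factorially with the number of tables, which is the "huge plan space" problem the abstract refers to.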


Probability Distributions in Data Science - KDnuggets

#artificialintelligence

Bio: Pier Paolo Ippolito is a SAS Data Scientist and MSc in Artificial Intelligence graduate from the University of Southampton. He has a strong interest in AI advancements and machine learning applications (such as finance and medicine).


Territory Design for Dynamic Multi-Period Vehicle Routing Problem with Time Windows

arXiv.org Artificial Intelligence

This study introduces the Territory Design for Dynamic Multi-Period Vehicle Routing Problem with Time Windows (TD-DMPVRPTW), motivated by a real-world application at a food company's distribution center. This problem deals with the design of contiguous and compact territories for delivery of orders from a depot to a set of customers, with time windows, over a multi-period planning horizon. Customers and their demands vary dynamically over time. The problem is modeled as a mixed-integer linear program (MILP) and solved by a proposed heuristic. The heuristic solutions are compared with the proposed MILP solutions on a set of small artificial instances and the food company's solutions on a set of real-world instances. Computational results show that the proposed algorithm can yield high-quality solutions within moderate running times.


All You Need To Know About Building A Career In Machine Learning!

#artificialintelligence

Mathematics: If you want to thrive in the field of data science, you need a certain familiarity with calculus, probability, and linear algebra. Various standard models are essential to construct ML algorithms. In general, a data scientist should know something about probability and statistics theory; the rest depends on the job you apply for. Computer science: This is the study of software systems, including their theory, development, design, and application. It takes a scientific approach to computation and its applications. Computer science is considered a foundation that makes it easier to achieve results and gain further knowledge in the field.


Spectral Methods for Data Science: A Statistical Perspective

arXiv.org Machine Learning

Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data. In a nutshell, spectral methods refer to a collection of algorithms built upon the eigenvalues (resp. singular values) and eigenvectors (resp. singular vectors) of some properly designed matrices constructed from data. A diverse array of applications have been found in machine learning, data science, and signal processing. Due to their simplicity and effectiveness, spectral methods are not only used as a stand-alone estimator, but also frequently employed to initialize other more sophisticated algorithms to improve performance. While the studies of spectral methods can be traced back to classical matrix perturbation theory and methods of moments, the past decade has witnessed tremendous theoretical advances in demystifying their efficacy through the lens of statistical modeling, with the aid of non-asymptotic random matrix theory. This monograph aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective, highlighting their algorithmic implications in diverse large-scale applications. In particular, our exposition gravitates around several central questions that span various applications: how to characterize the sample efficiency of spectral methods in reaching a target level of statistical accuracy, and how to assess their stability in the face of random noise, missing data, and adversarial corruptions? In addition to conventional $\ell_2$ perturbation analysis, we present a systematic $\ell_{\infty}$ and $\ell_{2,\infty}$ perturbation theory for eigenspace and singular subspaces, which has only recently become available owing to a powerful "leave-one-out" analysis framework.
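As a concrete instance of the idea, the sketch below recovers a planted direction from a noisy symmetric matrix by taking its leading eigenvector. It assumes NumPy is available, and the setup is an illustrative choice rather than an example taken from the monograph: the rank-one signal strength `10.0`, the matrix size `n`, and the Gaussian noise scaling are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Planted unit vector u and a rank-one signal matrix.
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
signal = 10.0 * np.outer(u, u)

# Symmetric Gaussian noise, scaled so its spectral norm is roughly 2.
G = rng.standard_normal((n, n))
noise = (G + G.T) / np.sqrt(2 * n)

M = signal + noise

# Spectral estimate: eigenvector of the largest eigenvalue
# (np.linalg.eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(M)
u_hat = eigvecs[:, -1]

# Eigenvectors are defined up to sign, so compare via the absolute inner product.
alignment = abs(u_hat @ u)
```

When the signal eigenvalue dominates the noise level, `alignment` is close to 1. How quickly it degrades as the noise grows is the kind of question the perturbation theory in the abstract addresses.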


Applications of multivariate quasi-random sampling with neural networks

arXiv.org Machine Learning

Generative moment matching networks (GMMNs) are suggested for modeling the cross-sectional dependence between stochastic processes. The stochastic processes considered are geometric Brownian motions and ARMA-GARCH models. Geometric Brownian motions lead to an application of pricing American basket call options under dependence and ARMA-GARCH models lead to an application of simulating predictive distributions. In both types of applications the benefit of using GMMNs in comparison to parametric dependence models is highlighted and the fact that GMMNs can produce dependent quasi-random samples with no additional effort is exploited to obtain variance reduction.


Linear Algebra for Machine Learning

#artificialintelligence

Linear algebra, via the use of matrices and vectors, along with linear algebra libraries (such as NumPy in Python), allows us to perform a large number of calculations in a more computationally efficient way while using simpler code. Knowing at least the numeric operations of linear algebra is crucial to further understanding what happens in our machine learning models. Although having the geometric intuition behind linear algebra can be incredibly useful in visualizing the operations we will discuss below, it is not required to understand most machine learning algorithms. In this tutorial, we will discuss scalars, vectors, matrices, matrix-matrix addition and subtraction, scalar multiplication and division, matrix-vector multiplication, matrix-matrix multiplication, identity matrices, matrix inverses, and matrix transposes. In addition, we will very briefly discuss some of the geometric intuition behind some of these numeric operations.
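The operations listed above map directly onto NumPy. The following is a short sketch using small, arbitrarily chosen matrices `A` and `B` and a vector `v` (the values are illustrative assumptions, not taken from the tutorial).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
v = np.array([1.0, 2.0])

total = A + B            # matrix-matrix addition (element-wise)
diff = A - B             # matrix-matrix subtraction (element-wise)
scaled = 2 * A           # scalar multiplication
halved = A / 2           # scalar division
Av = A @ v               # matrix-vector multiplication
AB = A @ B               # matrix-matrix multiplication
I = np.eye(2)            # 2x2 identity matrix
A_inv = np.linalg.inv(A) # matrix inverse (A must be square and non-singular)
A_T = A.T                # matrix transpose
```

Multiplying `A` by its inverse gives the identity up to floating-point rounding, so comparisons are best made with `np.allclose` rather than `==`.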


Recent Developments in Boolean Matrix Factorization

arXiv.org Artificial Intelligence

Boolean matrix factorization (BMF) is a variant of the standard matrix factorization problem in the Boolean semiring: given a binary matrix, the task is to find two smaller binary matrices so that their product, taken over the Boolean semiring, is as close to the original matrix as possible. Because the matrix product is not done over a field but over a semiring, many standard matrix factorization techniques fail to work. Indeed, finding the best Boolean factorization is computationally hard. The computational hardness of the problem has not prevented people from studying it. In psychometrics, some of the first algorithms appeared in the 1980s (see Bělohlávek and Trnecka (2018)). Even before that, mathematicians studying combinatorics had studied "Boolean linear algebra" (Kim, 1982; Monson et al., 1995).
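To make the Boolean product concrete, the sketch below multiplies two binary factor matrices using OR in place of addition and AND in place of multiplication, then counts the entries in which the result differs from the target. The function names and the small matrices are illustrative assumptions. The sketch only evaluates a given factorization; it does not search for one.

```python
def boolean_product(B, C):
    """Product over the Boolean semiring: entry (i, j) is OR_l (B[i][l] AND C[l][j])."""
    n, k, m = len(B), len(C), len(C[0])
    return [
        [int(any(B[i][l] and C[l][j] for l in range(k))) for j in range(m)]
        for i in range(n)
    ]


def reconstruction_error(A, B, C):
    """Number of entries in which A differs from the Boolean product of B and C."""
    P = boolean_product(B, C)
    return sum(a != p for row_a, row_p in zip(A, P) for a, p in zip(row_a, row_p))


A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
B = [[1, 0],
     [1, 1],
     [0, 1]]
C = [[1, 1, 0],
     [0, 1, 1]]

product = boolean_product(B, C)
error = reconstruction_error(A, B, C)
```

Here the Boolean product reproduces `A` exactly with two factors. Under ordinary integer arithmetic the middle entry of the product would be 2 rather than 1, which is why techniques built for fields do not carry over directly.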