Tarokh, Vahid
Deep Extreme Value Copulas for Estimation and Sampling
Hasan, Ali, Elkhalil, Khalil, Pereira, Joao M., Farsiu, Sina, Blanchet, Jose H., Tarokh, Vahid
Modeling the occurrence of extreme events is an important task in many disciplines, such as medicine, environmental science, engineering, and finance. For example, understanding the probability of a patient having an adverse reaction to medication or the distribution of economic shocks is critical to mitigating the associated effects of these events. However, such events occur rarely and are often difficult to characterize with traditional statistical tools. This has been the primary focus of extreme value theory (EVT), which describes how to extrapolate the occurrence of rare events beyond the range of available data [1]. In the one-dimensional case, EVT provides remarkably simple models for the limiting distribution of the maximum of a growing number of independent and identically distributed (i.i.d.) random variables.
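For reference, a standard statement of this one-dimensional limit law (the Fisher-Tippett-Gnedenko theorem; this formulation is textbook material, not quoted from the paper itself) is:

```latex
% If normalizing sequences a_n > 0 and b_n exist, the rescaled maximum of
% n i.i.d. random variables converges to the generalized extreme value law:
\[
  \lim_{n \to \infty}
  \Pr\!\left( \frac{\max(X_1, \dots, X_n) - b_n}{a_n} \le x \right)
  = G_\xi(x), \qquad
  G_\xi(x) = \exp\!\left( -(1 + \xi x)^{-1/\xi} \right)
\]
% for 1 + \xi x > 0, with the \xi -> 0 case read as the Gumbel law
% G_0(x) = \exp(-e^{-x}).
```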
Task-Aware Neural Architecture Search
Le, Cat P., Soltani, Mohammadreza, Ravier, Robert, Tarokh, Vahid
The design of handcrafted neural networks requires substantial time and resources. Recent techniques in Neural Architecture Search (NAS) have proven competitive with, or better than, traditional handcrafted design, although they require domain knowledge and have generally used limited search spaces. In this paper, we propose a novel framework for neural architecture search that utilizes a dictionary of base-task models and the similarity between the target task and the atoms of the dictionary, thereby generating an adaptive search space based on the dictionary's base models. By introducing a gradient-based search algorithm, we can evaluate and discover the best architecture in the search space without fully training the networks. The experimental results show the efficacy of our proposed task-aware approach.
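A minimal sketch of the dictionary-lookup idea described above, assuming task embeddings are available; the similarity measure (cosine similarity) and the helper names here are illustrative stand-ins, not the paper's actual construction:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def adaptive_search_space(target_embedding, dictionary, top_k=3):
    """dictionary: list of (task_embedding, base_model_spec) pairs."""
    ranked = sorted(
        dictionary,
        key=lambda atom: cosine_similarity(target_embedding, atom[0]),
        reverse=True,
    )
    # Seed the search space with base models of the most similar tasks;
    # a gradient-based search would then refine within this space.
    return [spec for _, spec in ranked[:top_k]]
```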
Fisher Auto-Encoders
Elkhalil, Khalil, Hasan, Ali, Ding, Jie, Farsiu, Sina, Tarokh, Vahid
It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AEs) referred to as Fisher auto-encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables and the postulated/modeled joint distribution. In contrast to KL-based variational AEs (VAEs), the Fisher AE can exactly quantify the distance between the true and the model-based posterior distributions. Qualitative and quantitative results are provided on both the MNIST and CelebA datasets, demonstrating the competitive robustness of Fisher AEs compared to other AEs such as VAEs and Wasserstein AEs.
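For context, the standard definition of the Fisher divergence (a well-known formula, not quoted from the paper) compares the score functions of two densities, which is why it can be evaluated without the intractable normalizing constant:

```latex
% Fisher divergence between densities p and q:
\[
  D_{\mathrm{F}}(p \,\|\, q)
  = \mathbb{E}_{x \sim p}\!\left[
      \left\| \nabla_x \log p(x) - \nabla_x \log q(x) \right\|^2
    \right]
\]
% q enters only through its score \nabla_x \log q(x), so any
% multiplicative normalizing constant of q drops out.
```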
Identifying Latent Stochastic Differential Equations with Variational Auto-Encoders
Hasan, Ali, Pereira, João M., Farsiu, Sina, Tarokh, Vahid
Variational auto-encoders (VAEs) are a widely used tool for learning lower-dimensional latent representations of high-dimensional data. However, the learned latent representations often lack interpretability, and it is challenging to extract relevant information from the representation of the dataset in the latent space. In particular, when the high-dimensional data are governed by unknown, lower-dimensional dynamics arising, for instance, from unknown physical or biological interactions, the latent space representation often fails to provide insight into these dynamics. We propose a VAE-based framework for recovering latent dynamics governed by stochastic differential equations (SDEs). Our motivation for using SDEs is that they are already widely used to model physical and biological phenomena and to study financial markets, and their properties have been studied extensively in probability and statistics. We believe this method can be useful for describing trajectories of high-dimensional data with underlying physical or biological dynamics, with applications such as video data, longitudinal medical data, or gene regulatory dynamics.
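A minimal sketch of the generative direction implied here, assuming a latent SDE dz = f(z)dt + g(z)dW simulated by Euler-Maruyama and decoded to observation space; the linear networks below are placeholder stand-ins, not the paper's architecture:

```python
import torch

def euler_maruyama(z0, drift, diffusion, dt=0.01, steps=100):
    """Simulate dz = drift(z) dt + diffusion(z) dW from initial state z0."""
    z, path = z0, [z0]
    for _ in range(steps):
        dw = torch.randn_like(z) * dt ** 0.5    # Brownian increment
        z = z + drift(z) * dt + diffusion(z) * dw
        path.append(z)
    return torch.stack(path)                     # (steps + 1, latent_dim)

latent_dim = 2
drift = torch.nn.Linear(latent_dim, latent_dim)      # placeholder drift net
diffusion = torch.nn.Linear(latent_dim, latent_dim)  # placeholder diffusion net
decoder = torch.nn.Linear(latent_dim, 64)            # latent -> observations

path = euler_maruyama(torch.zeros(latent_dim), drift, diffusion)
observations = decoder(path)                     # high-dimensional trajectory
```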
GeoStat Representations of Time Series for Fast Classification
Ravier, Robert J., Soltani, Mohammadreza, Simões, Miguel, Garagic, Denis, Tarokh, Vahid
Recent advances in time series classification have largely focused on methods that either employ deep learning or use other machine learning models for feature extraction. Though successful, their power often comes at the cost of significant computational complexity. In this paper, we introduce GeoStat representations for time series. GeoStat representations are based on a generalization of recent methods for trajectory classification, and summarize the information of a time series in terms of comprehensive statistics of (possibly windowed) distributions of easy-to-compute differential geometric quantities, requiring no dynamic time warping. The features used are intuitive and require minimal parameter tuning. We perform an exhaustive evaluation of GeoStat on a number of real datasets, showing that simple KNN and SVM classifiers trained on these representations exhibit surprising performance relative to modern single-model methods requiring significant computational power, achieving state-of-the-art results in many cases. In particular, we show that this methodology performs well on a challenging dataset involving the classification of fishing vessels, where our methods remain competitive with the state of the art despite having access to only approximately two percent of the data used to train and evaluate that state-of-the-art method.
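A toy sketch of the feature-then-classify pipeline described above; the specific quantities (finite differences as velocity and acceleration proxies) and summary statistics are illustrative assumptions, not the paper's exact feature set:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def geostat_features(x: np.ndarray) -> np.ndarray:
    """Summarize a 1-D series by statistics of cheap differential quantities."""
    vel = np.diff(x)          # first difference ~ velocity
    acc = np.diff(x, n=2)     # second difference ~ curvature proxy
    feats = []
    for q in (x, vel, acc):
        feats += [q.mean(), q.std(), np.percentile(q, 25),
                  np.median(q), np.percentile(q, 75)]
    return np.array(feats)    # fixed-length vector, no time warping needed

# Usage on a toy labeled dataset of 20 series:
X = np.stack([geostat_features(np.sin(np.linspace(0, k, 100)))
              for k in range(1, 21)])
y = np.array([k % 2 for k in range(1, 21)])
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
```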
HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients
Diao, Enmao, Ding, Jie, Tarokh, Vahid
Federated Learning (FL) is a method of training machine learning models on private data distributed over a large number of possibly heterogeneous clients such as mobile phones and IoT devices. In this work, we propose a new federated learning framework named HeteroFL to address heterogeneous clients equipped with very different computation and communication capabilities. Our solution enables the training of heterogeneous local models with varying computation complexities while still producing a single global inference model. For the first time, our method challenges the underlying assumption of existing work that local models have to share the same architecture as the global model. We demonstrate several strategies to enhance FL training and conduct extensive empirical evaluations, including five computation complexity levels of three model architectures on three datasets. We show that adaptively distributing subnetworks according to clients' capabilities is both computation and communication efficient. Mobile and Internet of Things (IoT) devices are becoming the primary computing resource for billions of users worldwide (Lim et al., 2020).
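A minimal sketch of the subnetwork-distribution idea, assuming local models are width-scaled slices of the global model; this is a simplification for illustration, not the authors' code (in practice input and output dimensions would typically stay at full size):

```python
import numpy as np

def shrink_hidden_weights(global_weights, ratio):
    """global_weights: list of 2-D hidden-layer arrays; ratio in (0, 1]."""
    local = []
    for W in global_weights:
        rows = max(1, int(round(W.shape[0] * ratio)))
        cols = max(1, int(round(W.shape[1] * ratio)))
        local.append(W[:rows, :cols].copy())  # leading sub-block of the global matrix
    return local

# A half-capacity client trains on nested slices of the global parameters,
# so updates from clients of different widths can be aggregated in place.
global_hidden = [np.random.randn(64, 64), np.random.randn(64, 64)]
half_width_client = shrink_hidden_weights(global_hidden, ratio=0.5)
```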
Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows
Cannella, Chris, Soltani, Mohammadreza, Tarokh, Vahid
We introduce Projected Latent Markov Chain Monte Carlo (PL-MCMC), a technique for sampling from the high-dimensional conditional distributions learned by a normalizing flow. We prove that a Metropolis-Hastings implementation of PL-MCMC asymptotically samples from the exact conditional distributions associated with a normalizing flow. As a conditional sampling method, PL-MCMC enables Monte Carlo Expectation Maximization (MC-EM) training of normalizing flows from incomplete data. Through experimental tests applying normalizing flows to missing data tasks for a variety of data sets, we demonstrate the efficacy of PL-MCMC for conditional sampling from normalizing flows. Conditional sampling from modeled joint probability distributions offers a statistical framework for approaching tasks involving missing and incomplete data. Deep generative models have demonstrated an exceptional capability for approximating the distributions governing complex data. Quite often, otherwise well-trained generative models possess a capability for conditional inference that is regrettably locked away from our access. Normalizing flow architectures like RealNVP (Dinh et al., 2016) and GLOW (Kingma & Dhariwal, 2018) have demonstrated accurate and expressive generative performance and show great promise for application to missing data tasks. Additionally, by enabling the calculation of exact likelihoods, normalizing flows offer convenient mathematical properties for approaching exact conditional sampling. We are therefore motivated to develop techniques for sampling from the exact conditional distributions encoded by normalizing flows. In this paper, we propose Projected Latent Markov Chain Monte Carlo (PL-MCMC), a conditional sampling technique that takes advantage of the convenient mathematical structure of normalizing flows by defining a Markov chain within a flow's latent space and accepting proposed transitions based on the likelihood of the resulting imputation.
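A heavily simplified sketch of the mechanism just described, assuming a trained flow exposing forward, inverse, and log_prob; proposals walk in latent space, are projected so observed coordinates stay fixed, and are accepted by Metropolis-Hastings on the completed sample's likelihood. This omits details of the authors' actual kernel and is not their implementation:

```python
import numpy as np

def pl_mcmc(flow, x_obs, obs_mask, steps=1000, step_size=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.where(obs_mask, x_obs, 0.0)               # crude initial imputation
    for _ in range(steps):
        z_prop = flow.forward(x) + step_size * rng.standard_normal(x.shape)
        x_prop = np.where(obs_mask, x_obs, flow.inverse(z_prop))  # project
        if np.log(rng.uniform()) < flow.log_prob(x_prop) - flow.log_prob(x):
            x = x_prop                               # Metropolis-Hastings accept
    return x                                         # completed sample

class IdentityFlow:                                  # stand-in for a trained flow
    def forward(self, x): return x
    def inverse(self, z): return z
    def log_prob(self, x): return -0.5 * float(np.sum(x ** 2))

imputed = pl_mcmc(IdentityFlow(), np.array([1.0, 0.0]), np.array([True, False]))
```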
Model Linkage Selection for Cooperative Learning
Zhou, Jiaying, Ding, Jie, Tan, Kean Ming, Tarokh, Vahid
Rapid developments in data collection devices and computation platforms are producing a growing number of learners and data modalities in many scientific domains. We consider the setting in which each learner holds a parametric statistical model paired with a specific data source, with the goal of integrating information across a set of learners to enhance the prediction accuracy of a specific learner. One natural way to integrate information is to build a joint model across a set of learners that share common parameters of interest. However, the parameter sharing patterns across a set of learners are not known a priori. Misspecifying the parameter sharing patterns or the parametric statistical model for each learner yields a biased estimator and degrades the prediction accuracy of the joint model. In this paper, we propose a novel framework for integrating information across a set of learners that is robust against model misspecification and misspecified parameter sharing patterns. The main idea is to start from a model with one learner and sequentially incorporate additional learners that can enhance the prediction accuracy of the existing joint model, guided by user-specified parameter sharing patterns across the set of learners. Theoretically, we show that the proposed method can data-adaptively select the correct parameter sharing patterns from the user-specified candidates, and thus enhance the prediction accuracy of a learner. Extensive numerical studies are performed to evaluate the performance of the proposed method.
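A hypothetical sketch of the sequential loop described above; the callables fit_joint (fit a joint model under a candidate sharing pattern) and val_loss (score its predictions) are invented stand-ins, and the greedy stopping rule is an assumption, not the paper's exact procedure:

```python
def select_linkages(target, candidates, fit_joint, val_loss):
    """Greedily add learners to the joint model while validation loss improves."""
    selected = [target]                       # start from the target learner alone
    best = val_loss(fit_joint(selected))
    remaining = list(candidates)
    improved = True
    while improved and remaining:
        improved = False
        # Score each candidate learner by the joint model it would yield.
        cand = min(remaining, key=lambda c: val_loss(fit_joint(selected + [c])))
        loss = val_loss(fit_joint(selected + [cand]))
        if loss < best:
            best, improved = loss, True
            selected.append(cand)
            remaining.remove(cand)
    return selected                           # learners linked to the target
```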
On Optimal Generalizability in Parametric Learning
Beirami, Ahmad, Razaviyayn, Meisam, Shahrampour, Shahin, Tarokh, Vahid
We consider the parametric learning problem, where the objective of the learner is determined by a parametric loss function. Employing empirical risk minimization, possibly with regularization, the inferred parameter vector will be biased toward the training samples. In practice, this bias is measured by the cross validation procedure, where the dataset is partitioned into a training set used for training and a validation set that is held out of training to measure the out-of-sample performance. A classical cross validation strategy is leave-one-out cross validation (LOOCV), where one sample is left out for validation, training is done on the remaining samples presented to the learner, and this process is repeated over all of the samples. LOOCV is rarely used in practice due to its high computational complexity.
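To make the complexity point concrete, the standard LOOCV estimator (textbook form, not quoted from the paper) requires refitting the model once per held-out sample:

```latex
% LOOCV for loss \ell, regularizer r, and n samples z_1, ..., z_n:
\[
  \mathrm{LOOCV} = \frac{1}{n} \sum_{i=1}^{n}
    \ell\!\left(z_i; \hat{\theta}^{(-i)}\right),
  \qquad
  \hat{\theta}^{(-i)} = \arg\min_{\theta} \sum_{j \neq i} \ell(z_j; \theta) + r(\theta),
\]
% so n separate optimizations are needed -- the O(n) blow-up in training
% cost that makes LOOCV rare in practice.
```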
Deep Clustering of Compressed Variational Embeddings
Wu, Suya, Diao, Enmao, Ding, Jie, Tarokh, Vahid
Motivated by the ever-increasing demands for limited communication bandwidth and low-power consumption, we propose a new methodology, named joint Variational Autoencoders with Bernoulli mixture models (VAB), for performing clustering in the compressed data domain. The idea is to reduce the data dimension with Variational Autoencoders (VAEs) and to group data representations with Bernoulli mixture models (BMMs). Once jointly trained for compression and clustering, the model can be decomposed into two parts: a data vendor that encodes the raw data into compressed data, and a data consumer that classifies the received (compressed) data. To enable training with the gradient descent algorithm, we propose to use the Gumbel-Softmax distribution to resolve the infeasibility of the back-propagation algorithm when sampling categorical variables. Clustering is a fundamental task with applications in medical imaging, social network analysis, bioinformatics, computer graphics, etc. Applying classical clustering methods directly to high-dimensional data may be computationally inefficient and suffer from instability.
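The Gumbel-Softmax trick referenced here, in its standard form (a generic sketch, not the authors' code): a differentiable surrogate for drawing a categorical sample with class probabilities pi, which approaches a one-hot vector as the temperature tau goes to zero:

```python
import numpy as np

def gumbel_softmax(pi, tau=0.5, seed=None):
    """Soft categorical sample: softmax((log pi + Gumbel noise) / tau)."""
    rng = np.random.default_rng(seed)
    g = -np.log(-np.log(rng.uniform(size=pi.shape)))  # Gumbel(0, 1) noise
    logits = (np.log(pi) + g) / tau
    e = np.exp(logits - logits.max())                 # numerically stable softmax
    return e / e.sum()

soft_assignment = gumbel_softmax(np.array([0.7, 0.2, 0.1]))  # soft one-hot vector
```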