Scalable Modeling of Spatiotemporal Data using the Variational Autoencoder: an Application in Glaucoma
Berchuck, Samuel I., Medeiros, Felipe A., Mukherjee, Sayan
Submitted to the Annals of Applied Statistics

Duke University

As big spatial data become increasingly prevalent, classical spatiotemporal (ST) methods often do not scale well. While methods have been developed to account for high-dimensional spatial objects, the setting where there are exceedingly large samples of spatial observations has received less attention. The variational autoencoder (VAE), an unsupervised generative model based on deep learning and approximate Bayesian inference, fills this void using a latent variable specification that is inferred jointly across the large number of samples. In this manuscript, we compare the performance of the VAE with a more classical ST method when analyzing longitudinal visual fields from a large cohort of patients in a prospective glaucoma study. Through simulation and a case study, we demonstrate that the VAE is a scalable method for analyzing ST data when the goal is to obtain accurate predictions. R code to implement the VAE can be found on GitHub: https://github.com/berchuck/vaeST.

1. Introduction. As high-speed computing and medical imaging become increasingly inexpensive, massive amounts of data, often spatial in nature, are generated and must be analyzed (Bearden and Thompson, 2017; Smith and Nichols, 2018). In the case of medical imaging, the number of patients that can be imaged has skyrocketed in recent years, allowing for studies that include images from many thousands of patients (Van Essen et al., 2013; Miller et al., 2016). The current spatial statistics literature focuses heavily on scalability in terms of the number of spatial locations (Banerjee, 2017), but largely ignores the setting where a joint model is needed for spatiotemporal (ST) data that are generated from a large cohort.
Historically, learning an appropriate generating process in this setting was untenable, typically leading to simplifying assumptions, such as point-wise (PW) modeling of locations across time (Fitzke et al., 1996). In particular, generative models using deep learning have shown great promise in modeling complex distributions, p(x), for x = x_{1:M} in some potentially high-dimensional space X. Sampling from X is often intractable, so generative modeling instead learns a distribution q(x) that can be sampled from and is close to p(x) (Doersch, 2016). As such, generative modeling can be viewed as an approximate method for performing inference in high-dimensional contexts when observations x are abundant. Generative modeling, and the variational autoencoder (VAE) in particular, is well suited for modeling large cohorts of ST data because it can characterize variability in a spatial data source through joint modeling (Kingma and Welling, 2013).
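
For reference, the objective that a VAE typically maximizes is the evidence lower bound (ELBO) of Kingma and Welling (2013). The passage above does not state it, so the notation below is an assumption made for illustration: z is a latent variable with prior p(z), p_\theta(x | z) is the decoder (the generative model), and q_\phi(z | x) is the encoder (the approximate posterior over z). The encoder q_\phi(z | x) is a different object from the approximating distribution q(x) discussed above.

```latex
\log p_\theta(x)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

The first term rewards accurate reconstruction of x from z, and the second keeps the approximate posterior close to the prior. Under this sketch, the distribution that can be sampled from corresponds to the marginal p_\theta(x) = \int p_\theta(x | z) p(z) dz: draw z from p(z), then draw x from p_\theta(x | z). Summing the bound over all observations with shared parameters \theta and \phi is one way to read the statement that the latent specification is inferred jointly across a large number of samples.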
Aug-24-2019