Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models

Manan Saxena, Tinghua Chen, Justin D. Silverman

arXiv.org Machine Learning 

Many scientific fields collect longitudinal multivariate count data in which the total number of counts is arbitrary (e.g., multinomial observations). Such data are often called count compositional because the information they carry lies in the relative frequencies of the categories (Silverman et al., 2018). They arise frequently in molecular biology (Espinoza et al., 2020), microbiome studies (Silverman et al., 2018; Joseph et al., 2020; Äijö et al., 2018), natural language processing (Linderman et al., 2015), biomedicine (Fokianos and Kedem, 2003), and the social sciences (Cargnoni et al., 1997). Although the counting process used to collect these data is often modeled as multinomial, other sources of noise in the system being studied often lead to extra-multinomial variation. Some authors account for this extra-multinomial variability with multinomial-Dirichlet models (Mosimann, 1962), but multinomial logistic-normal models are often superior because they can capture both positive and negative covariation between multinomial categories (Aitchison and Shen, 1980; Cargnoni et al., 1997; Joseph et al., 2020; Silverman et al., 2018). Moreover, under a suitable transformation (i.e., link function), a logistic-normal random vector is multivariate Gaussian.
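To make the link-function remark concrete, the following is a minimal sketch of drawing counts from a multinomial logistic-normal model. It assumes the additive log-ratio (ALR) transform as the link, which is only one of several possible choices and is not necessarily the one used later in the paper. The helper names `alr` and `alr_inv`, and the values of `mu`, `Sigma`, and `totals`, are hypothetical and chosen purely for illustration.

```python
import numpy as np

def alr(p):
    """Additive log-ratio transform: simplex with D parts -> R^(D-1).

    Assumption: the last category is used as the reference.
    """
    p = np.asarray(p, dtype=float)
    return np.log(p[..., :-1]) - np.log(p[..., -1:])

def alr_inv(eta):
    """Inverse ALR: R^(D-1) -> simplex with D parts (a softmax with a zero appended)."""
    eta = np.asarray(eta, dtype=float)
    z = np.concatenate([eta, np.zeros(eta.shape[:-1] + (1,))], axis=-1)
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical Gaussian parameters in ALR coordinates (D = 3 categories).
mu = np.array([0.5, -0.2])
Sigma = np.array([[1.0, -0.6],
                  [-0.6, 1.0]])  # off-diagonal sign is unconstrained, unlike the Dirichlet

# Latent Gaussian draws, mapped to relative abundances on the simplex.
eta = rng.multivariate_normal(mu, Sigma, size=5)  # shape (5, 2)
pi = alr_inv(eta)                                 # shape (5, 3), rows sum to 1

# Arbitrary sequencing depths / totals, one per observation.
totals = np.array([100, 250, 80, 1000, 40])
counts = np.array([rng.multinomial(n, p) for n, p in zip(totals, pi)])
```

Applying `alr` to `pi` recovers `eta`, which is the sense in which the logistic-normal is Gaussian after transformation: the covariance `Sigma` lives in the transformed space, while the totals in `totals` carry no information about the relative frequencies.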