Socioeconomic status determines COVID-19 incidence and related mortality in Santiago, Chile


Santiago, Chile, is a highly segregated city with distinct zones of affluence and deprivation. This setting offers a window on how social factors propel the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic in an economically vulnerable society with high levels of income inequality. Mena et al. analyzed incidence and mortality attributed to SARS-CoV-2 to understand spatial variations in disease burden. Infection fatality rates were higher in lower-income municipalities because of comorbidities and lack of access to health care. Disparities between municipalities in the quality of their health care delivery system became apparent in testing delays and capacity. These indicators explain a large part of the variation in COVID-19 underreporting and deaths and show that these inequalities disproportionately affected younger people.

Science, abg5298, this issue p. [eabg5298][1]

### INTRODUCTION

The COVID-19 crisis has exposed major inequalities between communities. Understanding the societal risk factors that make some groups particularly vulnerable is essential to ensure more effective interventions for this and future pandemics. Here, we focus on socioeconomic status as a risk factor. Although it is broadly understood that social and economic inequality has a negative impact on health outcomes, the mechanisms by which socioeconomic status affects disease outcomes remain unclear. These mechanisms can be mediated by a range of systemic structural factors, such as access to health care and economic safety nets. We address this gap by providing an in-depth characterization of disease incidence and mortality and their dependence on demographic and socioeconomic strata in Santiago, a highly segregated city and the capital of Chile.

### RATIONALE

Combining publicly available data sources, we conducted a comprehensive analysis of case incidence and mortality during the first wave of the pandemic.
We correlated COVID-19 outcomes with behavioral and health care system factors while studying their interaction with age and socioeconomic status. To overcome the intrinsic biases of incomplete case count data, we used detailed mortality data. We developed a parsimonious Gaussian process model to study excess deaths and their uncertainty and reconstructed true incidence from the time series of deaths with a new regularized maximum likelihood deconvolution method. To estimate infection fatality rates by age and socioeconomic status, we implemented a hierarchical Bayesian model that adjusts for reporting biases while accounting for incompleteness in case information.

### RESULTS

We find robust associations between COVID-19 outcomes and socioeconomic status, based on health and behavioral indicators. Specifically, we show in lower–socioeconomic status municipalities that testing was almost absent early in the pandemic and that human mobility was not reduced by lockdowns as much as it was in more affluent locations. Test positivity and testing delays were much higher in these locations, indicating an impaired capacity of the health care system to contain the spread of the epidemic. We also find that 73% more deaths than in a normal year were observed between May and July 2020, and municipalities at the lower end of the socioeconomic spectrum were hit the hardest, both in relation to COVID-19–attributed deaths and excess deaths. Finally, the socioeconomic gradient of the infection fatality rate appeared particularly steep for younger age groups, reflecting worse baseline health status and limited access to health care in municipalities with low socioeconomic status.
### CONCLUSION

Together, these findings highlight the substantial consequences of socioeconomic and health care disparities in a highly segregated city and provide practical methodological approaches useful for characterizing the COVID-19 burden and mortality in other urban centers based on public data, even if reports are incomplete and biased.

Figure: Effect of socioeconomic inequalities on COVID-19 outcomes. The map on the left shows the municipalities that were included in this study, colored by their socioeconomic status score. For the comparison between COVID-19 deaths and excess deaths (top right), COVID-19–confirmed deaths are shown in light green and COVID-19–attributed deaths in dark green. Excess deaths, shown in gray, correspond to the difference between observed and predicted deaths. Predicted deaths were estimated using a Gaussian process model. The shading indicates 95% credible intervals for the excess deaths. The infection fatality rates (bottom right) were inferred by implementing a hierarchical Bayesian model, with vertical lines representing credible intervals by age and socioeconomic status.

The COVID-19 pandemic has affected cities particularly hard. Here, we provide an in-depth characterization of disease incidence and mortality and their dependence on demographic and socioeconomic strata in Santiago, a highly segregated city and the capital of Chile. Our analyses show a strong association between socioeconomic status and both COVID-19 outcomes and public health capacity. People living in municipalities with low socioeconomic status did not reduce their mobility during lockdowns as much as those in more affluent municipalities. Testing volumes may have been insufficient early in the pandemic in those places, and both test positivity rates and testing delays were much higher. We find a strong association between socioeconomic status and mortality, measured by either COVID-19–attributed deaths or excess deaths. Finally, we show that infection fatality rates in young people are higher in low-income municipalities. Together, these results highlight the critical consequences of socioeconomic inequalities on health outcomes.

[1]: /lookup/doi/10.1126/science.abg5298

Bayesian Kernelised Test of (In)dependence with Mixed-type Variables Machine Learning

A fundamental task in AI is to assess (in)dependence between mixed-type variables (text, image, sound). We propose a Bayesian kernelised correlation test of (in)dependence using a Dirichlet process model. The new measure of (in)dependence allows us to answer some fundamental questions: Based on data, are (mixed-type) variables independent? How likely is dependence/independence to hold? How high is the probability that two mixed-type variables are more than just weakly dependent? We establish theoretical properties of the approach and give algorithms for computing it quickly. We empirically demonstrate the effectiveness of the proposed method by analysing its performance and by comparing it with other frequentist and Bayesian approaches on a range of datasets and tasks with mixed-type variables.

A Human-Centered Interpretability Framework Based on Weight of Evidence Artificial Intelligence

In this paper, we take a human-centered approach to interpretable machine learning. First, drawing inspiration from the study of explanation in philosophy, cognitive science, and the social sciences, we propose a list of design principles for machine-generated explanations that are meaningful to humans. Using the concept of weight of evidence from information theory, we develop a method for producing explanations that adhere to these principles. We show that this method can be adapted to handle high-dimensional, multi-class settings, yielding a flexible meta-algorithm for generating explanations. We demonstrate that these explanations can be estimated accurately from finite samples and are robust to small perturbations of the inputs. We also evaluate our method through a qualitative user study with machine learning practitioners, where we observe that the resulting explanations are usable despite some participants struggling with background concepts like prior class probabilities. Finally, we conclude by surfacing design implications for interpretability tools
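
As a point of reference for the weight-of-evidence concept mentioned above, the classical two-hypothesis definition is the log likelihood ratio log[P(e | h) / P(e | not h)], which adds to the prior log-odds to give the posterior log-odds. The sketch below illustrates only that textbook definition; the function names are hypothetical, and the paper's high-dimensional, multi-class meta-algorithm is not reproduced here.

```python
import math


def weight_of_evidence(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Classical weight of evidence of e in favour of h, in nats.

    Textbook log likelihood ratio; names are illustrative, not from the paper.
    """
    return math.log(p_e_given_h / p_e_given_not_h)


def posterior_probability(prior_h: float, woe: float) -> float:
    """Update a prior probability of h using an additive log-odds shift."""
    prior_log_odds = math.log(prior_h / (1.0 - prior_h))
    posterior_log_odds = prior_log_odds + woe
    # Convert log-odds back to a probability with the logistic function.
    return 1.0 / (1.0 + math.exp(-posterior_log_odds))


# Evidence that is four times as likely under h as under its negation.
woe = weight_of_evidence(0.8, 0.2)
posterior = posterior_probability(0.5, woe)
```

Because the update is additive in log-odds, the role of the prior class probability is explicit, which is the background concept the abstract reports some participants struggling with.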

Forecasting COVID-19 Counts At A Single Hospital: A Hierarchical Bayesian Approach Machine Learning

We consider the problem of forecasting the daily number of hospitalized COVID-19 patients at a single hospital site, in order to help administrators with logistics and planning. We develop several candidate hierarchical Bayesian models which directly capture the count nature of data via a generalized Poisson likelihood, model time-series dependencies via autoregressive and Gaussian process latent processes, and can share statistical strength across related sites. We demonstrate our approach on public datasets for 8 hospitals in Massachusetts, U.S.A. and 10 hospitals in the United Kingdom. Further prospective evaluation compares our approach favorably to baselines currently used by stakeholders at 3 related hospitals to forecast 2-week-ahead demand by rescaling state-level forecasts. The COVID-19 pandemic has created unprecedented demand for limited hospital resources across the globe. Emergency resource allocation decisions made by hospital administrators (such as planning additional personnel or provisioning beds and equipment) are crucial for achieving successful patient outcomes and avoiding overwhelmed capacity. However, at present hospitals often lack the ability to forecast what will be needed at their site in coming weeks. This may be especially true in under-resourced hospitals, due to constraints on funding, staff time and expertise, and other issues.

Inference for BART with Multinomial Outcomes Machine Learning

The multinomial probit Bayesian additive regression trees (MPBART) framework was proposed by Kindo et al. (KD), approximating the latent utilities in the multinomial probit (MNP) model with BART (Chipman et al. 2010). Compared to multinomial logistic models, MNP does not assume independent alternatives and the correlation structure among alternatives can be specified through multivariate Gaussian distributed latent utilities. We introduce two new algorithms for fitting the MPBART and show that the theoretical mixing rates of our proposals are equal or superior to the existing algorithm in KD. Through simulations, we explore the robustness of the methods to the choice of reference level, imbalance in outcome frequencies, and the specifications of prior hyperparameters for the utility error term. The work is motivated by the application of generating posterior predictive distributions for mortality and engagement in care among HIV-positive patients based on electronic health records (EHRs) from the Academic Model Providing Access to Healthcare (AMPATH) in Kenya. In both the application and simulations, we observe better performance using our proposals as compared to KD in terms of MCMC convergence rate and posterior predictive accuracy.

Score Matched Conditional Exponential Families for Likelihood-Free Inference Machine Learning

To perform Bayesian inference for stochastic simulator models for which the likelihood is not accessible, Likelihood-Free Inference (LFI) relies on simulations from the model. Standard LFI methods can be split according to how these simulations are used: to build an explicit Surrogate Likelihood, or to accept/reject parameter values according to a measure of distance from the observations (Approximate Bayesian Computation (ABC)). In both cases, simulations are adaptively tailored to the value of the observation. Here, we generate parameter-simulation pairs from the model independently of the observation, and use them to learn a conditional exponential family likelihood approximation; to parametrize it, we use Neural Networks whose weights are tuned with Score Matching. With our likelihood approximation, we can employ MCMC for doubly intractable distributions to draw samples from the posterior for any number of observations without additional model simulations, with performance competitive with comparable approaches. Further, the sufficient statistics of the exponential family can be used as summaries in ABC, outperforming the state-of-the-art method in five different models with known likelihood. Finally, we apply our method to a challenging model from meteorology.

Hardware-accelerated Simulation-based Inference of Stochastic Epidemiology Models for COVID-19 Artificial Intelligence

Epidemiology models are central in understanding and controlling large-scale pandemics. Several epidemiology models require simulation-based inference such as Approximate Bayesian Computation (ABC) to fit their parameters to observations. ABC inference is highly amenable to efficient hardware acceleration. In this work, we develop parallel ABC inference of a stochastic epidemiology model for COVID-19. The statistical inference framework is implemented and compared on Intel Xeon CPU, NVIDIA Tesla V100 GPU and the Graphcore Mk1 IPU, and the results are discussed in the context of their computational architectures. Results show that GPUs are 4x and IPUs are 30x faster than Xeon CPUs. Extensive performance analysis indicates that the difference between IPU and GPU can be attributed to higher communication bandwidth, closeness of memory to compute, and higher compute power in the IPU. The proposed framework scales across 16 IPUs, with scaling overhead not exceeding 8% for the experiments performed. We present an example of our framework in practice, performing inference on the epidemiology model across three countries, and giving a brief overview of the results.
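
To make the ABC idea concrete, here is a minimal rejection-ABC sketch on a toy problem. Everything in it is an assumption for illustration: the Bernoulli "infection" simulator, the uniform prior, the tolerance, and all function names are hypothetical and are not the paper's epidemiology model or its parallel implementation.

```python
import random


def simulate_cases(theta: float, n_people: int, rng: random.Random) -> int:
    """Toy stochastic simulator: each person is infected with probability theta."""
    return sum(1 for _ in range(n_people) if rng.random() < theta)


def abc_rejection(observed: int, n_people: int, n_draws: int,
                  tolerance: int, seed: int = 0) -> list:
    """Keep prior draws whose simulated count lands within tolerance of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw from a Uniform(0, 1) prior
        simulated = simulate_cases(theta, n_people, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(theta)
    return accepted


# Approximate posterior for the infection probability given 30 cases in 100.
samples = abc_rejection(observed=30, n_people=100, n_draws=5000, tolerance=2)
posterior_mean = sum(samples) / len(samples)
```

Each loop iteration draws, simulates, and compares without reference to any other iteration, which is the property that makes this family of methods a natural fit for the parallel hardware compared in the abstract.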

Decision-Making Algorithms for Learning and Adaptation with Application to COVID-19 Data Machine Learning

This work focuses on the development of a new family of decision-making algorithms for adaptation and learning, which are specifically tailored to decision problems and are constructed by building on first principles from decision theory. A key observation is that estimation and decision problems are structurally different and, therefore, algorithms that have proven successful for the former need not perform well when adjusted for decision problems. We propose a new scheme, referred to as BLLR (barrier log-likelihood ratio algorithm), and demonstrate its applicability to real data from the COVID-19 pandemic in Italy. The results illustrate the ability of the design tool to track the different phases of the outbreak.

A New Inference algorithm of Dynamic Uncertain Causality Graph based on Conditional Sampling Method for Complex Cases Artificial Intelligence

Dynamic Uncertain Causality Graph (DUCG) is a recently proposed model for diagnosing complex systems. It performs well for industrial systems such as nuclear power plants, chemical systems, and spacecraft. However, the combinatorial explosion of variable states is still a problem in some cases and may make DUCG inference inefficient or even infeasible. In clinical diagnosis, when many intermediate causes are unknown while the downstream results are known in a DUCG graph, this combinatorial explosion may appear during the inference computation. Monte Carlo sampling is a typical approach to this problem. However, the occurrence rate of the case may be very small, e.g. $10^{-20}$, which means that a huge number of samples would be needed. This paper proposes a new scheme based on conditional stochastic simulation, which obtains the final result from the expectation of the conditional probability in the sampling loops instead of counting the sampling frequency, and thus overcomes the problem. As a result, the proposed algorithm requires much less time than the DUCG recursive inference algorithm presented earlier. Moreover, a simple analysis of the convergence rate based on a designed example is given to show the advantage of the proposed method. In addition, support for logic gates, logic cycles, and parallelization, which exist in DUCG, is also addressed in this paper. The new algorithm substantially reduces time consumption and runs 3 times faster than the old one, with a 2.7% error ratio, on a practical graph for viral hepatitis B.
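
The contrast between counting sampling frequency and averaging a conditional probability can be seen on a two-variable toy model. This is a sketch under stated assumptions, not the DUCG algorithm: the single hidden cause, the probabilities, and the function names are all invented for illustration.

```python
import random

# Hypothetical toy model: one hidden cause and one rare piece of evidence.
P_CAUSE = 0.3
P_EVIDENCE_GIVEN_CAUSE = {True: 1e-8, False: 1e-10}

# Exact probability of the evidence, available here only because the toy is tiny.
EXACT = P_CAUSE * 1e-8 + (1.0 - P_CAUSE) * 1e-10


def frequency_estimate(n_samples: int, rng: random.Random) -> float:
    """Sample cause and evidence, then count how often the evidence occurs."""
    hits = 0
    for _ in range(n_samples):
        cause = rng.random() < P_CAUSE
        if rng.random() < P_EVIDENCE_GIVEN_CAUSE[cause]:
            hits += 1
    return hits / n_samples


def conditional_estimate(n_samples: int, rng: random.Random) -> float:
    """Sample only the cause and average P(evidence | cause) over the draws."""
    total = 0.0
    for _ in range(n_samples):
        cause = rng.random() < P_CAUSE
        total += P_EVIDENCE_GIVEN_CAUSE[cause]
    return total / n_samples


freq = frequency_estimate(20000, random.Random(0))
cond = conditional_estimate(20000, random.Random(0))
relative_error = abs(cond - EXACT) / EXACT
```

With an event this rare, the frequency counter almost never registers a single hit at this sample size, whereas the conditional average only has to estimate how often the cause is present, so its error does not depend on how small the evidence probability is.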

On Bayesian sparse canonical correlation analysis via Rayleigh quotient framework Machine Learning

Canonical correlation analysis is a statistical technique, dating back at least to [1], that is used to maximally correlate multiple datasets for joint analysis. The technique has become a fundamental tool in biomedical research where technological advances have led to a huge number of multi-omic datasets ([2]; [3]; [4]). Over the past two decades, limited sample sizes, growing dimensionality, and the search for meaningful biological interpretations have led to the development of sparse canonical correlation analysis ([2]), where a sparsity assumption is imposed on the canonical correlation vectors. This work falls under the topic of Bayesian estimation of sparse canonical correlation vectors. Model-based approaches to canonical correlation analysis were developed in the mid-2000s (see e.g., [5]) and paved the way for a Bayesian treatment of canonical correlation analysis ([6]; [7]) and sparse canonical correlation analysis ([8]). However, a serious shortcoming of such a Bayesian treatment is that it naturally requires a complete specification of the joint distribution of the data, so as to specify the likelihood function. This requirement is a serious limitation in many applications where the data-generating process is poorly understood, for example, image data.
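
For orientation, the standard textbook connection between canonical correlation analysis and a Rayleigh quotient, which the title alludes to, can be written as follows. The symbols $\Sigma_{xx}$, $\Sigma_{yy}$, $\Sigma_{xy}$, $a$, $b$, and $v$ are notation assumed here for two datasets; the abstract does not state the paper's own formulation, so this is a sketch of the classical result only.

```latex
% First canonical correlation between x and y (classical definition)
\rho_1 \;=\; \max_{a,\,b}\;
  \frac{a^{\top}\Sigma_{xy}\,b}
       {\sqrt{a^{\top}\Sigma_{xx}\,a}\;\sqrt{b^{\top}\Sigma_{yy}\,b}}

% Equivalent generalized Rayleigh quotient in the stacked vector v = (a, b)
\rho_1 \;=\; \max_{v \neq 0}\; \frac{v^{\top} A\, v}{v^{\top} B\, v},
\qquad
A = \begin{pmatrix} 0 & \Sigma_{xy} \\ \Sigma_{xy}^{\top} & 0 \end{pmatrix},
\quad
B = \begin{pmatrix} \Sigma_{xx} & 0 \\ 0 & \Sigma_{yy} \end{pmatrix}
```

The second form depends on the data only through covariance matrices, which is one reason a quotient-based formulation can avoid specifying a full joint likelihood.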