PEMS: Pre-trained Epidemic Time-series Models

Kamarthi, Harshavardhan; Prakash, B. Aditya

arXiv.org Artificial Intelligence 

Providing accurate and reliable predictions about the future of an epidemic is an important problem for enabling informed public health decisions. Recent works have shown that data-driven solutions, which use advances in deep learning to learn from an epidemic's past data, often outperform traditional mechanistic models. However, in many cases the past data is sparse and may not sufficiently capture the underlying dynamics. While a large amount of data exists from past epidemics, leveraging prior knowledge from time-series data of other diseases is a non-trivial challenge. Motivated by the success of pre-trained models in language and vision tasks, we tackle the problem of pre-training epidemic time-series models to learn from multiple datasets from different diseases and epidemics. We address important challenges specific to pre-training for epidemic time-series, such as dealing with heterogeneous dynamics and efficiently capturing useful patterns from multiple epidemic datasets, by carefully designing the self-supervised learning (SSL) tasks to learn important priors about epidemic dynamics that can be leveraged when fine-tuning on multiple downstream tasks. The resultant PEM outperforms previous state-of-the-art methods on various downstream time-series tasks across datasets of varying seasonal patterns, geography, and mechanism of contagion, including the novel Covid-19 pandemic, which is unseen in the pre-training data, and does so with better efficiency, using a smaller fraction of the datasets.

Predicting the trends of an ongoing epidemic is an important public health problem that influences real-time decision-making affecting millions of people. Forecasting time series of important epidemic indicators is a well-studied and challenging problem (Rodríguez et al., 2022b; Chakraborty et al., 2018). The availability of traditional as well as novel datasets, such as testing records and social media, that capture multiple facets of the epidemic, together with advances in machine learning and deep learning in particular, has made it possible to build models that learn from these datasets and show promising results, often outperforming traditional mechanistic methods (Cramer et al., 2021; Reich et al., 2019). Many public health and research initiatives collect data on various diseases over many decades, at various spatial granularities and in different geographies.