 viral load


Roadmap for using large language models (LLMs) to accelerate cross-disciplinary research with an example from computational biology

arXiv.org Artificial Intelligence

Large language models (LLMs) are powerful artificial intelligence (AI) tools transforming how research is conducted. However, their use in research has been met with skepticism, due to concerns about hallucinations, biases and potential harms to research. These emphasize the importance of clearly understanding the strengths and weaknesses of LLMs to ensure their effective and responsible use. Here, we present a roadmap for integrating LLMs into cross-disciplinary research, where effective communication, knowledge transfer and collaboration across diverse fields are essential but often challenging. We examine the capabilities and limitations of LLMs and provide a detailed computational biology case study (on modeling HIV rebound dynamics) demonstrating how iterative interactions with an LLM (ChatGPT) can facilitate interdisciplinary collaboration and research. We argue that LLMs are best used as augmentative tools within a human-in-the-loop framework. Looking forward, we envisage that the responsible use of LLMs will enhance innovative cross-disciplinary research and substantially accelerate scientific discoveries.


Viral Load Inference in Non-Adaptive Pooled Testing

arXiv.org Machine Learning

Medical diagnostic testing can be made significantly more efficient using pooled testing protocols. These typically require a sparse infection signal and use either binary or real-valued entries of O(1). However, existing methods do not allow for inferring viral loads which span many orders of magnitude. We develop a message passing algorithm coupled with a PCR (Polymerase Chain Reaction) specific noise function to allow accurate inference of realistic viral load signals. This work is in the non-adaptive setting and could open the possibility of efficient screening where viral load determination is clinically important.
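
As a rough illustration of the setting only (not the message passing algorithm the abstract describes), the sketch below builds a non-adaptive pooling design and mixes a sparse viral load signal into pooled measurements. The random design, the pool sizes, the example loads, and the noise-free averaging model are all assumptions made for illustration.

```python
import random

def make_pools(n_samples, n_pools, pool_size, rng):
    """Non-adaptive design: every pool is chosen before any test is run."""
    return [sorted(rng.sample(range(n_samples), pool_size)) for _ in range(n_pools)]

def pooled_loads(viral_loads, pools):
    """Noise-free pooled measurement: mean load of the samples in each pool."""
    return [sum(viral_loads[i] for i in pool) / len(pool) for pool in pools]

rng = random.Random(0)
n_samples = 20

# Sparse signal: most samples are negative (load 0); the two positives
# differ by five orders of magnitude (hypothetical values).
viral_loads = [0.0] * n_samples
viral_loads[3] = 1e3
viral_loads[11] = 1e8

pools = make_pools(n_samples, n_pools=8, pool_size=5, rng=rng)
measurements = pooled_loads(viral_loads, pools)

for pool, value in zip(pools, measurements):
    print(pool, value)
```

Any pool containing the high-load sample is dominated by it, which hints at why signals spanning many orders of magnitude are harder to recover than binary or O(1) entries.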


SICO: Simulation for Infection Control Operations

arXiv.org Artificial Intelligence

In response to the COVID-19 pandemic and the potential threat of future epidemics caused by novel viruses, we developed a flexible framework for modeling disease intervention effects. This tool is intended to aid decision makers at multiple levels as they compare possible responses to emerging epidemiological threats for optimal control and reduction of harm. The framework is specifically designed to be both scalable and modular, allowing it to model a variety of population levels, viruses, testing methods and strategies--including pooled testing--and intervention strategies. In this paper, we provide an overview of this framework and examine the impact of different intervention strategies on infection dynamics.


Leveraging Wastewater Monitoring for COVID-19 Forecasting in the US: a Deep Learning study

arXiv.org Artificial Intelligence

The outbreak of COVID-19 in late 2019 was the start of a health crisis that shook the world and took millions of lives in the ensuing years. Many governments and health officials failed to arrest the rapid circulation of infection in their communities. The long incubation period and the large proportion of asymptomatic cases made COVID-19 particularly elusive to track. However, wastewater monitoring soon became a promising data source in addition to conventional indicators such as confirmed daily cases, hospitalizations, and deaths. Despite the consensus on the effectiveness of wastewater viral load data, there is a lack of methodological approaches that leverage viral load to improve COVID-19 forecasting. This paper proposes using deep learning to automatically discover the relationship between daily confirmed cases and viral load data. We trained one Deep Temporal Convolutional Network (DeepTCN) and one Temporal Fusion Transformer (TFT) model to build a global forecasting model. We supplement the daily confirmed cases with viral loads and other socio-economic factors as covariates to the models. Our results suggest that TFT outperforms DeepTCN and learns a better association between viral load and daily cases. We demonstrated that equipping the models with the viral load improves their forecasting performance significantly. Moreover, viral load is shown to be the second most predictive input, following the containment and health index. Our results reveal the feasibility of training a location-agnostic deep-learning model to capture the dynamics of infection diffusion when wastewater viral load data is provided.


Here's Why Your Rapid Test Is Negative Even If You Have COVID-19

International Business Times

Rapid COVID-19 tests can generate false-negative results because they aren't that sensitive, according to a medical expert. Rapid COVID-19 tests, or antigen tests, appear positive if they detect a certain amount of coronavirus -- also known as viral load -- from a sample taken from a person's body, according to BuzzFeed News. Dr. Emily Landon, an infectious disease expert, said that the window when viral load is at its peak can vary from person to person and can range from three days to more than a week as people's systems clear the virus at their own pace. Due to this, it may either take some time for an infected person's result to turn positive or never appear positive if they miss this window or collect their test sample incorrectly, among other things, according to Landon, who is also an associate professor of medicine at the University of Chicago Medicine. "Rapid tests are definitely not like a pregnancy test where it's going to be positive as long as it's been a few weeks after someone missed a period. It's only going to pick it up when you're at peak infectiousness, and they're almost never false positive," the doctor explained.


Predicting COVID-19 Incidences from Patients' Viral Load using Deep-Learning

#artificialintelligence

The transmission of the contagious COVID-19 is known to be highly dependent on individual viral dynamics. Since the cycle threshold (Ct) is the only semi-quantitative viral measurement that could reflect infectivity, we utilized Ct values to forecast COVID-19 incidences. Our COVID-19 cohort (n = 9531), retrieved from a single representative cross-sectional virology test center in Lebanon, revealed that low daily mean Ct values are followed by an increase in the number of national positive COVID-19 cases. A subset of the data was used to develop a deep neural network model, tune its hyperparameters, and optimize the weights for minimal mean square error of prediction. The final model's accuracy is reported by comparing its predictions with an unseen dataset.


Estimating epidemiologic dynamics from cross-sectional viral load distributions

Science

During the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, polymerase chain reaction (PCR) tests were generally reported only as binary positive or negative outcomes. However, these test results contain a great deal more information than that. As viral load declines exponentially, the PCR cycle threshold (Ct) increases linearly. Hay et al. developed an approach for extracting epidemiological information out of the Ct values obtained from PCR tests used in surveillance for a variety of settings (see the Perspective by Lopman and McQuade). Although there are challenges to relying on single Ct values for individual-level decision-making, even a limited aggregation of data from a population can inform on the trajectory of the pandemic. Therefore, across a population, an increase in aggregated Ct values indicates that a decline in cases is occurring. Science, abh0635, this issue p. [eabh0635][1]; see also abj4185, p. [280][2]

### INTRODUCTION

Current approaches to epidemic monitoring rely on case counts, test positivity rates, and reported deaths or hospitalizations. These metrics, however, provide a limited and often biased picture as a result of testing constraints, unrepresentative sampling, and reporting delays. Random cross-sectional virologic surveys can overcome some of these biases by providing snapshots of infection prevalence but currently offer little information on the epidemic trajectory without sampling across multiple time points.

### RATIONALE

We develop a new method that uses information inherent in cycle threshold (Ct) values from reverse transcription quantitative polymerase chain reaction (RT-qPCR) tests to robustly estimate the epidemic trajectory from multiple or even a single cross section of positive samples. Ct values are related to viral loads, which depend on the time since infection; Ct values are generally lower when the time between infection and sample collection is short. Despite variation across individuals, samples, and testing platforms, Ct values provide a probabilistic measure of time since infection. We find that the distribution of Ct values across positive specimens at a single time point reflects the epidemic trajectory: A growing epidemic will necessarily have a high proportion of recently infected individuals with high viral loads, whereas a declining epidemic will have more individuals with older infections and thus lower viral loads. Because of these changing proportions, the epidemic trajectory or growth rate should be inferable from the distribution of Ct values collected in a single cross section, and multiple successive cross sections should enable identification of the longer-term incidence curve. Moreover, understanding the relationship between sample viral loads and epidemic dynamics provides additional insights into why viral loads from surveillance testing may appear higher for emerging viruses or variants and lower for outbreaks that are slowing, even absent changes in individual-level viral kinetics.

### RESULTS

Using a mathematical model for population-level viral load distributions calibrated to known features of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral load kinetics, we show that the median and skewness of Ct values in a random sample change over the course of an epidemic. By formalizing this relationship, we demonstrate that Ct values from a single random cross section of virologic testing can estimate the time-varying reproductive number of the virus in a population, which we validate using data collected from comprehensive SARS-CoV-2 testing in long-term care facilities. Using a more flexible approach to modeling infection incidence, we also develop a method that can reliably estimate the epidemic trajectory in even more-complex populations, where interventions may be implemented and relaxed over time. This method performed well in estimating the epidemic trajectory in the state of Massachusetts using routine hospital admissions RT-qPCR testing data, accurately replicating estimates from other sources for the entire state.

### CONCLUSION

This work provides a new method for estimating the epidemic growth rate and a framework for robust epidemic monitoring using RT-qPCR Ct values that are often simply discarded. By deploying single or repeated (but small) random surveillance samples and making the best use of the semiquantitative testing data, we can estimate epidemic trajectories in real time and avoid biases arising from nonrandom samples or changes in testing practices over time. Understanding the relationship between population-level viral loads and the state of an epidemic reveals important implications and opportunities for interpreting virologic surveillance data. It also highlights the need for such surveillance, as these results show how to use it most informatively.

Figure: Ct values reflect the epidemic trajectory and can be used to estimate incidence. (A and B) Whether an epidemic has rising or falling incidence will be reflected in the distribution of times since infection (A), which in turn affects the distribution of Ct values in a surveillance sample (B). (C) These values can be used to assess whether the epidemic is rising or falling and estimate the incidence curve.

Estimating an epidemic's trajectory is crucial for developing public health responses to infectious diseases, but case data used for such estimation are confounded by variable testing practices. We show that the population distribution of viral loads observed under random or symptom-based surveillance, in the form of cycle threshold (Ct) values obtained from reverse transcription quantitative polymerase chain reaction testing, changes during an epidemic. Thus, Ct values from even limited numbers of random samples can provide improved estimates of an epidemic's trajectory. Combining data from multiple such samples improves the precision and robustness of this estimation. We apply our methods to Ct values from surveillance conducted during the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic in a variety of settings and offer alternative approaches for real-time estimates of epidemic trajectories for outbreak management and response.

[1]: /lookup/doi/10.1126/science.abh0635
[2]: /lookup/doi/10.1126/science.abj4185
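
The statement that Ct rises linearly while viral load declines exponentially can be checked with a small sketch. It assumes an idealized PCR in which the target doubles every cycle; the peak load, the decline rate, and the Ct assigned to a single copy are hypothetical values chosen for illustration, not parameters from the paper.

```python
import math

def viral_load_after_peak(days, peak_load=1e8, decline_log10_per_day=0.2):
    """Exponential decline: the load drops by a fixed factor every day."""
    return peak_load * 10 ** (-decline_log10_per_day * days)

def ct_value(viral_load, ct_at_one_copy=40.0):
    """Idealized PCR: the target doubles each cycle, so Ct falls by 1 per doubling of load."""
    return ct_at_one_copy - math.log2(viral_load)

ct_by_day = [ct_value(viral_load_after_peak(day)) for day in range(6)]

# Differences between consecutive days: constant steps mean Ct is linear in time.
daily_ct_steps = [later - earlier for earlier, later in zip(ct_by_day, ct_by_day[1:])]
print(daily_ct_steps)
```

Under these assumptions each day adds the same number of cycles (the daily log10 decline times log2 of 10), so a population sampled later in its infections shows a Ct distribution shifted upward.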


Estimating infectiousness throughout SARS-CoV-2 infection course

Science

The role that individuals with asymptomatic or mildly symptomatic severe acute respiratory syndrome coronavirus 2 have in transmission of the virus is not well understood. Jones et al. investigated viral load in patients, comparing those showing few, if any, symptoms with hospitalized cases. Approximately 400,000 individuals, mostly from Berlin, were tested from February 2020 to March 2021 and about 6% tested positive. Of the 25,381 positive subjects, about 8% showed very high viral loads. People became infectious within 2 days of infection, and in hospitalized individuals, about 4 days elapsed from the start of virus shedding to the time of peak viral load, which occurred 1 to 3 days before the onset of symptoms. Overall, viral load was highly variable, but was about 10-fold higher in persons infected with the B.1.1.7 variant. Children had slightly lower viral loads than adults, although this difference may not be clinically significant. Science, abi5273, this issue p. [eabi5273][1]

### INTRODUCTION

Although post facto studies have revealed the importance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission from presymptomatic, asymptomatic, and mildly symptomatic (PAMS) cases, the virological basis of their infectiousness remains largely unquantified. The reasons for the rapid spread of variant lineages of concern, such as B.1.1.7, have yet to be fully determined.

### RATIONALE

Viral load (viral RNA concentration) in patient samples and the rate of isolation success of virus from clinical specimens in cell culture are the clinical parameters most directly relevant to infectiousness and hence to transmission. To increase our understanding of the infectiousness of SARS-CoV-2, especially in PAMS cases and those infected with the B.1.1.7 variant, we analyzed viral load data from 25,381 German cases, including 9519 hospitalized patients, 6110 PAMS cases from walk-in test centers, 1533 B.1.1.7 variant infections, and the viral load time series of 4434 (mainly hospitalized) patients. Viral load results were then combined with estimated cell culture isolation probabilities, producing a clinical proxy estimate of infectiousness.

### RESULTS

PAMS subjects had, at the first positive test, viral loads and estimated infectiousness only slightly less than hospitalized patients. Similarly, children were found to have mean viral loads only slightly lower (0.5 log10 units or less) than those of adults and ~78% of the adult peak cell culture isolation probability. Eight percent of first-positive viral loads were 10^9 copies per swab or higher, across a wide age range (mean 37.6 years, standard deviation 13.4 years), representing a likely highly infectious minority, one-third of whom were PAMS. Relative to non-B.1.1.7 cases, patients with the B.1.1.7 variant had viral loads that were higher by a factor of 10 and estimated cell culture infectivity that was higher by a factor of 2.6. Similar ranges of viral loads from B.1.1.7 and B.1.177 samples were shown to be capable of causing infection in Caco-2 cell culture. A time-course analysis estimates that a peak viral load of 10^8.1 copies per swab is reached 4.3 days after onset of shedding and shows that, across the course of infection, hospitalized patients have slightly higher viral loads than nonhospitalized cases, who in turn have viral loads slightly higher than PAMS cases. Higher viral loads are observed in first-positive tests of PAMS subjects, likely as a result of systematic earlier testing. Mean culture isolation probability declines to 0.5 at 5 days after peak viral load and to 0.3 at 10 days after peak viral load. We estimate a rate of viral load decline of 0.17 log10 units per day, which, combined with reported estimates of incubation time and time to loss of successful cell culture isolation, suggests that viral load peaks 1 to 3 days before onset of symptoms (in symptomatic cases).

### CONCLUSION

PAMS subjects who test positive at walk-in test centers can be expected to be approximately as infectious as hospitalized patients. The level of expected infectious viral shedding of PAMS people is of high importance because they are circulating in the community at the time of detection of infection. Although viral load and cell culture infectivity cannot be translated directly to transmission probability, it is likely that the rapid spread of the B.1.1.7 variant is partly attributable to higher viral load in these cases. Easily measured virological parameters can be used, for example, to estimate transmission risk from different groups (by age, gender, clinical status, etc.), to quantify variance, to show differences in virus variants, to highlight and quantify overdispersion, and to inform quarantine, containment, and elimination strategies.

Figure: Viral load and cell culture infectivity in 25,381 SARS-CoV-2 infections. (A) Viral loads in presymptomatic, asymptomatic, and mildly symptomatic cases (PAMS; red), hospitalized patients (blue), and other subjects (black). (B) Expected first-positive viral load and cell culture isolation probability, colored as in (A). (C) Temporal estimation with lines representing patients, colored as in (A). (D) As in (C), but colored by age.

Two elementary parameters for quantifying viral infection and shedding are viral load and whether samples yield a replicating virus isolate in cell culture. We examined 25,381 cases of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Germany, including 6110 from test centers attended by presymptomatic, asymptomatic, and mildly symptomatic (PAMS) subjects, 9519 who were hospitalized, and 1533 B.1.1.7 lineage infections. The viral load of the youngest subjects was lower than that of the older subjects by 0.5 (or fewer) log10 units, and they displayed an estimated ~78% of the peak cell culture replication probability; in part this was due to smaller swab sizes and unlikely to be clinically relevant. Viral loads above 10^9 copies per swab were found in 8% of subjects, one-third of whom were PAMS, with a mean age of 37.6 years. We estimate 4.3 days from onset of shedding to peak viral load (10^8.1 RNA copies per swab) and peak cell culture isolation probability (0.75). B.1.1.7 subjects had mean log10 viral load 1.05 higher than that of non-B.1.1.7 subjects, and the estimated cell culture replication probability of B.1.1.7 subjects was higher by a factor of 2.6.

[1]: /lookup/doi/10.1126/science.abi5273
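
Because the quantities above are reported in log10 units, a short worked sketch may help relate them to fold changes. The numbers 1.05, 8.1, and 0.17 are taken from the abstract; applying the decline rate as a constant straight line starting at the peak is a simplifying assumption made here, and the function names are hypothetical.

```python
def fold_change(log10_difference):
    """Convert a difference in log10 viral load into a multiplicative factor."""
    return 10 ** log10_difference

def log10_load_after_peak(days_after_peak, peak_log10=8.1, decline_per_day=0.17):
    """Assumed linear decline in log10 load, starting from the estimated peak."""
    return peak_log10 - decline_per_day * days_after_peak

# A mean log10 load 1.05 higher corresponds to roughly an 11-fold higher load,
# in line with the "about 10-fold" wording of the summary.
print(round(fold_change(1.05), 1))

# Under the linear-decline assumption, log10 load ten days after the peak.
print(round(log10_load_after_peak(10), 2))
```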


Non-parametric Bayesian Causal Modeling of the SARS-CoV-2 Viral Load Distribution vs. Patient's Age

arXiv.org Machine Learning

The viral load of patients infected with SARS-CoV-2 varies on logarithmic scales and possibly with age. Controversial claims have been made in the literature regarding whether the viral load distribution actually depends on the age of the patients. Such a dependence would have implications for the COVID-19 spreading mechanism, the age-dependent immune system reaction, and thus for policymaking. We hereby develop a method to analyze viral-load distribution data as a function of the patients' age within a flexible, non-parametric, hierarchical, Bayesian, and causal model. This method can be applied to other contexts as well, and for this purpose, it is made freely available. The developed reconstruction method also allows testing for bias in the data. This could be due to, e.g., bias in patient-testing and data collection or systematic errors in the measurement of the viral load. We perform these tests by calculating the Bayesian evidence for each implied possible causal direction. When applying these tests to publicly available age and SARS-CoV-2 viral load data, we find a statistically significant increase in the viral load with age, but only for one of the two analyzed datasets. If we consider this dataset, and based on the current understanding of viral load's impact on patients' infectivity, we expect a non-negligible difference in the infectivity of different age groups. This difference is nonetheless too small to justify considering any age group as noninfectious.


SARS-CoV-2 within-host diversity and transmission

Science

A year into the severe acute respiratory syndrome coronavirus 2 pandemic, we are experiencing waves of new variants emerging. Some of these variants have worrying functional implications, such as increased transmissibility or antibody treatment escape. Lythgoe et al. have undertaken in-depth sequencing of more than 1000 hospital patients' isolates to find out how the virus is mutating within individuals. Overall, there seem to be consistent and reproducible patterns of within-host virus diversity. The authors observed only one or two variants in most samples, but a few carried many variants. Although the evidence indicates strong purifying selection, including in the spike protein responsible for viral entry, the authors also saw evidence for transmission clusters associated with households and other possible superspreader events. After transmission, most variants fizzled out, but occasionally some initiated ongoing transmission and wider dissemination. Science, this issue p. [eabg0821][1]

### INTRODUCTION

Genome sequencing at an unprecedented scale during the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic is helping to track spread of the virus and to identify new variants. Most of this work considers a single consensus sequence for each infected person. Here, we looked beneath the consensus to analyze genetic variation within viral populations making up an infection and studied the fate of within-host mutations when an infection is transmitted to a new individual. Within-host diversity offers the means to help confirm direct transmission and identify new variants of concern.

### RATIONALE

We sequenced 1313 SARS-CoV-2 samples from the first wave of infection in the United Kingdom. We characterized within-host diversity and dynamics in the context of transmission and ongoing viral evolution.

### RESULTS

Within-host diversity can be described by the number of intrahost single nucleotide variants (iSNVs) occurring above a given minor allele frequency (MAF) threshold. We found that in lower-viral-load samples, stochastic sampling effects resulted in a higher variance in MAFs, leading to more iSNVs being detected at any threshold. Based on a subset of 27 pairs of high-viral-load replicate RNA samples (>50,000 uniquely mapped veSEQ reads, corresponding to a cycle threshold of ~22), iSNVs with a minimum 3% MAF were highly reproducible. Comparing samples from two time points from 41 individuals, taken on average 6 days apart (interquartile range 2 to 10), we observed a dynamic process of iSNV generation and loss. Comparing iSNVs among 14 household contact pairs, we estimated transmission bottleneck sizes of one to eight viruses. Consensus differences between individuals in the same household, where sample depth allowed iSNV detection, were explained by the presence of an iSNV at the same site in the paired individual, consistent with direct transmission leading to fixation. We next focused on a set of 563 high-confidence iSNV sites that were variant in at least one high-viral-load sample (>50,000 uniquely mapped); low-confidence iSNVs unlikely to represent genomic diversity were excluded. Within-host diversity was limited in high-viral-load samples (mean 1.4 iSNVs per sample). Two exceptions, each with >14 iSNVs, showed variant frequencies consistent with coinfection or contamination. Overall, we estimated that 1 to 2% of samples in our dataset were coinfected and/or contaminated. Additionally, one sample was coinfected with another coronavirus (OC43), with no detectable impact on diversity. The ratio of nonsynonymous to synonymous (dN/dS) iSNVs was consistent with within-host purifying selection when estimated across the whole genome [dN/dS = 0.55, 95% confidence interval (95% CI) = 0.49 to 0.61] and for the Spike gene (dN/dS = 0.60, 95% CI = 0.45 to 0.82). Nevertheless, we observed Spike variants in multiple samples that have been shown to increase viral infectivity (L5F) or resistance to antibodies (G446V and A879V). We observed a strong association between high-confidence iSNVs and a consensus change on the phylogeny (153 cases), consistent with fixation after transmission or de novo mutations reaching consensus. Shared variants that never reached consensus (261 cases) were not phylogenetically associated.

### CONCLUSION

Using robust methods to call within-host variants, we uncovered a consistent pattern of low within-host diversity, purifying selection, and narrow transmission bottlenecks. Within-host emergence of vaccine and therapeutic escape mutations is likely to be relatively rare, at least during early infection, when viral loads are high, but the observation of immune-escape variants in high-viral-load samples underlines the need for continued vigilance.

Figure: Diagram showing low SARS-CoV-2 within-host genetic diversity and narrow transmission bottleneck. Individuals with high viral load typically have few, if any, within-host variants. Narrow transmission bottlenecks mean that the major variant in the source individual was typically transmitted and the minor variants lost. Occasionally, the minor variant was transmitted, leading to a consensus change, or multiple variants were transmitted, resulting in a mixed infection. Credit: FontAwesome, licensed under CC BY 4.0.

Extensive global sampling and sequencing of the pandemic virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have enabled researchers to monitor its spread and to identify concerning new variants. Two important determinants of variant spread are how frequently they arise within individuals and how likely they are to be transmitted. To characterize within-host diversity and transmission, we deep-sequenced 1313 clinical samples from the United Kingdom. SARS-CoV-2 infections are characterized by low levels of within-host diversity when viral loads are high and by a narrow bottleneck at transmission. Most variants are either lost or occasionally fixed at the point of transmission, with minimal persistence of shared diversity, patterns that are readily observable on the phylogenetic tree. Our results suggest that transmission-enhancing and/or immune-escape SARS-CoV-2 variants are likely to arise infrequently but could spread rapidly if successfully transmitted.

[1]: /lookup/doi/10.1126/science.abg0821
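
The description of within-host diversity as the number of iSNVs above a MAF threshold can be sketched as follows. This is not the authors' variant-calling pipeline: the per-site base counts are made up, and treating the MAF as the share of reads carrying the second most common base is an assumption made for illustration. Only the 3% threshold comes from the abstract.

```python
def minor_allele_frequency(base_counts):
    """Fraction of reads carrying the second most common base at one site."""
    total = sum(base_counts.values())
    ordered = sorted(base_counts.values(), reverse=True)
    if total == 0 or len(ordered) < 2:
        return 0.0
    return ordered[1] / total

def count_isnvs(sites, maf_threshold=0.03):
    """Number of sites whose minor allele frequency reaches the threshold."""
    return sum(1 for counts in sites if minor_allele_frequency(counts) >= maf_threshold)

# Hypothetical read counts per base at three genome sites of one sample.
sample_sites = [
    {"A": 960, "G": 40},   # MAF 0.04: counted as an iSNV at the 3% threshold
    {"C": 995, "T": 5},    # MAF 0.005: below threshold
    {"G": 600, "A": 400},  # MAF 0.40: counted
]

print(count_isnvs(sample_sites))
print(count_isnvs(sample_sites, maf_threshold=0.05))
```

Raising the threshold trades sensitivity for reproducibility, which matches the abstract's remark that low-viral-load samples yield more iSNV calls at any threshold because of sampling noise.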