

Facebook and NYU trained an AI to estimate COVID outcomes


COVID-19 has infected more than 23 million Americans and killed 386,000 of them since the global pandemic began last March. Complicating the public health response is the fact that we still know so little about how the virus operates, such as why some patients remain asymptomatic while it ravages others. Effectively allocating resources like ICU beds and ventilators becomes a Sisyphean task when doctors can only guess as to who might recover and who might be intubated within the next 96 hours. However, a trio of new machine learning algorithms developed by Facebook's AI division (FAIR) in cooperation with NYU Langone Health can help predict patient outcomes up to four days in advance using just a patient's chest X-rays. The models can, respectively, predict patient deterioration based on either a single X-ray or a sequence of X-rays, as well as determine how much supplemental oxygen the patient will likely need.

Recent and forthcoming machine learning and AI seminars: January 2021 edition


This post contains a list of the AI-related seminars that are scheduled to take place between now and the end of February 2021. We've also listed recent past seminars that are available for you to watch. All events detailed here are free and open for anyone to attend virtually. This list includes forthcoming seminars scheduled to take place between 15 January and 28 February.

Zero-shot (human-AI) coordination (in Hanabi) and ridge rider
Speaker: Jakob Foerster (Facebook, University of Toronto & Vector Institute)
Organised by: University College London

AI-Powered Text From This Program Could Fool the Government


In October 2019, Idaho proposed changing its Medicaid program. The state needed approval from the federal government, which solicited public feedback via But half came not from concerned citizens or even internet trolls. They were generated by artificial intelligence. And a study found that people could not distinguish the real comments from the fake ones.

AIs that read sentences can also spot virus mutations

MIT Technology Review

In a study published in Science today, Berger and her colleagues pull several of these strands together and use NLP to predict mutations that allow viruses to avoid being detected by antibodies in the human immune system, a process known as viral immune escape. The basic idea is that the interpretation of a virus by an immune system is analogous to the interpretation of a sentence by a human. "It's a neat paper, building off the momentum of previous work," says Ali Madani, a scientist at Salesforce, who is using NLP to predict protein sequences. Berger's team uses two different linguistic concepts: grammar and semantics (or meaning). The genetic or evolutionary fitness of a virus--characteristics such as how good it is at infecting a host--can be interpreted in terms of grammatical correctness.

The language of a virus


Uncovering connections between seemingly unrelated branches of science might accelerate research in one branch by using the methods developed in the other branch as stepping stones. On page 284 of this issue, Hie et al. (1) provide an elegant example of such unexpected connections. The authors have uncovered a parallel between the properties of a virus and its interpretation by the host immune system and the properties of a sentence in natural language and its interpretation by a human. By leveraging an extensive natural language processing (NLP) toolbox (2, 3) developed over the years, they have come up with a powerful new method for the identification of mutations that allow a virus to escape from recognition by neutralizing antibodies.

In 1950, Alan Turing predicted that machines will eventually compete with men in “intellectual fields” and suggested that one possible way forward would be to build a machine that can be taught to understand and speak English (4). This was, and still is, an ambitious goal. It is clear that language grammar can provide a formal skeleton for building sentences, but how can machines be trained to infer the meanings? In natural language, there are many ways to express the same idea, and yet small changes in expression can often change the meaning.

Linguistics developed a way of quantifying the similarity of meaning (semantics). Specifically, it was proposed that words that are used in the same context are likely to have similar meanings (5, 6). This distributional hypothesis became a key feature for the computational technique in NLP, known as word (semantic) embedding. The main idea is to characterize words as vectors that represent distributional properties in a large amount of language data and then embed these sparse, high-dimensional vectors into more manageable, low-dimensional space in a distance-preserving manner.
By the distributional hypothesis, this technique should group words that have similar semantics together in the embedding space.

Hie et al. proposed that viruses can also be thought to have a grammar and semantics. Intuitively, the grammar describes which sequences make specific viruses (or their parts). Biologically, a viral protein sequence should have all the properties needed to invade a host, multiply, and continue invading another host. Thus, in some way, the grammar represents the fitness of a virus. With enough data, current machine learning approaches can be used to learn this sequence-based fitness function.

Figure: Predicting immune escape. The constrained semantic change search algorithm obtains semantic embeddings of all mutated protein sequences using bidirectional long short-term memory (LSTM). The sequences are ranked according to the combined score of the semantic change (the distance of a mutation from the original sequence) and fitness (the probability that a mutation appears in viral sequences). GRAPHIC: V. ALTOUNIAN/SCIENCE

But what would be the meaning (semantics) of a virus? Hie et al. suggested that the semantics of a virus should be defined in terms of its recognition by immune systems. Specifically, viruses with different semantics would require a different state of the immune system (for example, different antibodies) to be recognized. The authors hypothesized that semantic embeddings allow sequences that require different immune responses to be uncovered. In this context, words represent protein sequences (or protein fragments), and recognition of such protein fragments is the task performed by the immune system. To escape immune responses, viral genomes can become mutated so that the virus evolves to no longer be recognized by the immune system. However, a virus that acquires a mutation that compromises its function (and thus fitness) will not survive.
Using the NLP analogy, immune escape will be achieved by the mutations that change the semantics of the virus while maintaining its grammaticality so that the virus will remain infectious but escape the immune system. On the basis of this idea, Hie et al. developed a new approach, called constrained semantic change search (CSCS). Computationally, the goal of CSCS is to identify mutations that confer high fitness and substantial semantic changes at the same time (see the figure). The immune escape scores are computed by combining the two quantities. The search algorithm builds on a powerful deep learning technique for language modeling, called long short-term memory (LSTM), to obtain semantic embeddings of all mutated sequences and rank the sequences according to their immune escape scores in the embedded space. The semantic change of a mutated sequence corresponds to its distance from the original sequence in the semantic embedding, and the “grammaticality” (or fitness) of a mutated sequence is estimated by the probability that the mutation appears in viral sequences. The immune escape scores can then be computed by simultaneously considering both the semantic distance and fitness probability.

Hie et al. confirmed their hypothesis for the correspondence of grammaticality and semantics to fitness and immune response in three viral proteins: influenza A hemagglutinin (HA), HIV-1 envelope (Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike. For the analogy of semantics to immune response, they found that clusters of semantically similar viruses were in good correspondence with virus subtypes, host, or both, confirming that the language model can extract functional meanings from protein sequences. The clustering patterns also revealed interspecies transmissibility and antigenic similarity.
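
The scoring step of CSCS described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: it assumes the embeddings and mutation probabilities have already been produced by some language model, it uses an L1 distance for semantic change, and it combines the two quantities by adding their ranks with a hypothetical weight `beta`. The function names and the toy numbers are invented for illustration.

```python
import numpy as np


def rank_ascending(values):
    """Return ranks 1..n, where the smallest value gets rank 1."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


def escape_scores(original_embedding, mutant_embeddings, mutant_probabilities, beta=1.0):
    """Combine semantic change and fitness into one score per mutant.

    Assumptions (not taken from the text): L1 distance in embedding space
    and a rank-sum combination weighted by `beta`.
    """
    original = np.asarray(original_embedding, dtype=float)
    mutants = np.asarray(mutant_embeddings, dtype=float)
    # Semantic change: distance of each mutant from the original sequence.
    semantic_change = np.abs(mutants - original).sum(axis=1)
    # Fitness ("grammaticality"): probability that the mutation appears.
    return rank_ascending(semantic_change) + beta * rank_ascending(mutant_probabilities)


# Toy data: four hypothetical mutants of one sequence, 2-D embeddings.
toy_embeddings = [[0.1, 0.0], [2.0, 2.0], [1.5, 1.0], [0.2, 0.1]]
toy_probabilities = [0.9, 0.01, 0.6, 0.05]
toy_scores = escape_scores([0.0, 0.0], toy_embeddings, toy_probabilities)
print("highest-scoring mutant index:", int(np.argmax(toy_scores)))
```

In the toy data, the mutant that is both far from the original in embedding space and reasonably probable scores highest, while a mutant that is distant but improbable, or probable but nearly unchanged, scores lower. That is the trade-off the text describes.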
The correspondence of grammaticality to fitness was assessed more directly by using deep mutational scans evaluated for replication fitness (for HA and Env) or binding (for Spike). The combined model was tested against experimentally verified mutations that allow for immune escape. Scoring each amino acid residue with CSCS, the authors uncovered viral protein regions that are significantly enriched with escape potential: the head of HA for influenza, the V1/V2 hypervariable regions for HIV Env, and the receptor-binding domain (RBD) and amino-terminal domain for SARS-CoV-2 Spike.

The language of viral evolution and escape proposed by Hie et al. provides a powerful framework for predicting mutations that lead to viral escape. However, interesting questions remain. Further extending the natural language analogy, it is notable that individuals can interpret the same English sentence differently depending on their past experience and their fluency in the language. Similarly, immune response differs between individuals depending on factors such as past pathogenic exposures and overall “strength” of the immune system. It will be interesting to see whether the proposed approach can be adapted to provide a “personalized” view of the language of virus evolution.

1. B. Hie, E. Zhong, B. Berger, B. Bryson, Science 371, 284 (2021). doi:10.1126/science.abd7331
2. Y. LeCun, Y. Bengio, G. Hinton, Nature 521, 436 (2015). doi:10.1038/nature14539
3. T. Young, D. Hazarika, S. Poria, E. Cambria, IEEE Comput. Intell. Mag. 13, 55 (2018).
4. A. Turing, Mind LIX, 433 (1950).
5. Z. S. Harris, Word 10, 146 (1954). doi:10.1080/00437956.1954.11659520
6. J. R. Firth, in Studies in Linguistic Analysis (1957), pp. 1–32.
Acknowledgments: The authors are supported by the Intramural Research Programs of the National Library of Medicine at the National Institutes of Health, USA.

Transmission heterogeneities, kinetics, and controllability of SARS-CoV-2


A minority of people infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmit most infections. How does this happen? Sun et al. reconstructed transmission in Hunan, China, up to April 2020. Such detailed data can be used to separate the contribution of transmission control measures aimed at isolating individuals from that of population-level distancing measures. The authors found that most of the secondary transmissions could be traced back to a minority of infected individuals, and well over half of transmission occurred in the presymptomatic phase. Furthermore, the duration of exposure to an infected person combined with closeness and number of household contacts constituted the greatest risks for transmission, particularly when lockdown conditions prevailed. These findings could help in the design of infection control policies that have the potential to minimize both virus transmission and economic strain. Science, this issue p. eabe2424

### INTRODUCTION
The role of transmission heterogeneities in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) dynamics remains unclear, particularly those heterogeneities driven by demography, behavior, and interventions. To understand individual heterogeneities and their effect on disease control, we analyze detailed contact-tracing data from Hunan, a province in China adjacent to Hubei and one of the first regions to experience a SARS-CoV-2 outbreak in January to March 2020. The Hunan outbreak was swiftly brought under control by March 2020 through a combination of nonpharmaceutical interventions including population-level mobility restriction (i.e., lockdown), traveler screening, case isolation, contact tracing, and quarantine. In parallel, highly detailed epidemiological information on SARS-CoV-2–infected individuals and their close contacts was collected by the Hunan Provincial Center for Disease Control and Prevention.
### RATIONALE
Contact-tracing data provide information to reconstruct transmission chains and understand outbreak dynamics. These data can in turn generate valuable intelligence on key epidemiological parameters and risk factors for transmission, which paves the way for more-targeted and cost-effective interventions.

### RESULTS
On the basis of epidemiological information and exposure diaries on 1178 SARS-CoV-2–infected individuals and their 15,648 close contacts, we developed a series of statistical and computational models to stochastically reconstruct transmission chains, identify risk factors for transmission, and infer the infectiousness profile over the course of a typical infection. We observe overdispersion in the distribution of secondary infections, with 80% of secondary cases traced back to 15% of infections, which indicates substantial transmission heterogeneities. We find that SARS-CoV-2 transmission risk scales positively with the duration of exposure and the closeness of social interactions, with the highest per-contact risk estimated in the household. Lockdown interventions increase transmission risk in families and households, whereas the timely isolation of infected individuals reduces risk across all types of contacts. There is a gradient of increasing susceptibility with age but no significant difference in infectivity by age or clinical severity. Early isolation of SARS-CoV-2–infected individuals drastically alters transmission kinetics, leading to shorter generation and serial intervals and a higher fraction of presymptomatic transmission. After adjusting for the censoring effects of isolation, we find that the infectiousness profile of a typical SARS-CoV-2 patient peaks just before symptom onset, with 53% of transmission occurring in the presymptomatic phase in an uncontrolled setting.
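
To make the overdispersion statistic above concrete ("80% of secondary cases traced back to 15% of infections"), the following sketch computes what fraction of infected individuals accounts for a given share of secondary cases. It is not the authors' model: the negative binomial offspring distribution is a common modeling assumption for overdispersed transmission, and the mean (`1.0`) and dispersion (`0.3`) values, like the function names, are hypothetical choices for illustration.

```python
import numpy as np


def fraction_of_infectors_for_share(secondary_counts, share=0.8):
    """Smallest fraction of infectors that accounts for `share` of secondary cases."""
    counts = np.sort(np.asarray(secondary_counts))[::-1]  # most prolific first
    total = counts.sum()
    if total == 0:
        raise ValueError("no secondary cases in the data")
    cumulative = np.cumsum(counts)
    # First position where the running total reaches the target share.
    n_needed = int(np.searchsorted(cumulative, share * total)) + 1
    return n_needed / len(counts)


def simulate_secondary_cases(n_infected, mean, dispersion, seed=0):
    """Draw secondary-case counts from a negative binomial (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    # Parameterized so that the expected number of secondary cases equals `mean`.
    p = dispersion / (dispersion + mean)
    return rng.negative_binomial(dispersion, p, size=n_infected)


# Hypothetical parameters: smaller dispersion means stronger heterogeneity.
simulated = simulate_secondary_cases(n_infected=10_000, mean=1.0, dispersion=0.3)
print("fraction of infectors behind 80% of secondary cases:",
      fraction_of_infectors_for_share(simulated))
```

Applied to reconstructed transmission chains, the first function would take the observed number of secondary cases per primary case; the simulation is only there to show how a small dispersion parameter concentrates transmission in a minority of individuals.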
We then use these results to evaluate the effectiveness of individual-based strategies (case isolation and contact quarantine) both alone and in combination with population-level contact reductions. We find that a plausible parameter space for SARS-CoV-2 control is restricted to scenarios where interventions are synergistically combined, owing to the particular transmission kinetics of this virus.

### CONCLUSION
There is considerable heterogeneity in SARS-CoV-2 transmission owing to individual differences in biology and contacts that is modulated by the effects of interventions. We estimate that about half of secondary transmission events occur in the presymptomatic phase of a primary case in uncontrolled outbreaks. Achieving epidemic control requires that isolation and contact-tracing interventions are layered with population-level approaches, such as mask wearing, increased teleworking, and restrictions on large gatherings. Our study also demonstrates the value of conducting high-quality contact-tracing investigations to advance our understanding of the transmission dynamics of an emerging pathogen.

Figure: Transmission chains, contact patterns, and transmission kinetics of SARS-CoV-2 in Hunan, China, based on case and contact-tracing data from Hunan, China. (Top left) One realization of the reconstructed transmission chains, with a histogram representing overdispersion in the distribution of secondary infections. (Top right) Contact matrices of community, social, extended family, and household contacts reveal distinct age profiles. (Bottom) Earlier isolation of primary infections shortens the generation and serial intervals while increasing the relative contribution of transmission in the presymptomatic phase.

A long-standing question in infectious disease dynamics concerns the role of transmission heterogeneities, which are driven by demography, behavior, and interventions.
On the basis of detailed patient and contact-tracing data in Hunan, China, we find that 80% of secondary infections can be traced back to 15% of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) primary infections, which indicates substantial transmission heterogeneities. Transmission risk scales positively with the duration of exposure and the closeness of social interactions and is modulated by demographic and clinical factors. The lockdown period increases transmission risk in the family and households, whereas isolation and quarantine reduce risks across all types of contacts. The reconstructed infectiousness profile of a typical SARS-CoV-2 patient peaks just before symptom presentation. Modeling indicates that SARS-CoV-2 control requires the synergistic efforts of case isolation, contact quarantine, and population-level interventions because of the specific transmission kinetics of this virus. DOI: 10.1126/science.abe2424

Learning the language of viral evolution and escape


Viral mutations that evade neutralizing antibodies, an occurrence known as viral escape, may impede the development of vaccines. To predict which mutations may lead to viral escape, Hie et al. used a machine learning technique for natural language processing with two components: grammar (or syntax) and meaning (or semantics) (see the Perspective by Kim and Przytycka). Three different unsupervised language models were constructed for influenza A hemagglutinin, HIV-1 envelope glycoprotein, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike glycoprotein. Semantic landscapes for these viruses predicted viral escape mutations that produce sequences that are syntactically and/or grammatically correct but effectively different in semantics and thus able to evade the immune system. Science, this issue p. 284; see also p. 233

The ability of viruses to mutate and evade the human immune system and cause infection, called viral escape, remains an obstacle to antiviral and vaccine development. Understanding the complex rules that govern escape could inform therapeutic design. We modeled viral escape with machine learning algorithms originally developed for human natural language. We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence’s grammaticality but change its meaning. With this approach, language models of influenza hemagglutinin, HIV-1 envelope glycoprotein (HIV Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike viral proteins can accurately predict structural escape patterns using sequence data alone. Our study represents a promising conceptual bridge between natural language and viral evolution. DOI: 10.1126/science.abd7331; Perspective DOI: 10.1126/science.abf6894

Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic


Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) incidence peaked in Manaus, Brazil, in May 2020 with a devastating toll on the city's inhabitants, leaving its health services shattered and cemeteries overwhelmed. Buss et al. collected data from blood donors from Manaus and São Paulo, noted when transmission began to fall, and estimated the final attack rates in October 2020 (see the Perspective by Sridhar and Gurdasani). Heterogeneities in immune protection, population structure, poverty, modes of public transport, and uneven adoption of nonpharmaceutical interventions mean that despite a high attack rate, herd immunity may not have been achieved. This unfortunate city has become a sentinel for how natural population immunity could influence future transmission. Events in Manaus reveal what tragedy and harm to society can unfold if this virus is left to run its course. Science, this issue p. 288; see also p. 230

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread rapidly in Manaus, the capital of Amazonas state in northern Brazil. The attack rate there is an estimate of the final size of the largely unmitigated epidemic that occurred in Manaus. We use a convenience sample of blood donors to show that by June 2020, 1 month after the epidemic peak in Manaus, 44% of the population had detectable immunoglobulin G (IgG) antibodies. Correcting for cases without a detectable antibody response and for antibody waning, we estimate a 66% attack rate in June, rising to 76% in October. This is higher than in São Paulo, in southeastern Brazil, where the estimated attack rate in October was 29%. These results confirm that when poorly controlled, COVID-19 can infect a large proportion of the population, causing high mortality. DOI: 10.1126/science.abe9728; Perspective DOI: 10.1126/science.abf7921

These five AI developments will shape 2021 and beyond

MIT Technology Review

The year 2020 was profoundly challenging for citizens, companies, and governments around the world. As covid-19 spread, requiring far-reaching health and safety restrictions, artificial intelligence (AI) applications played a crucial role in saving lives and fostering economic resilience. Research and development (R&D) to enhance core AI capabilities, from autonomous driving and natural language processing to quantum computing, continued unabated. Baidu was at the forefront of many important AI breakthroughs in 2020. This article outlines five significant advances with implications for combating covid-19 as well as transforming the future of our economies and society.

Forehead scanners result in a large number of false negatives, study warns

Daily Mail - Science & tech

Thermal screening to spot people infected with coronavirus is more reliable when scanning the eyeball and fingertip than when taking body or forehead measurements. Experts in human physiology published a scientific article on the usefulness of thermometers which scan a person's skin to detect a fever. They say the current process is fundamentally flawed: it produces a large number of false negatives, as well as some false positives, and not all people infected with the coronavirus develop a fever. A fever is defined as a temperature of greater than or equal to 100.4°F (38°C) if spotted outside of a healthcare environment. In healthcare settings, such as a hospital, a fever is technically defined as anything greater than or equal to 100.0°F (37.8°C).