Science


Developmental and evolutionary dynamics of cis-regulatory elements in mouse cerebellar cells

Science

Gene-regulatory networks govern the development of organs. Sarropoulos et al. analyzed mouse cerebellar development in the context of gene-regulatory networks. Single-nucleus profiles of chromatin accessibility in about 90,000 cells revealed diversity in progenitor cells and genetic programs guiding cellular differentiation. The footsteps of evolution were apparent in varying constraints on different cell types.

Science, abg4696, this issue p. eabg4696

### INTRODUCTION

The cerebellum contributes to many complex brain functions, including motor control, language, and memory. During development, distinct neural cells are generated at cerebellar germinal zones in a spatiotemporally restricted manner. Cis-regulatory elements (CREs), such as enhancers and promoters, and the transcription factors that bind to them are central to cell fate specification and differentiation. Although most CREs undergo rapid turnover during evolution, a few are conserved across vertebrates.

### RATIONALE

Bulk measurements of CRE activity have provided insights into gene regulation in the cerebellum, as well as into the evolutionary dynamics of CREs during organ development. However, they lack the cellular resolution required to assess cell-type differences in regulatory constraint and unravel the regulatory programs associated with the specification and differentiation of cell types.

### RESULTS

Here, we generated a single-cell atlas of gene regulation in the mouse cerebellum spanning 11 developmental stages, from the beginning of neurogenesis to adulthood. By acquiring snATAC-seq (single-nucleus assay for transposase accessible chromatin using sequencing) profiles for ~90,000 cells, we mapped all major cerebellar cell types and identified candidate CREs. Characterization of CRE activity across cerebellar development highlights the cell- and time-specificity of gene regulation.
Many of the differentially accessible CREs are specific to a single cell type and state, but we also identified a fraction of CREs with pleiotropic (shared) activity. At early developmental stages, temporal changes in CRE activity are shared between progenitor cells from different germinal zones, supporting a model of cell fate induction through common temporal cues. Pleiotropic CREs in major cerebellar neuron types (granule cells, Purkinje cells, and inhibitory interneurons) are more active at early differentiation states, and the regulatory programs gradually diverge as differentiation proceeds. Based on comparisons to vertebrate genomes, we observed a decrease in CRE sequence conservation during development for all cerebellar cell types, a pattern that is largely explained by differentiation as well as by additional temporal differences between cells from matched differentiation states. Across cell types, differences in regulatory conservation are most pronounced in the adult, where microglia—the immune cells of the brain—show the fastest evolutionary turnover. By contrast, mature astrocytes harbor the most conserved intergenic CREs, not only in the cerebellum but also across a wide range of cell types in adult mouse organs. To evaluate the conservation of CRE activity, we acquired snATAC-seq profiles for ~20,000 cerebellar cells from the gray short-tailed opossum, a marsupial separated from mouse by ~160 million years of evolution. Our comparative analysis of CRE activity in the two therian species reinforced our sequence-based conclusions regarding differences in CRE constraint across cell types and developmental stages and also revealed that despite the overall high turnover of CREs, radical repurposing of spatiotemporal CRE activity is rare, at least between cell types in the same tissue. 
### CONCLUSION

This study reveals extensive temporal differences in CRE activity across cerebellar cell types and a shared decrease in CRE conservation during development and differentiation. Given that the cerebellum has been successfully used as a model system to study cell fate specification, neurogenesis, and other developmental processes, we expect that our observations regarding the developmental and evolutionary dynamics of regulatory elements, and their interplay, are also applicable to mammalian organs in general.

Figure: Cis-regulatory elements in cerebellar cells. snATAC-seq delineates cell- and time-specific CRE activity in the developing mouse cerebellum (left). The chromatin accessibility profiles of cerebellar neuron types gradually diverge during differentiation as the activity of pleiotropic (shared) CREs decreases (top right). The evolutionary conservation of CRE sequences in vertebrates and activity in therian mammals decreases across development and differs between cell types (bottom right). mRNA, messenger RNA; PCA, principal components analysis; TF, transcription factor.

Organ development is orchestrated by cell- and time-specific gene regulatory networks. In this study, we investigated the regulatory basis of mouse cerebellum development from early neurogenesis to adulthood. By acquiring snATAC-seq (single-nucleus assay for transposase accessible chromatin using sequencing) profiles for ~90,000 cells spanning 11 stages, we mapped cerebellar cell types and identified candidate cis-regulatory elements (CREs). We detected extensive spatiotemporal heterogeneity among progenitor cells and a gradual divergence in the regulatory programs of cerebellar neurons during differentiation. Comparisons to vertebrate genomes and snATAC-seq profiles for ~20,000 cerebellar cells from the marsupial opossum revealed a shared decrease in CRE conservation during development and differentiation as well as differences in constraint between cell types. Our work delineates the developmental and evolutionary dynamics of gene regulation in cerebellar cells and provides insights into mammalian organ development.

DOI: 10.1126/science.abg4696


Population sequencing data reveal a compendium of mutational processes in the human germ line

Science

It has become increasingly clear that mutation affects phenotypic variation and disease risk across humans. However, there are many different types of mutation. Seplyarskiy et al. applied a matrix factorization method to large human genomic datasets to identify germline mutational processes in an unsupervised manner. From this survey, nine robust mutational components were identified and specific mechanisms generating seven of these processes were proposed from correlations with genomic features. These results confirm and improve upon our understanding of mutational processes and reveal likely mechanisms of mutation in the human genome.

Science, aba7408, this issue p. 1030

Biological mechanisms underlying human germline mutations remain largely unknown. We statistically decompose variation in the rate and spectra of mutations along the genome using volume-regularized nonnegative matrix factorization. The analysis of a sequencing dataset (TOPMed) reveals nine processes that explain the variation in mutation properties between loci. We provide a biological interpretation for seven of these processes. We associate one process with bulky DNA lesions that are resolved asymmetrically with respect to transcription and replication. Two processes track direction of replication fork and replication timing, respectively. We identify a mutagenic effect of active demethylation primarily acting in regulatory regions and a mutagenic effect of long interspersed nuclear elements. We localize a mutagenic process specific to oocytes from population sequencing data. This process appears transcriptionally asymmetric.

DOI: 10.1126/science.aba7408
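To make the idea of decomposing mutation counts into processes more concrete, the following is a minimal sketch of plain nonnegative matrix factorization using Lee-Seung multiplicative updates. It is not the volume-regularized variant named in the abstract, and it is not the authors' code. The toy count matrix (rows as genomic windows, columns as mutation types), the choice of two components, and every function name here are assumptions made purely for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-12):
    """Factor a nonnegative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        # Update H: H <- H * (W^T V) / (W^T W H), elementwise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # Update W: W <- W * (V H^T) / (W H H^T), elementwise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Hypothetical counts: 5 genomic windows x 4 mutation types, built from
# two made-up spectra, [4, 1, 0, 0] and [0, 1, 2, 3].
toy_counts = [
    [8, 2, 0, 0],
    [4, 2, 2, 3],
    [0, 2, 4, 6],
    [12, 4, 2, 3],
    [4, 4, 6, 9],
]

w, h, errors = nmf(toy_counts, rank=2)
print("initial error:", errors[0])
print("final error:", errors[-1])
```

In this reading, each row of `h` plays the role of a mutational spectrum and each row of `w` gives the intensity of the components in one genomic window. The regularization, the choice of the number of components, and the robustness checks used in the study are outside the scope of this sketch.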


Piercing the fog of the RNA structure-ome

Science

RNA is distinct among large biomolecules in that it has both informational coding ability, carried in its sequence, and the ability to form complex three-dimensional structures that can have catalytic and regulatory roles. The information-carrying component is widely appreciated. The pattern of base pairing, the first level of RNA structure, can be experimentally assessed and modeled with impressive accuracy (1, 2). By contrast, our understanding of the extent and roles of complex three-dimensional RNA structures remains rudimentary. RNA viral genomes are rich in motifs with complex three-dimensional structures with regulatory functions (3), and evidence increasingly supports the hypothesis that functional RNA structures are ubiquitous in organisms ranging from bacteria to humans. However, developing and testing hypotheses about the roles of RNA structure have been hindered by the inability to identify and model these structures. On page 1047 of this issue, Townshend et al. (4) report a machine-learning strategy for identifying native-like RNA folds.

Nearly all RNAs that form well-understood complex structures fall into a small number of classes: the ribosomal RNAs, the large and small ribozymes that catalyze RNA cleavage, bacterial riboswitches, and regulatory elements encoded by RNA viruses. Thus, there are limited examples for guiding identification and modeling of RNAs with complex three-dimensional structures. There are only four major RNA nucleotides, and the interactions that govern base pairing and simple helix formation are well understood. Once formed, RNA helices (secondary structure) often assemble as fairly rigid elements that interact hierarchically to form more complicated structures (tertiary structure) (see the figure). Despite these simplifying features, the modeling of complex RNA structures has proven to be difficult.
The RNA-Puzzles community exercise (5, 6) has been instrumental in illuminating the challenges involved: Groups try to predict an RNA structure from its sequence before learning the solved structure. Several rounds of RNA-Puzzles have revealed important themes. No single method consistently yields the best models, although certain approaches have better records than others, and most approaches are getting better. The best agreement tends to result when experimental or homology-based information is incorporated into the computational modeling. However, the median accuracy for small RNAs, with complex tertiary folds but without a close known homolog, has stayed stubbornly stuck in a range of ~15- to 20-Å root mean square deviation (RMSD, a measure of the similarity between known and modeled structures). This agreement is much poorer than that now achieved for protein structures by machine learning (7), where native-like folds (~2-Å RMSD or less) are achieved. Modeled RNA structures thus often recapitulate the overall fold of a target RNA but do not consistently reveal details of the tertiary structure. Current methods are not likely to be useful for applications such as understanding the biological mechanism of a structure or for designing ligands (or drugs) that modulate RNA function.

Figure: RNA structure. RNA molecules have multiple levels of structure and ability to encode information. The sequence of RNA is readily determined. RNA secondary structure can now be elucidated with high levels of accuracy using approaches that meld computational energy minimization with experimental per-nucleotide chemical probing information. Townshend et al. developed a deep neural network that can identify models that best represent the native tertiary state, taking a step toward modeling three-dimensional RNA structure. GRAPHIC: C. BICKEL/SCIENCE

The Atomic Rotationally Equivalent Scorer (ARES) approach of Townshend et al. is a deep neural network, a form of machine learning, and did not initially include preconceived notions of RNA structure. Indeed, the ARES framework is not specific to RNA and can be applied to other problems in molecular structure. Instead, ARES was given a small set of motifs with known RNA structure plus a large number of alternative (incorrect) variations of these same structures. ARES parameters were adjusted so that the program learned the functional and geometric arrangements of each atom and how these elements are positioned relative to each other. Layers in the neural network compute features from finer to coarser scales to recognize base pairs, helices, and more-complex structures. For example, ARES learned patterns of base pairing, the optimal geometry for RNA helices, and a subset of noncanonical tertiary motifs without being provided explicit information about these features of RNA structure.

Although ARES was trained on very simple RNA systems, the resulting ARES scoring function was able to predict structures of more complex RNAs, on average, to roughly a 12-Å RMSD. This degree of accuracy represents an overall improvement of ~4 Å over prior scoring methods. ARES is still short of the level consistent with atomic resolution or sufficient to guide identification of key functional sites or drug discovery efforts, but Townshend et al. have achieved notable progress in a field that has proven recalcitrant to transformative advances.

There are three fundamental challenges for modeling complex RNA three-dimensional structures: generating reasonable structures that may represent a biological state, accurately scoring or identifying models that best represent the correct native state, and using these hopefully accurate models to discover new functional motifs and to develop hypotheses regarding the mechanisms by which RNAs with complex three-dimensional structures regulate biological processes.
The ARES machine-learning approach addressed the second of these three challenges: Candidate structures still need to be generated for evaluation by ARES. With further development, deep learning strategies hold promise for creating new scoring functions that can guide structure generation in ways that might yield near-native structures. Another important goal is to use a machine-learning strategy to identify regions in large RNAs most likely to fold into three-dimensional structures. Current computational-only algorithms are not able to predict the pattern of base pairing in large RNAs accurately, even though base pairs are simpler to predict than tertiary structure. However, secondary structures for large RNAs are routinely modeled to high accuracies by incorporating experimental information. New, efficiently executed experiments are now being developed that measure features of RNA tertiary structures. Another frontier, analogous to recent advances in secondary structure modeling, would thus be to incorporate experimental information into machine-learning strategies for modeling RNA tertiary structure.

Large-scale investigation of RNA structure to date, primarily focused on RNA secondary structure, has revealed several core principles. One is that the existence of regions within large RNAs with complex, higher-order structure is unremarkable. When these base pairing and tertiary structures affect biological functions, they create “an RNA structure code” with pervasive effects on gene regulatory circuits. Additionally, every RNA likely has a distinct structural personality, which implies that there are numerous ways by which RNA structure tunes the underlying function of an RNA. At the level of secondary structure, such tuning RNA structures tend to function like switches and attenuators that modulate binding by RNA and protein ligands (8–11).
Finally, characterization of well-determined RNA secondary structures often leads to identification of centers of new biology. As it becomes possible to measure, (deeply) learn, and predict the details of the tertiary RNA structure-ome, diverse new discoveries in biological mechanisms await.

1. E. J. Strobel et al., Nat. Rev. Genet. 19, 615 (2018).
2. K. M. Weeks, Acc. Chem. Res. 54, 2502 (2021).
3. Z. A. Jaafar, J. S. Kieft, Nat. Rev. Microbiol. 17, 110 (2019).
4. R. J. L. Townshend et al., Science 373, 1047 (2021).
5. J. A. Cruz et al., RNA 18, 610 (2012).
6. Z. Miao et al., RNA 26, 982 (2020).
7. E. Pennisi, Science 373, 262 (2021).
8. D. Long et al., Nat. Struct. Mol. Biol. 14, 287 (2007).
9. M. Kertesz et al., Nat. Genet. 39, 1278 (2007).
10. D. Dominguez et al., Mol. Cell 70, 854 (2018).
11. A. M. Mustoe et al., Biochemistry 57, 3537 (2018).

Acknowledgments: The author’s laboratory is supported by the US National Institutes of Health and National Science Foundation. The author is an advisor to and holds equity in Ribometrix.
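The accuracy figures quoted in the Perspective above (roughly 12 Å for ARES-selected models, 15 to 20 Å for earlier blind predictions, and about 2 Å or less for native-like protein folds) are root mean square deviations. As a minimal sketch, assuming the two structures list the same atoms in the same order and have already been superimposed, RMSD can be computed as follows. The coordinates and the function name `rmsd` are illustrative assumptions and do not come from ARES or RNA-Puzzles.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered lists of (x, y, z) points.

    Assumes the structures are already superimposed; no alignment is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))


# Hypothetical four-atom "structure" and a model in which every atom is
# displaced by the vector (3, 4, 0), that is, by 5 Å.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
model = [(x + 3.0, y + 4.0, z) for x, y, z in reference]

value = rmsd(reference, model)
print(value)
```

In practice the model is usually superimposed on the reference first (for example with a least-squares rotation fit), so a rigid shift like the one in this toy case would be removed before the deviation is measured.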


Contrapuntal gene risk

Science

Genomic Analysis

Identifying functional genetic variation in humans requires sifting through hundreds of thousands of individual variants and linking them to the trait of interest. We often do not know whether a gene is functional in a tissue or specific cell. Machine-learning models have become valuable for such endeavors. Somepalli et al. developed a model they call FUGUE, which they used to map the tissue-specific expression of human disease-associated genes and their protein context and interactions. Interestingly, FUGUE revealed that tissue-relevant genes cluster on the genome within topologically associated domains. The authors supply prioritized gene lists for 30 human tissues for genes associated with heart disease, Alzheimer's disease, cancer, and development.

PLoS Comput. Biol. 17, e1009194 (2021).


Accurate prediction of protein structures and interactions using a three-track neural network

Science

In 1972, Anfinsen won a Nobel prize for demonstrating a connection between a protein's amino acid sequence and its three-dimensional structure. Since 1994, scientists have competed in the biennial Critical Assessment of Structure Prediction (CASP) protein-folding challenge. Deep learning methods took center stage at CASP14, with DeepMind's AlphaFold2 achieving remarkable accuracy. Baek et al. explored network architectures based on the DeepMind framework. They used a three-track network to process sequence, distance, and coordinate information simultaneously and achieved accuracies approaching those of DeepMind. The method, RoseTTAFold, can solve challenging x-ray crystallography and cryo–electron microscopy modeling problems and generate accurate models of protein-protein complexes.

Science, abj8754, this issue p. 871

DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo–electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.

DOI: 10.1126/science.abj8754


Evolving threat

Science

New variants have changed the face of the pandemic. What will the virus do next?

Figure credits: (graphic) N. Desai/Science; (data) Nextstrain; GISAID

Edward Holmes does not like making predictions, but last year he hazarded a few. Again and again, people had asked Holmes, an expert on viral evolution at the University of Sydney, how he expected SARS-CoV-2 to change. In May 2020, 5 months into the pandemic, he started to include a slide with his best guesses in his talks. The virus would probably evolve to avoid at least some human immunity, he suggested. But it would likely make people less sick over time, he said, and there would be little change in its infectivity. In short, it sounded like evolution would not play a major role in the pandemic's near future.

“A year on I've been proven pretty much wrong on all of it,” Holmes says.

Well, not all: SARS-CoV-2 did evolve to better avoid human antibodies. But it has also become a bit more virulent and a lot more infectious, causing more people to fall ill. That has had an enormous influence on the course of the pandemic.

The Delta strain circulating now (one of four “variants of concern” identified by the World Health Organization, along with four “variants of interest”) is so radically different from the virus that appeared in Wuhan, China, in late 2019 that many countries have been forced to change their pandemic planning. Governments are scrambling to accelerate vaccination programs while prolonging or even reintroducing mask wearing and other public health measures. As to the goal of reaching herd immunity (vaccinating so many people that the virus simply has nowhere to go), “With the emergence of Delta, I realized that it's just impossible to reach that,” says Müge Çevik, an infectious disease specialist at the University of St. Andrews.

Yet the most tumultuous period in SARS-CoV-2's evolution may still be ahead of us, says Aris Katzourakis, an evolutionary biologist at the University of Oxford.
There's now enough immunity in the human population to ratchet up an evolutionary competition, pressuring the virus to adapt further. At the same time, much of the world is still overwhelmed with infections, giving the virus plenty of chances to replicate and throw up new mutations. Predicting where those worrisome factors will lead is just as tricky as it was a year and a half ago, however. “We're much better at explaining the past than predicting the future,” says Andrew Read, an evolutionary biologist at Pennsylvania State University, University Park. Evolution, after all, is driven by random mutations, which are impossible to predict. “It's very, very tricky to know what's possible, until it happens,” Read says. “It's not physics. It doesn't happen on a billiard table.” Still, experience with other viruses gives evolutionary biologists some clues about where SARS-CoV-2 may be headed. The courses of past outbreaks show the coronavirus could well become even more infectious than Delta is now, Read says: “I think there's every expectation that this virus will continue to adapt to humans and will get better and better at us.” Far from making people less sick, it could also evolve to become even deadlier, as some previous viruses including the 1918 flu have. And although COVID-19 vaccines have held up well so far, history shows the virus could evolve further to elude their protective effect—although a recent study in another coronavirus suggests that could take many years, which would leave more time to adapt vaccines to the changing threat. Holmes himself uploaded one of the first SARS-CoV-2 genomes to the internet on 10 January 2020. Since then, more than 2 million genomes have been sequenced and published, painting an exquisitely detailed picture of a changing virus. “I don't think we've ever seen that level of precision in watching an evolutionary process,” Holmes says. Making sense of the endless stream of mutations is complicated. 
Each is just a tiny tweak in the instructions for how to make proteins. Which mutations end up spreading depends on how the viruses carrying those tweaked proteins fare in the real world. The vast majority of mutations give the virus no advantage at all, and identifying the ones that do is difficult. There are obvious candidates, such as mutations that change the part of the spike protein—which sits on the surface of the virus—that binds to human cells. But changes elsewhere in the genome may be just as crucial—yet are harder to interpret. Some genes' functions aren't even clear, let alone what a change in their sequence could mean. The impact of any one change on the virus' fitness also depends on other changes it has already accumulated. That means scientists need real-world data to see which variants appear to be taking off. Only then can they investigate, in cell cultures and animal experiments, what might explain that viral success. The most eye-popping change in SARS-CoV-2 so far has been its improved ability to spread between humans. At some point early in the pandemic, SARS-CoV-2 acquired a mutation called D614G that made it a bit more infectious. That version spread around the world; almost all current viruses are descended from it. Then in late 2020, scientists identified a new variant, now called Alpha, in patients in Kent, U.K., that was about 50% more transmissible. Delta, first seen in India and now conquering the world, is another 40% to 60% more transmissible than Alpha. Read says the pattern is no surprise. “The only way you could not get infectiousness rising would be if the virus popped into humans as perfect at infecting humans as it could be, and the chance of that happening is incredibly small,” he says. But Holmes was startled. “This virus has gone up three notches in effectively a year and that, I think, was the biggest surprise to me,” Holmes says. 
“I didn't quite appreciate how much further the virus could get.” Bette Korber at Los Alamos National Laboratory and her colleagues first suggested that D614G, the early mutation, was taking over because it made the virus better at spreading. She says skepticism about the virus' ability to evolve was common in the early days of the pandemic, with some researchers saying D614G's apparent advantage might be sheer luck. “There was extraordinary resistance in the scientific community to the idea this virus could evolve as the pandemic grew in seriousness in spring of 2020,” Korber says.
![Figure][1] CREDITS: (GRAPHIC) N. DESAI/SCIENCE; (DATA) NEXTSTRAIN; GISAID
Researchers had never watched a completely novel virus spread so widely and evolve in humans, after all. “We're used to dealing with pathogens that have been in humanity for centuries, and their evolutionary course is set in the context of having been a human pathogen for many, many years,” says Jeremy Farrar, head of the Wellcome Trust. Katzourakis agrees. “This may have affected our priors and conditioned many to think in a particular way,” he says. Another, more practical problem is that real-world advantages for the virus don't always show up in cell culture or animal models. “There is no way anyone would have noticed anything special about Alpha from laboratory data alone,” says Christian Drosten, a virologist at the Charité University Hospital in Berlin. He and others are still figuring out what, at the molecular level, gives Alpha and Delta an edge. Alpha seems to bind more strongly to the human ACE2 receptor, the virus' target on the cell surface, partly because of a mutation in the spike protein called N501Y. It may also be better at countering interferons, molecules that are part of the body's viral immune defenses. Together those changes may lower the amount of virus needed to infect someone: the infectious dose.
In Delta, one of the most important changes may be near the furin cleavage site on spike, where a human enzyme cuts the protein, a key step enabling the virus to invade human cells. A mutation called P681R in that region makes cleavage more efficient, which may allow the virus to enter more cells faster and lead to greater numbers of virus particles in an infected person. In July, Chinese researchers posted a preprint showing Delta could lead to virus levels in patient samples 1000 times higher than for previous variants. Evidence is accumulating that infected people not only spread the virus more efficiently, but also faster, allowing the variant to spread even more rapidly. The new variants of SARS-CoV-2 may also cause more severe disease. For example, a study in Scotland found that an infection with Delta was about twice as likely to lead to hospital admission as an infection with Alpha. It wouldn't be the first time a newly emerging disease quickly became more serious. The 1918–19 influenza pandemic also appears to have caused more serious illness as time went on, says Lone Simonsen, an epidemiologist at Roskilde University who studies past pandemics. “Our data from Denmark suggests it was six times deadlier in the second wave.” A popular notion holds that viruses tend to evolve over time to become less dangerous, allowing the host to live longer and spread the virus more widely. But that idea is too simplistic, Holmes says. “The evolution of virulence has proven to be quicksand for evolutionary biologists,” he says. “It's not a simple thing.” Two of the best studied examples of viral evolution are myxoma virus and rabbit hemorrhagic disease virus, which were released in Australia in 1950 and 1996, respectively, to decimate populations of European rabbits that were destroying croplands and wreaking ecological havoc.
Myxoma virus initially killed more than 99% of infected rabbits, but then less pathogenic strains evolved, likely because the virus was killing many animals before they had a chance to pass it on. (Rabbits also evolved to be less susceptible.) Rabbit hemorrhagic disease virus, by contrast, got more deadly over time, probably because the virus is spread by blow flies feeding on rabbit carcasses, and quicker death accelerated its spread. Other factors loosen the constraints on deadliness. For example, a virus variant that can outgrow other variants within a host can end up dominating even if it makes the host sicker and reduces the likelihood of transmission. And an assumption about human respiratory diseases may not always hold: that a milder virus—one that doesn't make you crawl into bed, say—might allow an infected person to spread the virus further. In SARS-CoV-2, most transmission happens early on, when the virus is replicating in the upper airways, whereas serious disease, if it develops, comes later, when the virus infects the lower airways. As a result, a variant that makes the host sicker might spread just as fast as before. From the start of the pandemic, researchers have worried about a third type of viral change, perhaps the most unsettling of all: that SARS-CoV-2 might evolve to evade immunity triggered by natural infections or vaccines. Already, several variants have emerged sporting changes in the surface of the spike protein that make it less easily recognized by antibodies. But although news of these variants has caused widespread fear, their impact has so far been limited. Derek Smith, an evolutionary biologist at the University of Cambridge, has worked for decades on visualizing immune evasion in the influenza virus in so-called antigenic maps. The farther apart two variants are on Smith's maps, the less well antibodies against one virus protect against the other. 
In a recently published preprint, Smith's group, together with David Montefiori's group at Duke University, has applied the approach to mapping the most important variants of SARS-CoV-2 (see graphic, below). The new maps place the Alpha variant very close to the original Wuhan virus, which means antibodies against one still neutralize the other. The Delta variant, however, has drifted farther away, even though it doesn't completely evade immunity. “It's not an immune escape in the way people think of an escape in slightly cartoonish terms,” Katzourakis says. But Delta is slightly more likely to infect fully vaccinated people than previous variants. “It shows the possible beginning of a trajectory and that's what worries me,” Katzourakis says.
![Figure][1] CREDITS: (GRAPHIC) N. DESAI/SCIENCE; (DATA) DEREK SMITH/UNIVERSITY OF CAMBRIDGE; DAVID MONTEFIORI/DUKE UNIVERSITY
Other variants have evolved more antigenic distance from the original virus than Delta. Beta, which first appeared in South Africa, has traveled the farthest on the map, although natural or vaccine-induced immunity still largely protects against it. And Beta's attempts to get away may come at a price, as Delta has outstripped it worldwide. “It's probably the case that when a virus changes to escape immunity, it loses other aspects of its fitness,” Smith says. The map shows that for now, the virus is not moving in any particular direction. If the original Wuhan virus is like a town on Smith's map, the virus has been taking local trains to explore the surrounding area, but it has not traveled to the next city, not yet. Although it's impossible to predict exactly how infectiousness, virulence, and immune evasion will develop in the coming months, some of the factors that will influence the virus' trajectory are clear. One is the immunity that is now rapidly building in the human population.
On one hand, immunity reduces the likelihood of people getting infected, and may hamper viral replication even when they are. “That means there will be fewer mutations emerging if we vaccinate more people,” Çevik says. On the other hand, any immune escape variant now has a huge advantage over other variants. In fact, the world is probably at a tipping point, Holmes says: With more than 2 billion people having received at least one vaccine dose and hundreds of millions more having recovered from COVID-19, variants that evade immunity may now have a bigger leg up than those that are more infectious. Something similar appears to have happened when a new H1N1 influenza strain emerged in 2009 and caused a pandemic, says Katia Kölle, an evolutionary biologist at Emory University. A 2015 paper found that changes in the virus in the first 2 years appeared to make the virus more adept at human-to-human transmission, whereas changes after 2011 were mostly to avoid human immunity. It may already be getting harder for SARS-CoV-2 to make big gains in infectiousness. “There are some fundamental limits to exactly how good a virus can get at transmitting and at some point SARS-CoV-2 will hit that plateau,” says Jesse Bloom, an evolutionary biologist at the Fred Hutchinson Cancer Research Center. “I think it's very hard to say if this is already where we are, or is it still going to happen.” Evolutionary virologist Kristian Andersen of Scripps Research guesses the virus still has space to evolve greater transmissibility. “The known limit in the viral universe is measles, which is about three times more transmissible than what we have now with Delta,” he says.
![Figure][1] CREDITS: (GRAPHIC) N. DESAI/SCIENCE; (DATA) E. WALL ET AL., THE LANCET, 397:10292, 2331 (2021)
The limits of immune escape are equally uncertain. Smith's antigenic maps show the space the virus has explored so far. But can it go much farther?
If the variants on the map are like towns, then where are the country's natural boundaries—where does the ocean start? A crucial clue will be where the next few variants appear on the map, Smith says. Beta evolved in one direction away from the original virus and Delta in another. “It's too soon to say this now, but we might be heading for a world where there are two serotypes of this virus that would also both have to be considered in any vaccines,” Drosten says. Immune escape is so worrying because it could force humanity to update its vaccines continually, as happens for flu. Yet the vaccines against many other diseases—measles, polio, and yellow fever, for example—have remained effective for decades without updates, even in the rare cases where immune-evading variants appeared. “There was big alarm around 2000 that maybe we'd need to replace the hepatitis B vaccines,” because an escape variant had popped up, Read says. But the variant has not spread around the world: It is able to infect close contacts of an infected person, but then peters out. The virus apparently faces a trade-off between transmissibility and immune escape. Such trade-offs likely exist for SARS-CoV-2 as well. Some clues about SARS-CoV-2's future path may come from coronaviruses with a much longer history in humans: those that cause common colds. Some are known to reinfect people, but until recently it was unclear whether that's because immunity in recovered people wanes, or because the virus changes its surface to evade immunity. In a study published in April in PLOS Pathogens , Bloom and other researchers compared the ability of human sera taken at different times in the past decades to block virus isolated at the same time or later. They showed that the samples could neutralize strains of a coronavirus named 229E isolated around the same time, but weren't always effective against virus from 10 years or more later. 
The virus had evidently evolved to evade human immunity, but it had taken 10 years or more. “Immune escape conjures this catastrophic failure of immunity when it is really immune erosion,” Bloom says. “Right now it seems like SARS-CoV-2, at least in terms of antibody escape, is actually behaving a lot like coronavirus 229E.” Others are probing SARS-CoV-2 itself. In a preprint published this month, researchers tinkered with the virus to learn how much it has to change to evade the antibodies generated in vaccine recipients and recovered patients. They found that it took 20 changes to the spike protein to escape current antibody responses almost completely. That means the bar for complete escape is high, says one of the authors, virologist Paul Bieniasz of Rockefeller University. “But it's very difficult to look into a crystal ball and say whether that is going to be easy for the virus to acquire or not,” he says. “It seems plausible that true immune escape is hard,” concludes William Hanage of the Harvard T.H. Chan School of Public Health. “However, the counterargument is that natural selection is a hell of a problem solver and the virus is only beginning to experience real pressure to evade immunity.” And the virus has tricks up its sleeve. Coronaviruses are good at recombining, for instance, which could allow new variants to emerge suddenly by combining the genomes—and the properties—of two different variants. In pigs, recombination of a coronavirus named porcine epidemic diarrhea virus with attenuated vaccine strains of another coronavirus has led to more virulent variants of PEDV. “Given the biology of these viruses, recombination may well factor into the continuing evolution of SARS-CoV-2,” Korber says. Given all that uncertainty, it's worrisome that humanity hasn't done a great job of limiting the spread of SARS-CoV-2, says Eugene Koonin, a researcher at the U.S. National Center for Biotechnology Information. 
Some dangerous variants may only be possible if the virus hits on a very rare, winning combination of mutations, he says. It might have to replicate an astronomical number of times to get there. “But with all these millions of infected people, it may very well find that combination.” Indeed, Katzourakis adds, the past 20 months are a warning to never underestimate viral evolution. “Many still see Alpha and Delta as being as bad as things are ever going to get,” he says. “It would be wise to consider them as steps on a possible trajectory that may challenge our public health response further.” [1]: pending:yes


Malaria infection and severe disease risks in Africa

Science

Understanding how changes in community parasite prevalence alter the rate and age distribution of severe malaria is essential for optimizing control efforts. Paton et al. assessed the incidence of pediatric severe malaria admissions from 13 hospitals in East Africa from 2006 to 2020 (see the Perspective by Taylor and Slutsker). Each 25% increase in community parasite prevalence shifted hospital admissions toward younger children. Low rates of lifetime infections appeared to confer some immunity to severe malaria in very young children. Children under the age of 5 years thus need to remain a focus of disease prevention for malaria control. Science , abj0089, this issue p. [926][1]; see also abk3443, p. [855][2] The relationship between community prevalence of Plasmodium falciparum and the burden of severe, life-threatening disease remains poorly defined. To examine the three most common severe malaria phenotypes from catchment populations across East Africa, we assembled a dataset of 6506 hospital admissions for malaria in children aged 3 months to 9 years from 2006 to 2020. Admissions were paired with data from community parasite infection surveys. A Bayesian procedure was used to calibrate uncertainties in exposure (parasite prevalence) and outcomes (severe malaria phenotypes). Each 25% increase in prevalence conferred a doubling of severe malaria admission rates. Severe malaria remains a burden predominantly among young children (3 to 59 months) across a wide range of community prevalence typical of East Africa. This study offers a quantitative framework for linking malaria parasite prevalence and severe disease outcomes in children. [1]: /lookup/doi/10.1126/science.abj0089 [2]: /lookup/doi/10.1126/science.abk3443
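The reported doubling can be turned into a small worked calculation. The sketch below is only an illustration under stated assumptions: it reads "each 25% increase" as 25 percentage points of community prevalence and extends the doubling log-linearly to other changes, neither of which the abstract spells out. The function name `admission_rate_multiplier` is hypothetical.

```python
def admission_rate_multiplier(prevalence_change_points, doubling_step=25.0):
    """Multiplicative change in the severe malaria admission rate.

    Assumption (not stated in the abstract): the doubling per 25%
    increase in prevalence is read as 25 percentage points and is
    extended log-linearly, so the multiplier is 2 ** (change / 25).
    """
    return 2.0 ** (prevalence_change_points / doubling_step)


# Illustrative changes in community parasite prevalence (percentage points).
for change in (0, 12.5, 25, 50):
    multiplier = admission_rate_multiplier(change)
    print(f"{change:>5} points -> admission rate x{multiplier:.2f}")
```

Under these assumptions, two successive 25-point increases compound to a fourfold rate, which is the sense in which the relationship is multiplicative and not additive.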


Exploring the path of the variable resistance

Science

In handling computer hardware, the last thing anyone would like to do is expose electronic components to electrostatic discharges. Nevertheless, this is exactly an approach that researchers are taking toward faster and more energy-efficient computing. Inspired by the functions of neurons and synapses in the brain, resistive switching devices or “memristors” are being explored as building blocks for neuromorphic circuitry. In such devices, the resistance properties are durably altered by applying voltage pulses. On page 907 of this issue, del Valle et al. ([ 1 ][1]) have imaged the early stages of electric field–induced electronic breakdown and formation of a conducting filament in vanadium oxide. By doing this in a space- and time-resolved manner, the authors provide useful insight into the characteristic length and time scales involved.
![Figure][2] Moving toward neural networks. GRAPHIC: KELLIE HOLOSKI/SCIENCE
Computing systems are commonly based on the Von Neumann architecture, in which the memory is physically separated from the logic circuitry. Data are continuously shuttled between these units. This process is time consuming and presents an important cause of energy dissipation. Both aspects become very noticeable in data-intensive applications, like training deep neural networks. Neural networks are composed of layers of neuron-like devices connected through synapses. The latter comprise weight factors that are adjusted in the training process. In conventional complementary metal-oxide semiconductor (CMOS)–based technology, the weights need to be fetched, adjusted, and put back into the memory in every learning step. In an alternative and ultimately more efficient approach, the weights are embodied in the hardware itself, and training implies an alteration of the physical properties of the synapse, similar to what happens in the brain.
In a fully electronic implementation, this requires the ability to controllably adjust the electrical resistance of a material. This is achieved using the electric field–driven motion of defect states, such as oxygen vacancies and impurity atoms ([ 2 ][3]), which are resistive switching concepts used also in binary resistive random access memory (ReRAM). Alternatives involve thermally induced alterations of the crystallinity of the material ([ 3 ][4]) and organic memristors ([ 4 ][5]). A complication in many techniques is that they involve atomic displacements and reconfigurations, which can lead to a spread in device properties and fatigue. This problem is circumvented by exploiting tunable electronic and/or magnetic ordering phenomena. The Mott insulator VO2 is an attractive example, exhibiting a hysteretic resistive transition just above room temperature ([ 5 ][6]). Applying electric field pulses to the material in the high-resistive state creates a metallic filament with a conductance that depends on the pulse intensity and duration. Notably, the resistance can be programmed over several orders of magnitude. By studying thin film microdevices with various vanadium oxide stoichiometries, del Valle et al. found that the transition starts with resistance fluctuations and nucleation of the conducting filament in hotspots on a hundreds-of-nanoseconds time scale (see the figure). In an avalanche-like process, the filament subsequently grows, as a result of Joule heating, over a time scale of microseconds. The authors investigated the growth dynamics and the final width of the conducting filament, which depends on both the characteristics of the voltage pulse and the resistivities of the material in the insulating and conducting states. Inhomogeneities play an important role in triggering the transition and in the filament formation by focusing the current. 
These findings can help to optimize the switching processes—e.g., by deliberately incorporating nanoscopic elements that act as optimized hotspots. The storing of synaptic weights in the neural network hardware is an example of the upcoming in-memory computing paradigm, which aims to circumvent the Von Neumann bottleneck. The practical implementation of this is typically in the form of cross-bar arrays ([ 6 ][7]), with the current lines acting as the pre- and postsynaptic connections to the neurons. The variable conductance properties of the barrier materials encode for the synaptic weight. Using this setup, Ohm's law and Kirchhoff's circuit law are used for matrix-vector multiplications, which are a key processing step in neural network operation. Also, other data-intensive applications can benefit from outsourcing data processing from the logic units to the memory—large-scale database queries being one example ([ 7 ][8]). In addition to storing information, the switching of VO2 when exceeding a certain threshold voltage can also be used for the realization of the artificial neurons. Using a negative differential resistance that can be invoked in the resistive transition, Yi et al. have even demonstrated 23 different neuronal functionalities with VO2-based memristors ([ 8 ][9]). Spiking modes of neural network operation are facilitated by this, with further expected enhancements in energy efficiency. The optical reflectivity modulation, as studied by del Valle et al. , presents a coupling between the electronic and photonic domains. This allows, for example, for the storing of synaptic weights in a photonic processor—a principle recently used in a photonic tensor core accelerator using phase change materials ([ 9 ][10]). Future computer systems will likely comprise a heterogeneous mix of electronic, optical, and spintronic components, and efficient coupling between these domains will then be indispensable. 
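The cross-bar matrix-vector multiplication mentioned above can be sketched in a few lines. This is an idealized illustration and not the implementation of any cited device: it assumes perfect wires, no sneak currents, and linear conductances, and the function name `crossbar_mvm` and the example values are hypothetical.

```python
def crossbar_mvm(conductances, voltages):
    """Column currents of an idealized resistive cross-bar array.

    conductances: list of rows, conductances[i][j] in siemens for the
                  cell joining input row i to output column j.
    voltages:     list of input voltages in volts, one per row.

    Ohm's law gives the current through each cell (G_ij * V_i);
    Kirchhoff's current law sums the cell currents on each column.
    """
    if len(conductances) != len(voltages):
        raise ValueError("need one input voltage per row")
    n_cols = len(conductances[0])
    column_currents = [0.0] * n_cols
    for row, v in zip(conductances, voltages):
        if len(row) != n_cols:
            raise ValueError("all rows must have the same number of columns")
        for j, g in enumerate(row):
            column_currents[j] += g * v  # Ohm's law, accumulated per column
    return column_currents


# Hypothetical 2x2 array: stored weights as conductances, inputs as voltages.
weights = [[1e-3, 2e-3],
           [3e-3, 4e-3]]
inputs = [1.0, 0.5]
print(crossbar_mvm(weights, inputs))
```

The point of the design is that the summation happens in the analog domain, in the wires themselves, so the stored weights never have to be moved to a separate logic unit.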
The next stage in vanadium oxide memristor research will be to make the step from single resistive switching devices to functional network structures, like multilayer artificial neural networks, and to explore their operation. In this endeavor, other more exotic post–Von Neumann information processing concepts are also of interest ([ 10 ][11], [ 11 ][12]). The space- and time-resolved optical reflectometry technique as demonstrated by del Valle et al. will enable current pulses and associated resistance modulations passing through such networks to be monitored without interference, tracing, so to say, the path of the variable resistance.
1. J. del Valle et al., Science 373, 907 (2021).
2. R. Waser, R. Dittmann, G. Staikov, K. Szot, Adv. Mater. 21, 2632 (2009).
3. I. Boybat et al., Nat. Commun. 9, 2514 (2018).
4. S. Goswami, S. Goswami, T. Venkatesan, Appl. Phys. Rev. 7, 021303 (2020).
5. T. Driscoll, H.-T. Kim, B.-G. Chae, M. Di Ventra, D. N. Basov, Appl. Phys. Lett. 95, 043503 (2009).
6. Q. Xia, J. J. Yang, Nat. Mater. 18, 309 (2019).
7. I. Giannopoulos et al., Adv. Intell. Syst. 2, 2000141 (2020).
8. W. Yi et al., Nat. Commun. 9, 4661 (2018).
9. J. Feldmann et al., Nature 589, 52 (2021).
10. M. Di Ventra, F. L. Traversa, J. Appl. Phys. 123, 180901 (2018).
11. M. A. Nugent, T. W. Molter, PLOS ONE 9, e85175 (2014).
[1]: #ref-1
[2]: pending:yes
[3]: #ref-2
[4]: #ref-3
[5]: #ref-4
[6]: #ref-5
[7]: #ref-6
[8]: #ref-7
[9]: #ref-8
[10]: #ref-9
[11]: #ref-10
[12]: #ref-11


Ecology in the age of automation

Science

The accelerating pace of global change is driving a biodiversity extinction crisis ([ 1 ][1]) and is outstripping our ability to track, monitor, and understand ecosystems, which is traditionally the job of ecologists. Ecological research is an intensive, field-based enterprise that relies on the skills of trained observers. This process is both time-consuming and expensive, thus limiting the resolution and extent of our knowledge of the natural world. Although technology will never replace the intuition and breadth of skills of the experienced naturalist ([ 2 ][2]), ecologists cannot ignore the potential to greatly expand the scale of our studies through automation. The capacity to automate biodiversity sampling is being driven by three ongoing technological developments: the commoditization of small, low-power computing devices; advances in wireless communications; and an explosion in automated data-recognition algorithms in the field of machine learning. Automated data collection and machine learning are set to revolutionize in situ studies of natural systems. Automation has swept across all human endeavors over recent decades, and science is no exception. The extent of ecological observation has traditionally been limited by the costs of manual data collection. We envision a future in which data from field studies are augmented with continuous, fine-scale, remotely sensed data recording the presence, behavior, and other properties of individual organisms. As automation drives down costs of these networks, there will not be a simple expansion of the quantity of data. Rather, the potential high resolution and broad extent of these data will lead to qualitatively new findings and will result in new discoveries about the natural world that will enable ecologists to better predict and manage changing ecosystems ([ 3 ][3]). 
This will be especially true as different types of sensing networks, including mobile elements such as drones, are connected together to provide a rich, multidimensional view of nature. Given the role that biodiversity plays in lending resilience to the ecosystems on which humans depend ([ 4 ][4]), monitoring the distribution and abundance of species along with climate and other variables is a critical need in developing ecological hypotheses and for adapting to emerging global challenges. Ecosystems are alive with sound and motion that can be captured with audio and video sensors. Rapid advances in audio and video classification algorithms can allow the recognition of species and labeling of complex traits and behaviors, which were traditionally the domain of manual species identification by experts. The major advance has been the discovery of deep convolutional neural networks ([ 5 ][5]). These algorithms extract fundamental aspects of contrast and shape in a manner analogous to how we and other animals recognize objects in our visual field. Applied to audio signals, these neural networks are highly effective at classifying natural and anthropogenic sounds ([ 6 ][6]). A canonical example is the classification of bird songs. Other acoustic examples include insects, amphibians, and disturbance indicators such as chainsaws. Naturally, these algorithms also lend themselves to species identification from images and videos. In cases of animals displaying complex color patterns, individuals may be distinguished, allowing minimally invasive mark recapture, an important tool in population studies and conservation ([ 7 ][7]). Beyond sight and sound, sensors can target a wide range of physical, chemical, and biological phenomena. Particularly intriguing is the possibility for widespread environmental sensing of biomolecular compounds that could, for example, allow quantification of “DNA-scapes” by means of laboratory-on-a-chip–type sensors ([ 8 ][8]). 
Several technological trends are shaping the emergence of large-scale sensor networks. One is the ongoing miniaturization of technology, allowing deployment of extended arrays of low-power sensor devices across landscapes [for example, ([ 9 ][9])]. In many cases, these can be solar-powered in remote locations. The widespread availability of computer-on-a-chip devices along with various attached sensors is enabling the construction of large distributed sensing networks at price points that were formerly unattainable. Similarly, the ubiquitous availability of cloud-based computing and storage for back-end processing is facilitating large-scale deployments.

Another trend is the advancement of wireless communications. For example, the emerging internet of things ([ 10 ][10]) enables low-power devices to establish ad hoc mesh networks that can pass information from node to node, eventually reaching points of aggregation and analysis. The same technology used to connect smart doorbells and lightbulbs can be leveraged to move data across sensor networks distributed across a landscape. These protocols are designed for low power consumption but may not have sufficient bandwidth for all applications. An alternative, although more power hungry, is cellular technology, which has increasing coverage globally. In remote locations, where commercial cellular data services may not be available, researchers can consider a private cellular network for on-site telemetry and satellite uplinks for internet streaming. In the near term, however, telecommunications costs and per-device power requirements may prove prohibitive in certain high-bandwidth applications, such as video and audio streaming. An alternative for sites where communications bandwidth is limited by cost, isolation, or power constraints is edge computing ([ 11 ][11]).
In this design, computation is moved to the sensing devices themselves, which then transmit filtered or classified results for analysis, greatly reducing transmission requirements.

One more trend is the advancement of machine-learning methods ([ 12 ][12]) that can classify and extract patterns from data streams. Much of this technology has been commoditized through intensive development efforts in the technology sector that have resulted in widely available software libraries usable by nonexperts. The aforementioned convolutional neural networks can be coded both to segment data into units and to label these units with appropriate classes. The major bottleneck is in training classifiers because initial training inputs must be labeled manually by experts. Although labeled training sets exist in some domains (most notably, image recognition), future analysts may be able to skip much of the training step as large collections of pretrained networks become available. These pretrained networks can be combined and modified for specific tasks without the requirement of comprehensive training sets. Of particular interest from the standpoint of automation are new developments in continual learning ([ 13 ][13]), in which networks adjust in response to changing inputs. This holds the promise of automating model adaptation for detecting emerging phenomena, such as species shifting their ranges in response to climate change or other shifts in ecosystem properties.

Ecologists could leverage these developments to create automated sensing networks at scales previously unimaginable. As an example, consider the North American Breeding Bird Survey, a highly successful citizen-science initiative running since the late 1960s with continental-scale coverage. Expert observers conduct point counts of birds along routes, generating data that have proved invaluable in tracking trends in songbird populations ([ 14 ][14]).
Although we hope to see such efforts continue, imagine what could be learned if, instead of sampling these communities once per year, a long-term, continental-scale songbird observatory could be constructed to record and classify bird vocalizations in near–real time along with environmental covariates. Similar networks could use camera traps or video streams to reveal details of diurnal and seasonal variation across diverse floras and faunas. As with all sampling methods, sensing networks will not be without biases in sensitivity and discrimination, yet they hold the extraordinary promise of regional sampling of biodiversity at the organismal scale, something that has proven difficult, for example, by using traditional satellite-based remote sensing. These efforts would complement ongoing development of continental-scale observatories in ecology [for example, ([ 15 ][15])] by increasing the spatial and temporal resolution of sampling.

1. S. Díaz et al., Science 366, eaax3100 (2019).
2. J. Travis, Am. Nat. 196, 1 (2020).
3. M. C. Dietze et al., Proc. Natl. Acad. Sci. U.S.A. 115, 1424 (2018).
4. B. J. Cardinale et al., Nature 486, 59 (2012).
5. Y. LeCun, Y. Bengio, G. Hinton, Nature 521, 436 (2015).
6. S. S. Sethi et al., Proc. Natl. Acad. Sci. U.S.A. 117, 17049 (2020).
7. R. C. Whytock et al., Methods Ecol. Evol. 12, 1080 (2021).
8. B. C. Dhar, N. Y. Lee, Biochip J. 12, 173 (2018).
9. A. P. Hill et al., Methods Ecol. Evol. 9, 1199 (2018).
10. L. Atzori, A. Iera, G. Morabito, Comput. Netw. 54, 2787 (2010).
11. W. Shi, J. Cao, Q. Zhang, Y. Li, L. Xu, IEEE Internet Things J. 3, 637 (2016).
12. M. I. Jordan, T. M. Mitchell, Science 349, 255 (2015).
13. R. Aljundi, K. Kelchtermans, T. Tuytelaars, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11254–11263.
14. J. R. Sauer, W. A. Link, J. E. Fallon, K. L. Pardieck, D. J. Ziolkowski Jr., N. Am. Fauna 79, 1 (2013).
15. M. Keller, D. S. Schimel, W. W. Hargrove, F. M. Hoffman, Front. Ecol. Environ. 6, 282 (2008).

Acknowledgments: Our perspective on autonomous sensing was developed with the support of the Stengl-Wyer Endowment and the Office of the Vice President for Research Bridging Barriers programs at the University of Texas at Austin, and the National Science Foundation (BCS-2009669). Comments from members of the Keitt laboratory, Planet Texas 2050, A. Wolf, and M. Abelson were invaluable in refining our ideas.


Banking on protein structural data

Science

In 1953, the proposed structure of DNA magnificently linked biological function and structure. By contrast, 4 years later, the first elucidation of the structure of a protein (myoglobin, by Kendrew and colleagues) revealed an inelegant shape, described disdainfully as a “visceral knot.” Additional complexity, as well as some general principles, was revealed as more protein structures were solved over the next decade. In 1971, scientists at Brookhaven National Laboratory launched the Protein Data Bank (PDB) as a repository to collect and make available the atomic coordinates of structures (seven at the time) to interested parties. The PDB now includes more than 180,000 structures, and this resource has fueled an incalculable number of advances, including the recent development of powerful structure prediction tools.

Biology takes place in three dimensions, yet most biological information is stored in one-dimensional sequences of DNA that encode the amino acid sequences of proteins. The transition from one to three dimensions is accomplished through the spontaneous folding of a sequence of amino acids into a folded protein structure. Comparing elucidated structures revealed that proteins that are at least 30% identical in amino acid sequence almost always have the same folded structure; evolutionarily, structure is much more conserved than sequence. Conversely, some short stretches of five or more amino acids can adopt completely different structures; structure is context dependent. Thus, the relationship between sequence and structure is not a simple one.

Predicting protein structures from sequences has been a grand challenge for decades. By 1994, fueled by the explosion of sequences, biophysicist John Moult and colleagues organized the first Critical Assessment of Structure Prediction (CASP) meeting. CASP is based on blinded assessments, which are common in clinical trials.
Sequences of proteins whose structures had been determined but not publicly shared were made available to would-be predictors to develop and submit structural predictions for subsequent independent assessment. The first CASP meeting was somewhat depressing because the results revealed that predictors were doing substantially worse than they thought. CASP meetings have continued every 2 years and have driven the field forward through feedback and competition.

The most recent CASP meeting, in November 2020, was shaken by results from the company DeepMind. Its AlphaFold program performed substantially better than other programs had in the past, producing many results of similar quality to experimental structures. The RoseTTAFold program, developed by the laboratory of structural biologist David Baker, builds on this laboratory’s previous work, combined with insights from the DeepMind success (see page 871). The results of both programs are sufficiently good that many are claiming that these represent relatively general (but certainly not perfect, and incomplete) solutions to the structure prediction problem. Notably, both groups have provided the computer code for their methods for others to use, test, and enhance.

These programs are based on deep-learning artificial intelligence methods. Such approaches depend on the availability of many thousands of questions with known answers to train the neural networks at their core. Thus, without the sequences with known structures from structural biologists from around the world shared in the PDB, these approaches would not have been feasible. The teams that developed these powerful programs deserve great credit for their accomplishments, but these stand on a foundation of the results from billions of dollars of public investment in structural biology and the sustained support of the PDB from around the world (now overseen by the Worldwide PDB).
Policies from funders, publishers, and the scientific community have led to requirements that reported structures be promptly deposited in the PDB.

As someone who has interacted with the PDB as a consumer, a contributor, a policy-maker, and a funder, I have experienced the power and challenges of trying to optimize such a public resource. The cultural shifts, at the cutting (and often bleeding) edge of open science, were often controversial, but it is hard to argue that they have not both increased the impact of individual determined structures and accelerated scientific progress in many ways. The ever-growing PDB provides researchers with a universe of structures with which to compare their favorite structures. The new structure prediction tools expand this universe further and provide truly compelling evidence of the power of open science. Moreover, these tools bring truth to an old saying in structural biology circles, “The structure prediction problem has been solved; it’s hiding in the PDB.”
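
As an aside on the 30% sequence-identity rule of thumb mentioned earlier, percent identity is usually computed over a pairwise alignment. The Python sketch below assumes the two sequences are already aligned (equal length, with `-` marking gaps) and counts identical residues over the positions where neither sequence has a gap. Other denominators are also in use, so this is one convention among several. The function name `percent_identity` and the toy sequences are hypothetical.

```python
# Percent identity between two pre-aligned protein sequences.
# Assumptions: the alignment is given, gaps are written as '-', and identity
# is measured over columns where both sequences have a residue.

def percent_identity(aligned_a, aligned_b, gap="-"):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (made up for illustration, not real proteins).
seq_a = "MKT-AYIAKQR"
seq_b = "MKSLAYLA-QR"

identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identical")  # prints "77.8% identical"
```

Here 7 of the 9 gap-free columns match. A value like this sits well above the 30% level at which, according to the text, two proteins almost always share the same fold.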