Phylodynamics for cell biologists

Science

Advances in experimental approaches for single-cell analysis allow in situ sequencing, genomic barcoding, and mapping of cell lineages within tissues and organisms. Large amounts of data have thus accumulated and present an analytical challenge. Stadler et al. recognized the need for conceptual and computational approaches to fully exploit these technological advances for the understanding of normal and disease states. The authors review ideas taken from phylodynamics of infectious disease and show how similar tree-building techniques can be applied to monitoring changes in somatic cell lineages for applications ranging from development and differentiation to cancer biology. Science, this issue p. eaah6266

### BACKGROUND

The birth, death, and diversification of individuals are events that drive biological processes across all scales. This is true whether the individuals in question represent nucleic acids, cells, whole organisms, populations, or species. The ancestral relationships of individuals can be visualized as branching trees or phylogenies, which are long-established representations in the fields of evolution, ecology, and epidemiology. Molecular phylogenetics is the discipline concerned with the reconstruction of such trees from gene or genome sequence data. The shape and size of such phylogenies depend on the past birth and death processes that generated them, and in phylodynamics, mathematical models are used to infer and quantify the dynamical behavior of biological populations from ancestral relationships. New technological advances in genetics and cell biology have led to a growing body of data about the molecular state and ancestry of individual cells in multicellular organisms. Ideas from phylogenetics and phylodynamics are being applied to these data to investigate many questions in tissue formation and tumorigenesis.

### ADVANCES

Trees offer a valuable framework for tracing cell division and change through time, beginning with individual ancestral stem cells or fertilized eggs and resulting in complex tissues, tumors, or whole organisms (see the figure). They also provide the basis for computational and statistical methods with which to analyze data from cell biology. Our Review explains how “tree-thinking” and phylodynamics can be beneficial to the interpretation of empirical data pertaining to the individual cells of multicellular organisms. We summarize some recent research questions in developmental and cancer biology and briefly introduce the new technologies that allow us to observe the spatiotemporal histories of cell division and change. We provide an overview of the various and sometimes confusing ways in which graphical models, based on or represented by trees, have been applied in cell biology. To provide conceptual clarity, we outline four distinct graphical representations of the history of cell division and differentiation in multicellular organisms. We highlight that cells from an organism cannot always be treated as statistically independent observations but instead are often correlated because of phylogenetic history, and we explain how this can cause difficulties when attempting to infer dynamical behavior from experimental single-cell data. We introduce simple ecological null models for cell populations and illustrate some potential pitfalls in hypothesis testing and the need for quantitative phylodynamic models that explicitly incorporate the dependencies caused by shared ancestry.

### OUTLOOK

We expect the rapid growth in the number of cell-level phylogenies to continue, a trend enhanced by ongoing technological advances in cell lineage tracing, genomic barcoding, and in situ sequencing. In particular, we anticipate the generation of exciting datasets that combine phenotypic measurements for individual cells (such as through transcriptome sequencing) with high-resolution reconstructions of the ancestry of the sampled cells. These developments will offer new ways to study developmental, oncogenic, and immunological processes but will require new and appropriate conceptual and computational tools. We discuss how models from phylogenetics and phylodynamics will benefit the interpretation of the datasets generated in the foreseeable future and will aid the development of statistical tests that exploit, and are robust to, the shared ancestry of cells. We hope that our discussion will initiate the integration of cell-level phylodynamic approaches into experimental and theoretical studies of development, cancer, and immunology. We sketch out some of the theoretical advances that will be required to analyze complex spatiotemporal cell dynamics and encourage explorations of these new directions. Powerful new statistical and computational tools are essential if we are to exploit fully the wealth of new experimental data being generated in cell biology.

[Figure] Multicellular organisms develop from a single fertilized egg. The division, apoptosis, and differentiation of cells can be displayed in a development tree, with the fertilized egg being the root of the tree. The development of any particular tissue within an organism can be traced as a subtree of the full developmental tree. Subtrees that represent cancer tumors or B cell clones may exhibit rapid growth and genetic change. Here, we illustrate the developmental tree of a human and expand the subtree representing haematopoiesis (blood formation) in the bone marrow. Stem cells in the bone marrow differentiate, giving rise to the numerous blood cell types in humans. The structure of the tree that underlies haematopoiesis and the formation of all tissues is unclear. Phylogenetic and phylodynamic tools can help to describe and statistically explore questions about this cell differentiation process.

Multicellular organisms are composed of cells connected by ancestry and descent from progenitor cells. The dynamics of cell birth, death, and inheritance within an organism give rise to the fundamental processes of development, differentiation, and cancer. Technical advances in molecular biology now allow us to study cellular composition, ancestry, and evolution at the resolution of individual cells within an organism or tissue. Here, we take a phylogenetic and phylodynamic approach to single-cell biology. We explain how “tree thinking” is important to the interpretation of the growing body of cell-level data and how ecological null models can benefit statistical hypothesis testing. Experimental progress in cell biology should be accompanied by theoretical developments if we are to exploit fully the dynamical information in single-cell data.


Artificial Intelligence, Machine Learning to Play an Important Role in Fight Against COVID, Say Experts

#artificialintelligence

Artificial intelligence (AI) and machine learning are helping analyse enormous amounts of data around the human genome and drug molecules, and these new-age technologies can play an important role in the battle against COVID-19, industry experts said on Saturday. Speaking at KnowDis Machine Learning Day, Avantika Lal – Senior Scientist (Deep Learning and Genomics) at NVIDIA – stated that bigger data sets on genome sequences (DNA arrangement) are being obtained, and this data is being studied for multiple parameters. "As the cost of sequencing goes down, more and more people can get their genome sequence and in actuality, governments, research institutes and public health organisations around the world are attempting to sequence many thousands of people so as to develop an idea of the genomes of the inhabitants of the countries," she said. Lal added that enormous data sets are collected that are extremely complicated and contain many different related sorts of information. These data sets may also help researchers understand the mechanisms by which a specific disorder arises in people, or identify patients who may respond differently or be more sensitive to a particular kind of medication or treatment, she further said.


AI, machine learning to play key role in fight against Covid-19, say experts

#artificialintelligence

Artificial intelligence (AI) and machine learning are helping analyse massive amounts of data around the human genome and drug molecules, and these new-age technologies can play an important role in the fight against Covid-19, industry experts said on Saturday. Speaking at KnowDis Machine Learning Day, Avantika Lal - Senior Scientist (Deep Learning and Genomics) at NVIDIA - said larger data sets on genome sequences (DNA arrangement) are being acquired, and this data is being studied for multiple parameters. "As the cost of sequencing goes down, more and more people can get their genome sequence and in fact, governments, research institutes and public health organisations around the world are trying to sequence many thousands of people in order to build up an idea of the genomes of the populations of their countries," she said. Lal added that massive data sets are collected that are very complicated and contain many different related kinds of information. "...the size and richness of the data sets that we're now getting in this field makes it really essential to use machine learning and deep learning to analyze this data in order to answer complicated questions like, for example, how do we identify people who are more at risk of developing various diseases before they actually develop signs of those diseases," she said.


AI, machine learning to play key role in fight against COVID, say experts

#artificialintelligence

Speaking at KnowDis Machine Learning Day, Avantika Lal - Senior Scientist (Deep Learning and Genomics) at NVIDIA - said larger data sets on genome sequences (DNA arrangement) are being acquired, and this data is being studied for multiple parameters. "As the cost of sequencing goes down, more and more people can get their genome sequence and in fact, governments, research institutes and public health organisations around the world are trying to sequence many thousands of people in order to build up an idea of the genomes of the populations of their countries," she said. Lal added that massive data sets are collected that are very complicated and contain many different related kinds of information. "...the size and richness of the data sets that we're now getting in this field makes it really essential to use machine learning and deep learning to analyze this data in order to answer complicated questions like, for example, how do we identify people who are more at risk of developing various diseases before they actually develop signs of those diseases," she said. These data sets can also help researchers understand the mechanisms by which a certain disease arises in people, or identify patients who might respond differently or be more sensitive to a particular kind of drug or treatment, she further said.


Align-gram : Rethinking the Skip-gram Model for Protein Sequence Analysis

arXiv.org Artificial Intelligence

Background: The advent of next-generation sequencing technologies has exponentially increased the volume of biological sequence data. Protein sequences, often described as the `language of life', have been analyzed for a multitude of applications and inferences. Motivation: Owing to the rapid development of deep learning, in recent years there have been a number of breakthroughs in the domain of Natural Language Processing. Since these methods are capable of performing different tasks when trained with a sufficient amount of data, off-the-shelf models are used to perform various biological applications. In this study, we investigated the applicability of the popular Skip-gram model for protein sequence analysis and made an attempt to incorporate some biological insights into it. Results: We propose a novel $k$-mer embedding scheme, Align-gram, which is capable of mapping similar $k$-mers close to each other in a vector space. Furthermore, we experiment with other sequence-based protein representations and observe that the embeddings derived from Align-gram aid the modeling and training of deep learning models better. Our experiments with a simple baseline LSTM model and the much more complex CNN model of DeepGoPlus show the potential of Align-gram in performing different types of deep learning applications for protein sequence analysis.
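To make the Skip-gram setting concrete, the sketch below shows how a protein sequence might be turned into the (center, context) training pairs that a standard Skip-gram model consumes. It is only an illustration of the baseline input preparation, not of Align-gram itself: the overlapping stride-1 $k$-mer tokenization, the window size, the function names, and the toy sequence are all assumptions, since the abstract does not specify them.

```python
from typing import List, Tuple


def kmers(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (stride 1, an assumption)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for tokens at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((center, tokens[j]))
    return pairs


# Toy sequence, purely illustrative.
tokens = kmers("MKTAYIA", k=3)
pairs = skipgram_pairs(tokens, window=1)
print(tokens)
print(pairs[:3])
```

A Skip-gram model would then learn one vector per $k$-mer by predicting the context token from the center token over such pairs.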


Conformance Checking for a Medical Training Process Using Petri net Simulation and Sequence Alignment

arXiv.org Artificial Intelligence

Process mining has recently gained popularity in healthcare due to its potential to provide a transparent, objective and data-based view on processes. Conformance checking is a sub-discipline of process mining that has the potential to answer how the actual process executions deviate from existing guidelines. In this work, we analyze a medical training process for a surgical procedure. Ten students were trained to install a Central Venous Catheter (CVC) with ultrasound. Event log data was collected directly after instruction by the supervisors during a first test run and additionally after a subsequent individual training phase. In order to provide objective performance measures, we formulate an optimal, global sequence alignment problem inspired by approaches in bioinformatics. To this end, we use the Petri net model representation of the medical process guideline to simulate a representative set of guideline-conforming sequences. Next, we calculate the optimal, global sequence alignment of the recorded and simulated event logs. Finally, the output measures and visualization of aligned sequences are provided for objective feedback.
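The optimal global alignment step can be illustrated with the classic Needleman-Wunsch dynamic program from bioinformatics, applied here to activity names instead of nucleotides. This is a minimal sketch under stated assumptions: the abstract does not give the scoring scheme, so the match, mismatch and gap values are placeholders, and the activity names and the single hard-coded guideline trace (in place of sequences simulated from the Petri net) are hypothetical.

```python
from typing import List, Sequence, Tuple

GAP = "-"


def global_align(a: Sequence[str], b: Sequence[str],
                 match: int = 1, mismatch: int = -1, gap: int = -1
                 ) -> Tuple[int, List[str], List[str]]:
    """Needleman-Wunsch global alignment of two event sequences."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Hypothetical guideline trace and recorded student trace.
guideline = ["disinfect", "ultrasound", "puncture", "guidewire", "catheter"]
recorded = ["disinfect", "puncture", "guidewire", "catheter"]
best, aligned_guideline, aligned_recorded = global_align(guideline, recorded)
print(best)
print(aligned_recorded)
```

In this toy case the gap in the recorded trace marks the step the student skipped, which is the kind of deviation an alignment-based conformance measure is meant to surface.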


Unlocking the Mysteries of the Brain With AutoML

#artificialintelligence

The phrase "it's not rocket science" harkens to the wild complexity of rocket building, with millions of pieces and as many opportunities to make errors. That being said, the brain has nearly 100 billion neurons, each of which acts like a "mini-computer." Not all neurons are inter-connected, but there are still around 100 trillion connections. How could we ever understand such a complex computer? Well, humans have been trying for thousands of years, but a "brain code" is yet to be discovered.


Tempus fugit: How time flies during development

Science

“Fugit irreparabile tempus,” wrote Virgil, a reminder that our lives are defined by the irreversible flow of time. As soon as the egg is fertilized, embryonic cells follow a developmental program strictly organized in time. The sequence typically is conserved throughout evolution, but individual events can occur over species-specific time scales. Such differences can have marked effects. For instance, it takes 3 months to generate cerebral cortex neurons in a human but only 1 week in a mouse. This prolonged neurogenesis likely contributes to evolutionary expansion of the human brain (1). But the mechanisms underlying developmental time scales remain largely unknown. On pages 1449 and 1450 of this issue, Rayon et al. (2) and Matsuda et al. (3), respectively, report an association between species-specific developmental time scales and the speed of biochemical reactions that support protein turnover. Cell differentiation during mammalian development uses two types of timing mechanisms (biological clocks) based on oscillations or unidirectional processes (hourglass clocks). Modeling development in pluripotent stem cells (PSCs) from various species shows that the pace of differentiation of many cell types in an in vitro setting largely recapitulates the species-specific timing observed in embryos (4, 5). Even when human neurons are transplanted as single cells in a mouse brain, they follow their own prolonged developmental timeline (6). This suggests that cell-intrinsic mechanisms, yet to be discovered, dictate the timing of developmental trajectories in a species-specific manner. Matsuda et al. examined a biological rhythm typical of vertebrate embryos: the “somite segmentation clock,” by which the body is built segment (or somite) by segment, thanks to waves of expression of specific genes (oscillations) in presomitic mesodermal (PSM) cells.
Using in vitro modeling with mouse and human PSCs, the authors examined waves of expression of HES7 (hes family bHLH transcription factor 7), a segmental-clock master gene. They found similar waves in PSM cells of both species, but the period of oscillations in human cells was ∼5 hours instead of 2 hours (as in mouse cells), consistent with another recent report (7). What might underlie such cell-intrinsic differences? Evolutionary divergence in developmental processes usually occurs as a result of changes in the gene regulatory networks (GRNs) that control them (8). The authors examined the GRN of segmental oscillations, and except for the period of oscillation, they found no obvious difference between human and mouse gene expression. They then swapped the mouse and human genome sequences containing the HES7 locus. The human HES7 gene transplanted in mouse cells displayed fast oscillations like the mouse gene, whereas the mouse gene transplanted in the human cells displayed slower, human-like oscillations (see the figure). Thus, even DNA cis-regulatory components of the GRN do not appear to dictate the time scale of HES7 oscillations. However, Matsuda et al. found important species-specific differences in a different mechanism: the speed of biochemical reactions leading to protein turnover (production and decay). Human cells displayed slower kinetics of protein expression (including “expression delays” related to RNA transcription, splicing, and translation) and a slower rate of protein decay, mostly related to degradation. Many examined parameters showed a twofold difference in mouse versus human cells, matching the time differences observed for the segmentation clock.

[Figure: Same events, distinct timing. Graphic: Kellie Holoski/Science]

Rather than being dominated by clocklike oscillations, the developmental process is specified mostly by cell-fate transitions, by which embryonic cells gradually and irreversibly become differentiated cells.
Could it be that similar mechanisms regulate these hourglass-like timing events as well? Rayon et al. explored this notion using a motor neuron (MN) developmental model from mouse and human PSCs. Examination of MN development in vitro revealed that the underlying GRN is similar in both species, except that human motoneurogenesis takes 2.5 times longer in the human cell model versus the mouse. The authors then examined the influence of sonic hedgehog, the key morphogen that induces MN fate (by changing timing and intensity of the signal), and the MN-development master gene OLIG2 (oligodendrocyte transcription factor 2) (by inserting the human gene in mouse cells) but found no effects that explained the species-specific time differences. They then analyzed protein stability during MN development and found that the mean protein half-life was doubled in human cells compared with mouse cells, which is consistent with the findings of Matsuda et al. Both studies point to protein turnover as a potential source of variation in developmental time scales. Each group tested this hypothesis further by in silico modeling of their experimental systems, which predicted, in each case, a prominent influence of the delay in protein production and protein decay on developmental time scales. That protein turnover affects the timing of development is provocative and attractive but must be validated by experimental evidence for causal relationship between the two (by altering the production and decay of proteins and mRNA, and then examining the developmental time scale). Such experiments will also help to determine the respective contributions of expression delay versus protein decay, on which each study puts a somewhat different emphasis. The consistent results from both studies also raise questions about the mechanisms upstream of interspecies differences in protein turnover. Metabolism is an attractive candidate. 
Protein turnover requires a considerable amount of energy (9), and metabolic rewiring has emerged as a central instructor of cell fate transitions (10), although through epigenetic remodeling rather than changes in proteostasis. Another question is whether the same principles apply to developmental events that display more pronounced time scale differences. For example, GRN divergence might operate through specific genes that modulate the timing of human cortical neurogenesis (11). Furthermore, metabolism and protein turnover might display differences depending on the cell context or the specific protein involved. And known correlations between developmental timing, life span, and aging across species (12) might all be causally linked to differences in metabolism and protein turnover.

1. A. M. M. Sousa et al., Cell 170, 226 (2017).
2. T. Rayon et al., Science 369, eaba7667 (2020).
3. M. Matsuda et al., Science 369, 1450 (2020).
4. J. van den Ameelen et al., Trends Neurosci. 37, 334 (2014).
5. M. Ebisuya, J. Briscoe, Development 145, dev164368 (2018).
6. D. Linaro et al., Neuron 104, 972 (2019).
7. M. Diaz-Cuadros et al., Nature 580, 113 (2020).
8. E. H. Davidson, D. H. Erwin, Science 311, 796 (2006).
9. J. Labbadia, R. I. Morimoto, Annu. Rev. Biochem. 84, 435 (2015).
10. N. Shyh-Chang et al., Development 140, 2535 (2013).
11. I. K. Suzuki et al., Cell 173, 1370 (2018).
12. A. A. Fushan et al., Aging Cell 14, 352 (2015).

Acknowledgments: P.V. is funded by the European Research Council, Belgian Fonds Wetenschappelijk Onderzoek, Excellence of Science Research programme, AXA Research Fund, Belgian Queen Elizabeth Foundation, and Fondation Université Libre de Bruxelles. R.I. was supported by the Belgian Fonds de la Recherche Scientifique.


Hierarchical Protein Function Prediction with Tail-GNNs

arXiv.org Machine Learning

Protein function prediction may be framed as predicting subgraphs (with certain closure properties) of a directed acyclic graph describing the hierarchy of protein functions. Graph neural networks (GNNs), with their built-in inductive bias for relational data, are hence naturally suited for this task. However, in contrast with most GNN applications, the graph is not related to the input, but to the label space. Accordingly, we propose Tail-GNNs, neural networks which naturally compose with the output space of any neural network for multi-task prediction, to provide relationally-reinforced labels. For protein function prediction, we combine a Tail-GNN with a dilated convolutional network which learns representations of the protein sequence, yielding a significant improvement in $F_1$ score and demonstrating the ability of Tail-GNNs to learn useful representations of labels and exploit them in real-world problem solving.
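One way to picture a label subgraph with a closure property is the ancestor closure commonly applied to ontology annotations such as the Gene Ontology: if a protein carries a function, it also carries every more general function above it in the hierarchy. The abstract does not state which closure properties it has in mind, so treating it as ancestor closure is an assumption, and the tiny hierarchy and function name below are hypothetical. The sketch only shows the label-space structure, not the Tail-GNN itself.

```python
from typing import Dict, List, Set


def close_under_ancestors(labels: Set[str],
                          parents: Dict[str, List[str]]) -> Set[str]:
    """Return `labels` plus every ancestor reachable through `parents` in a DAG."""
    closed: Set[str] = set()
    stack = list(labels)
    while stack:
        term = stack.pop()
        if term in closed:
            continue  # already visited via another path in the DAG
        closed.add(term)
        stack.extend(parents.get(term, []))
    return closed


# Hypothetical toy hierarchy: child -> list of parents.
parents = {
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "ATP binding": ["binding"],
    "binding": ["molecular function"],
}
closed = close_under_ancestors({"kinase activity"}, parents)
print(sorted(closed))
```

A predictor whose raw outputs violate this closure (a specific function predicted without its parent) produces an invalid subgraph, which is one motivation for modeling the label hierarchy explicitly.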


Prediction of Cancer Microarray and DNA Methylation Data using Non-negative Matrix Factorization

arXiv.org Machine Learning

Over the past few years, microarray technology has spread considerably across many areas of biology, particularly in the study of cancers such as leukemia, prostate cancer, and colon cancer. The primary bottleneck in properly understanding such datasets lies in their dimensionality, so studying them efficiently and effectively requires a substantial reduction in their dimension. This study proposes different algorithms and approaches for reducing the dimensionality of such microarray datasets. It exploits the matrix-like structure of microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms. This technique gives an accuracy of 98%.
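As a rough illustration of how NMF reduces dimensionality, the sketch below factorizes a small non-negative matrix V (rows as samples, columns as features) into W and H with V ≈ WH, so that each row of W is a low-dimensional representation of a sample. It uses the well-known multiplicative update rules of Lee and Seung in plain Python; the abstract does not say which NMF variant, rank, or data layout the study uses, so these choices, the helper names, and the toy matrix are all assumptions.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(v: Matrix, rank: int, iterations: int = 200,
        seed: int = 0, eps: float = 1e-9):
    """Factorize non-negative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive initialization keeps all later entries non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Frobenius norm of the reconstruction residual v - w h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx)
               for x, y in zip(rv, ra)) ** 0.5


# Toy 4 x 3 matrix, purely illustrative.
v = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 0.0, 3.0], [0.0, 0.0, 6.0]]
w, h = nmf(v, rank=2)
print(frobenius_error(v, w, h))
```

The rows of `w` would then serve as the reduced features passed to a classifier; for real microarray data a tested library implementation would be preferable to this hand-rolled loop.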