Graph Representation Learning in Biomedicine

Li, Michelle M., Huang, Kexin, Zitnik, Marinka

arXiv.org Artificial Intelligence 

Networks (or graphs) are pervasive in biology and medicine, from molecular interaction maps to population-scale social and health interactions. With the multitude of bioentities and associations that can be described by networks, they are prevailing representations of biological organization and biomedical knowledge. For instance, edges in a regulatory network can indicate causal activating and inhibitory relationships between genes [149]; edges between genes and diseases can indicate genes that are 'upregulated by', 'downregulated by', or 'associated with' a disease [141]; and edges in a knowledge network built from electronic health records (EHR) can indicate co-occurrences of medical codes across patients [81, 156, 161]. The ability to model all biomedical discoveries to date, and even to overlay patient-specific information, in a unified data representation has driven the development of artificial intelligence, specifically deep learning, for networks. In fact, the diversity and multimodality in networks not only boost the performance of predictive models but also enable broad generalization to settings not seen during training [74] and improve model interpretability [31, 140]. Nevertheless, interactions in networks give rise to a bewildering degree of complexity that can likely only be fully understood through a holistic and integrated view [14, 22, 137]. As a result, systems biology and medicine, upon which deep learning on graphs is founded, have over the last two decades identified organizing principles that govern networks [13, 66, 85, 227].