Toward AI-Driven Digital Organism: Multiscale Foundation Models for Predicting, Simulating and Programming Biology at All Levels

Song, Le; Segal, Eran; Xing, Eric

arXiv.org Artificial Intelligence 

Biology lies at the core of vital fields such as medicine, pharmacy, public health, longevity, agriculture and food security, environmental protection, and clean energy. The mechanisms underlying living and physical systems have always fascinated us. With Newton's laws, we can predict the orbits of celestial bodies; the periodic table allows us to anticipate the properties of chemical compounds; and we can even simulate weather and environmental systems. However, despite our extensive knowledge of atomic, molecular, chemical, and physical laws, and the power of modern computers, we still cannot simulate biological systems accurately. Whether we aim to pinpoint genetic markers of diseases for diagnosis, design drugs to heal damaged cells or deter pathogens, or develop vaccines to combat pandemics, such advances in medicine consistently require a profound understanding of the underlying biology at all levels, along with the ability to predict, simulate, and program biological activities comprehensively. Manipulating biology in the physical world is extremely complex, expensive, and risky, and should be preceded by extensive computer-aided digital design, simulation, and validation, as in other industrial fields such as civil, nuclear, and semiconductor engineering. We propose a vision in which such capabilities can be realized using generative AI. Generative AI and large pretrained models across text, images, speech, and video have become key pillars for advancing artificial general intelligence (AGI), driving significant improvements in a wide range of downstream tasks, including language and image comprehension, translation, knowledge extraction, reasoning, and cross-modal generation. These models are often known as "foundation