Pharmaceuticals & Biotechnology
Biotech firm aims to create 'ChatGPT of biology' – will it work?
A British biotech firm called Basecamp Research has spent the past few years collecting troves of genetic data from microbes living in extreme environments around the world, identifying more than a million species and nearly 10 billion genes new to science. It claims that this massive database of the planet's biodiversity will help train a "ChatGPT of biology" that will answer questions about life on Earth – but there's no guarantee this will work. Jörg Overmann at the Leibniz Institute DSMZ in Germany, which houses one of the world's most diverse collections of microbial cultures, says increasing known genetic sequences is valuable, but may not result in useful findings for things like drug discovery or chemistry without more information about the organisms from which they were collected. "I'm not convinced that in the end the understanding of really novel functions will be accelerated by this brute-force increase in the sequence space," he says. Recent years have seen researchers develop a number of machine learning models trained to identify patterns and predict relationships amid vast amounts of biological data.
Futurist who predicted the iPhone reveals date humans will cheat death
A leading futurist who accurately predicted the rise of the iPhone has now set the date for humanity's most phenomenal breakthrough yet: the ability to cheat death. Ray Kurzweil, a former Google engineering director, has long been known for his bold predictions about the future of technology and humanity. His forecasts often focus on the convergence of biotech, AI, and nanotechnology to radically extend human capabilities. Now, Kurzweil claims humanity is just four years away from its most transformative leap yet, achieving 'longevity escape velocity' by 2029. While some experts remain skeptical, Kurzweil's influence in Silicon Valley ensures his predictions continue to shape the broader conversation around life extension and the future of human health.
FDA wants to use AI to speed up drug approval process
The Food and Drug Administration (FDA) is looking to AI to solve the problem of lengthy approval processes, as the Trump administration invests in even more automation amid thousands of federal worker layoffs. The administration wants to "radically increase efficiency" using the burgeoning technology, according to a new article published in the Journal of the American Medical Association (JAMA) outlining the agency's priorities. The department's plan includes using artificial intelligence to examine device and drug applications, which would reportedly shave years off of the approval process, as well as AI computational modeling to reduce animal testing. The plan also proposes requiring just one major patient study to facilitate approvals, part of an overhaul of "legacy" processes. The article cites the success of COVID-19's Operation Warp Speed as precedent for diminished release timelines, but many professionals remain skeptical.
4 ways your organization can adapt and thrive in the age of AI
The evidence suggests almost all business leaders are piloting or investing in AI initiatives, and biopharmaceutical giant Boehringer Ingelheim is committed to investing in emerging technology that could have life-altering consequences. The company's 55,000 employees focus on developing innovative therapies that can improve lives in areas of high unmet medical need, with AI and data playing an increasingly crucial role in their work. Global CIO Markus Schümmelfeder told ZDNET that emerging technology can open all kinds of possibilities when its adoption is accompanied by organizational change: "AI together with big data availability and access to the right capability is the real game-changer." So, how can business leaders drive successful organizational change in an age of AI? Schümmelfeder and his colleague Oliver Sluke, head of IT research, development, and medicine at Boehringer, shared with ZDNET their four best-practice tips for AI-enabled business transformation. Most digital leaders agree: before you start tinkering with technology, you must ensure your data is managed, sorted, and accessible.
Researchers genetically altered fruit flies to crave cocaine
In a world first, scientists at the University of Utah have engineered fruit flies susceptible to cocaine addiction. But as strange as it sounds, there are potentially life-saving reasons for genetically altering the insects to crave the drug. The novel biological model could aid the development of addiction treatment therapies and expedite research timelines. The findings are detailed in the Journal of Neuroscience.
Identifiable Shared Component Analysis of Unpaired Multimodal Mixtures
A core task in multi-modal learning is to integrate information from multiple feature spaces (e.g., text and audio), offering modality-invariant essential representations of data. Recent research showed that classical tools such as canonical correlation analysis (CCA) provably identify the shared components up to minor ambiguities when samples in each modality are generated from a linear mixture of shared and private components. Such identifiability results were obtained under the condition that the cross-modality samples are aligned/paired according to their shared information. This work takes a step further, investigating shared component identifiability from multi-modal linear mixtures where cross-modality samples are unaligned. A distribution divergence minimization-based loss is proposed, under which a suite of sufficient conditions ensuring identifiability of the shared components is derived. Our conditions are based on cross-modality distribution discrepancy characterization and density-preserving transform removal, which are much milder than existing studies relying on independent component analysis. More relaxed conditions are also provided via adding reasonable structural constraints, motivated by available side information in various applications. The identifiability claims are thoroughly validated using synthetic and real-world data.
Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks 2, Eric J. Ma
Geometric deep learning has broad applications in biology, a domain where relational structure in data is often intrinsic to modelling the underlying phenomena. Currently, efforts in both geometric deep learning and, more broadly, deep learning applied to biomolecular tasks have been hampered by a scarcity of appropriate datasets accessible to domain specialists and machine learning researchers alike. To address this, we introduce Graphein as a turn-key tool for transforming raw data from widely-used bioinformatics databases into machine learning-ready datasets in a high-throughput and flexible manner. Graphein is a Python library for constructing graph and surface-mesh representations of biomolecular structures, such as proteins, nucleic acids and small molecules, and biological interaction networks for computational analysis and machine learning. Graphein provides utilities for data retrieval from widely-used bioinformatics databases for structural data, including the Protein Data Bank, the AlphaFold Structure Database, chemical data from ZINC and ChEMBL, and for biomolecular interaction networks from STRINGdb, BioGrid, TRRUST and RegNetwork. The library interfaces with popular geometric deep learning libraries (DGL, Jraph, PyTorch Geometric and PyTorch3D), though it remains framework-agnostic as it is built on top of the PyData ecosystem to enable inter-operability with scientific computing tools and libraries. Graphein is designed to be highly flexible, allowing the user to specify each step of the data preparation, scalable to facilitate working with large protein complexes and interaction graphs, and contains useful pre-processing tools for preparing experimental files.
Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search
Computer-aided synthesis planning (CASP) algorithms have demonstrated expert-level abilities in planning retrosynthetic routes to molecules of low to moderate complexity. However, current search methods assume the sufficiency of reaching arbitrary building blocks, failing to address the common real-world constraint where using specific molecules is desired. To this end, we present a formulation of synthesis planning with starting material constraints. Under this formulation, we propose Double-Ended Synthesis Planning (DESP), a novel CASP algorithm under a bidirectional graph search scheme that interleaves expansions from the target and from the goal starting materials to ensure constraint satisfiability. The search algorithm is guided by a goal-conditioned cost network learned offline from a partially observed hypergraph of valid chemical reactions. We demonstrate the utility of DESP in improving solve rates and reducing the number of search expansions by biasing synthesis planning towards expert goals on multiple new benchmarks. DESP can make use of existing one-step retrosynthesis models, and we anticipate its performance to scale as these one-step model capabilities improve.
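To make the idea of interleaved expansions from both ends concrete, here is a minimal sketch of a plain bidirectional breadth-first search. It is not DESP: there is no learned cost network and no reaction hypergraph, and the undirected toy graph, the node labels, and the function name `bidirectional_search` are all hypothetical and chosen only for illustration.

```python
from collections import deque


def bidirectional_search(graph, start, goal):
    """Return a path from start to goal in an undirected graph, or None.

    `graph` maps each node to a list of neighbours (hypothetical toy format).
    """
    if start == goal:
        return [start]

    # Parent maps double as "visited" sets for each search direction.
    parents_fwd = {start: None}
    parents_bwd = {goal: None}
    frontier_fwd = deque([start])
    frontier_bwd = deque([goal])

    def expand(frontier, parents, other_parents):
        """Expand one level; return a meeting node if the two searches touch."""
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in parents:
                    parents[nxt] = node
                    if nxt in other_parents:
                        return nxt
                    frontier.append(nxt)
        return None

    while frontier_fwd and frontier_bwd:
        # Interleave: one level from the start side, then one from the goal side.
        meet = expand(frontier_fwd, parents_fwd, parents_bwd)
        if meet is None:
            meet = expand(frontier_bwd, parents_bwd, parents_fwd)
        if meet is not None:
            # Stitch the two half-paths together at the meeting node.
            path = []
            node = meet
            while node is not None:
                path.append(node)
                node = parents_fwd[node]
            path.reverse()
            node = parents_bwd[meet]
            while node is not None:
                path.append(node)
                node = parents_bwd[node]
            return path
    return None


if __name__ == "__main__":
    toy = {
        "target": ["a", "b"],
        "a": ["target", "c"],
        "b": ["target"],
        "c": ["a", "start"],
        "start": ["c"],
    }
    print(bidirectional_search(toy, "target", "start"))
```

In a real synthesis planner the two directions would not share one undirected edge set: the backward side would apply retrosynthetic steps from the target while the forward side would apply reactions from the required starting material, and expansions would be ordered by a cost estimate rather than level by level.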
Recurrent Kernel Networks
Substring kernels are classical tools for representing biological sequences or text. However, when large amounts of annotated data are available, models that allow end-to-end training such as neural networks are often preferred. Links between recurrent neural networks (RNNs) and substring kernels have recently been drawn, by formally showing that RNNs with specific activation functions were points in a reproducing kernel Hilbert space (RKHS). In this paper, we revisit this link by generalizing convolutional kernel networks--originally related to a relaxation of the mismatch kernel--to model gaps in sequences. It results in a new type of recurrent neural network which can be trained end-to-end with backpropagation, or without supervision by using kernel approximation techniques. We experimentally show that our approach is well suited to biological sequences, where it outperforms existing methods for protein classification tasks.
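For readers unfamiliar with substring kernels, the sketch below shows the simplest member of the family, the k-spectrum kernel, which compares two sequences by the contiguous k-mers they share. It is only an illustration of the classical starting point: it handles neither mismatches nor gaps and is not the recurrent kernel network described above. The function names and the example sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of a and b."""
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    # Two made-up peptide fragments differing at one position.
    print(spectrum_kernel("MKTAYIAK", "MKTAYLAK", k=3))
```

A single substitution removes every k-mer that overlaps the changed position, which is why exact-match kernels like this one are brittle and why mismatch- and gap-tolerant variants were developed.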
TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search
Molecular geometry prediction of flexible molecules, or conformer search, is a longstanding challenge in computational chemistry. This task is of great importance for predicting structure-activity relationships for a wide variety of substances ranging from biomolecules to ubiquitous materials. Substantial computational resources are invested in Monte Carlo and Molecular Dynamics methods to generate diverse and representative conformer sets for medium to large molecules, which are yet intractable to chemoinformatic conformer search methods. We present TorsionNet, an efficient sequential conformer search technique based on reinforcement learning under the rigid rotor approximation. The model is trained via curriculum learning, whose theoretical benefit is explored in detail, to maximize a novel metric grounded in thermodynamics called the Gibbs Score. Our experimental results show that TorsionNet outperforms the highest scoring chemoinformatics method by 4x on large branched alkanes, and by several orders of magnitude on the previously unexplored biopolymer lignin, with applications in renewable energy. TorsionNet also outperforms the far more exhaustive but computationally intensive Self-Guided Molecular Dynamics sampling method.