Applications of artificial intelligence (AI) in medicine are creating a bigger impact on global healthcare by bridging the gap between the patient and the provider. It has the potential to improve productivity, efficiency, speed, accuracy and workflow of both the patients and physicians significantly. According to the recent declaration by CB Insights, the role of AI in healthcare is very crucial and hence is being used by almost about 86% healthcare service providers, biotechnology companies, and pharmaceutical companies. Experts have suggested that by the end of the year 2020, on an average, a burden of $54 million will be borne by these industries to ease the life of the common man. This unique convergence of technology and medicine can help in streamlining tedious processes with more patient empowerment and substantial reduction in repetitive tasks. The standard medical practice in future would be like patients visiting a computer first before seeing the doctor. Events of misdiagnosis and medical errors would be a thing of past. Doctors can spend more quality time with patients bringing back the care in healthcare.
As cancer cells spread in a culture dish, Guillaume Jacquemet is watching. The cell movements hold clues to how drugs or gene variants might affect the spread of tumours in the body, and he is tracking the nucleus of each cell in frame after frame of time-lapse microscopy films. But because he has generated about 500 films, each with 120 frames and 200–300 cells per frame, that analysis is challenging to say the least. "If I had to do the tracking manually, it would be impossible," says Jacquemet, a cell biologist at Åbo Akademi University in Turku, Finland. So he has trained a machine to spot the nuclei instead.
Predicting whether a chemical structure shares a desired biological effect can have a significant impact for in-silico compound screening in early drug discovery. In this study, we developed a deep learning model where compound structures are represented as graphs and then linked to their biological footprint. To make this complex problem computationally tractable, compound differences were mapped to biological effect alterations using Siamese Graph Convolutional Neural Networks. The proposed model was able to learn new representations from chemical structures and identify structurally dissimilar compounds that affect similar biological processes with high precision. Additionally, by utilizing deep ensembles to estimate uncertainty, we were able to provide reliable and accurate predictions for chemical structures that are very different from the ones used during training. Finally, we present a novel inference approach, where the trained models are used to estimate the signaling pathways affected by a compound perturbation in a specific cell line, using only its chemical structure as input. As a use case, this approach was used to infer signaling pathways affected by FDA-approved anticancer drugs.
Artificial Intelligence (AI) in healthcare holds great potential to expand access to high-quality medical care, whilst reducing overall systemic costs. Despite hitting the headlines regularly and many publications of proofs-of-concept, certified products are failing to breakthrough to the clinic. AI in healthcare is a multi-party process with deep knowledge required in multiple individual domains. The lack of understanding of the specific challenges in the domain is, therefore, the major contributor to the failure to deliver on the big promises. Thus, we present a decision perspective framework, for the development of AI-driven biomedical products, from conception to market launch. Our framework highlights the risks, objectives and key results which are typically required to proceed through a three-phase process to the market launch of a validated medical AI product. We focus on issues related to Clinical validation, Regulatory affairs, Data strategy and Algorithmic development. The development process we propose for AI in healthcare software strongly diverges from modern consumer software development processes. We highlight the key time points to guide founders, investors and key stakeholders throughout their relevant part of the process. Our framework should be seen as a template for innovation frameworks, which can be used to coordinate team communications and responsibilities towards a reasonable product development roadmap, thus unlocking the potential of AI in medicine.
With the broader and highly successful usage of machine learning in industry and the sciences, there has been a growing demand for explainable AI. Interpretability and explanation methods for gaining a better understanding about the problem solving abilities and strategies of nonlinear Machine Learning such as Deep Learning (DL), LSTMs, and kernel methods are therefore receiving increased attention. In this work we aim to (1) provide a timely overview of this active emerging field and explain its theoretical foundations, (2) put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations, (3) outline best practice aspects i.e. how to best include interpretation methods into the standard usage of machine learning and (4) demonstrate successful usage of explainable AI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of machine learning.
Recent technological advancements in data acquisition tools allowed life scientists to acquire multimodal data from different biological application domains. Broadly categorized in three types (i.e., sequences, images, and signals), these data are huge in amount and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities and lately their deep architectures - known as deep learning (DL) - have been successfully applied to solve many complex pattern recognition problems. Highlighting the role of DL in recognizing patterns in biological data, this article provides - applications of DL to biological sequences, images, and signals data; overview of open access sources of these data; description of open source DL tools applicable on these data; and comparison of these tools from qualitative and quantitative perspectives. At the end, it outlines some open research challenges in mining biological data and puts forward a number of possible future perspectives.
While much attention has been given to the problem of estimating the effect of discrete interventions from observational data, relatively little work has been done in the setting of continuous-valued interventions, such as treatments associated with a dosage parameter. In this paper, we tackle this problem by building on a modification of the generative adversarial networks (GANs) framework. Our model, SCIGAN, is flexible and capable of simultaneously estimating counterfactual outcomes for several different continuous interventions. The key idea is to use a significantly modified GAN model to learn to generate counterfactual outcomes, which can then be used to learn an inference model, using standard supervised methods, capable of estimating these counterfactuals for a new sample. To address the challenges presented by shifting to continuous interventions, we propose a novel architecture for our discriminator - we build a hierarchical discriminator that leverages the structure of the continuous intervention setting. Moreover, we provide theoretical results to support our use of the GAN framework and of the hierarchical discriminator. In the experiments section, we introduce a new semi-synthetic data simulation for use in the continuous intervention setting and demonstrate improvements over the existing benchmark models.
THERE ARE MANY REASONS that promising drugs wash out during pharmaceutical development, and one of them is cytochrome P450. A set of enzymes mostly produced in the liver, CYP450, as it is commonly called, is involved in breaking down chemicals and preventing them from building up to dangerous levels in the bloodstream. Many experimental drugs, it turns out, inhibit the production of CYP450--a vexing side effect that can render such a drug toxic in humans. Drug companies have long relied on conventional tools to try to predict whether a drug candidate will inhibit CYP450 in patients, such as by conducting chemical analyses in test tubes, looking at CYP450 interactions with better-understood drugs that have chemical similarities, and running tests on mice. But their predictions are wrong about a third of the time.
Biomedical research papers use significantly different language and jargon when compared to typical English text, which reduces the utility of pre-trained NLP models in this domain. Meanwhile Medline, a database of biomedical abstracts, introduces nearly a million new documents per-year. Applications that could benefit from understanding this wealth of publicly available information, such as scientific writing assistants, chat-bots, or descriptive hypothesis generation systems, require new domain-centered approaches. A conditional language model, one that learns the probability of words given some a priori criteria, is a fundamental building block in many such applications. We propose a transformer-based conditional language model with a shallow encoder "condition" stack, and a deep "language model" stack of multi-headed attention blocks. The condition stack encodes metadata used to alter the output probability distribution of the language model stack. We sample this distribution in order to generate biomedical abstracts given only a proposed title, an intended publication year, and a set of keywords. Using typical natural language generation metrics, we demonstrate that this proposed approach is more capable of producing non-trivial relevant entities within the abstract body than the 1.5B parameter GPT-2 language model.
In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction which can effectively take into account the inherent constraints in the problem. The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled algorithm for constrained programming as the template for deep architectures to enforce constraints. With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold: it predicts significantly better structures compared to previous SOTA (especially for pseudoknotted structures), while being as efficient as the fastest algorithms in terms of inference time.