 FDA


Continuous Design Control for Machine Learning in Certified Medical Systems

arXiv.org Artificial Intelligence

Continuous software engineering has become commonplace in numerous fields. However, in regulation-intensive sectors, where additional concerns need to be taken into account, it is often considered difficult to apply continuous development approaches, such as DevOps. In this paper, we present an approach for using pull requests as design controls, and apply this approach to machine learning in certified medical systems, leveraging model cards, a novel technique developed to add explainability to machine learning systems, as a regulatory audit trail. The approach is demonstrated with an industrial system that we have used previously to show how medical systems can be developed in a continuous fashion.
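One way to picture a pull request acting as a design control is a merge check that rejects any change whose model card is incomplete. The sketch below is an illustration only, not the authors' tooling: the section names, the `missing_sections` helper, and the `gate_pull_request` function are all hypothetical, and the section list is an assumption loosely modeled on commonly used model card headings.

```python
from __future__ import annotations

# Hypothetical list of sections a reviewer might require in a model card.
REQUIRED_SECTIONS = (
    "model_details",
    "intended_use",
    "metrics",
    "training_data",
    "evaluation_data",
    "caveats_and_recommendations",
)


def missing_sections(card: dict) -> list[str]:
    """Return the required sections that are absent or empty in the card."""
    missing = []
    for name in REQUIRED_SECTIONS:
        value = card.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def gate_pull_request(card: dict) -> bool:
    """Hypothetical merge gate: pass only when no required section is missing."""
    return not missing_sections(card)


# Made-up card for illustration; a real one would be parsed from the repository.
draft_card = {
    "model_details": "Example classifier, version 0.1",
    "intended_use": "Decision support only",
    "metrics": "",
}

if __name__ == "__main__":
    print("missing:", missing_sections(draft_card))
    print("mergeable:", gate_pull_request(draft_card))
```

Because the card lives in the repository next to the model code, each approved pull request leaves a reviewed, versioned record of the card, which is the kind of trail the abstract describes.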


Predicting microsatellite instability and key biomarkers in colorectal cancer from H&E-stained images: Achieving SOTA predictive performance with fewer data using Swin Transformer

arXiv.org Artificial Intelligence

Artificial intelligence (AI) models have been developed to predict clinically relevant biomarkers for colorectal cancer (CRC), including microsatellite instability (MSI). However, existing deep-learning networks are data-hungry and require large training datasets, which are often lacking in the medical domain. In this study, based on the latest Hierarchical Vision Transformer using Shifted Windows (Swin-T), we developed an efficient workflow for biomarkers in CRC (MSI, hypermutation, chromosomal instability, CpG island methylator phenotype, BRAF, and TP53 mutation) that required relatively small datasets, but achieved a state-of-the-art (SOTA) predictive performance. Our Swin-T workflow substantially outperformed published models in an intra-study cross-validation experiment using the TCGA-CRC-DX dataset (N = 462). It also demonstrated excellent generalizability in cross-study external validation and delivered a SOTA AUROC of 0.90 for MSI, using the MCO dataset for training (N = 1065) and the TCGA-CRC-DX for testing. A similar performance (AUROC = 0.91) was achieved by Echle et al., using ~8000 training samples (ResNet18) on the same testing dataset. Swin-T was extremely efficient when using small training datasets and exhibited robust predictive performance with 200-500 training samples. These data indicate that Swin-T could be 5-10 times more efficient than existing algorithms for MSI based on ResNet18 and ShuffleNet. Furthermore, the Swin-T models showed promise as pre-screening tests for MSI status and BRAF mutation status, which could exclude and reduce the samples before subsequent standard testing in a cascading diagnostic workflow, to allow a reduction in turnaround time and costs.
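The AUROC values quoted above can be read as the probability that a randomly chosen positive sample (for example, MSI-high) receives a higher model score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition; the `auroc` function and the labels and scores are made up for illustration and are not taken from the study.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Compare every positive with every negative (quadratic, fine for a sketch).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical per-sample labels (1 = positive) and model scores.
    y_true = [1, 0, 1, 0]
    y_score = [0.9, 0.8, 0.3, 0.1]
    print(auroc(y_true, y_score))
```

For large datasets a rank-based implementation is the usual choice; the quadratic loop here is kept only because it mirrors the definition directly.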


Shape Analysis for Pediatric Upper Body Motor Function Assessment

arXiv.org Artificial Intelligence

Neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), cause progressive muscular degeneration and loss of motor function for 1 in 6,000 children. Traditional upper limb motor function assessments do not quantitatively measure patient-performed motions, which makes it difficult to track progress for incremental changes. Assessing motor function in children with neuromuscular disorders is particularly challenging because they can be nervous or excited during experiments, or simply be too young to follow precise instructions. These challenges translate to confounding factors such as performing different parts of the arm curl slower or faster (phase variability), which affects the assessed motion quality. This paper uses curve registration and shape analysis to temporally align trajectories while simultaneously extracting a mean reference shape. Distances from this mean shape are used to assess the quality of motion. The proposed metric is invariant to confounding factors, such as phase variability, while suggesting several clinically relevant insights. First, there are statistically significant differences between functional scores for the control and patient populations (p = 0.0213 ≤ 0.05). Next, several patients in the patient cohort are able to perform motion on par with the healthy cohort and vice versa. Our metric, which is computed based on wearables, is related to the Brooke's score (p = 0.00063 ≤ 0.05), as well as motor function assessments based on dynamometry (p = 0.0006 ≤ 0.05). These results show promise towards ubiquitous motion quality assessment in daily life.


AI In Healthcare Highlights & Milestones Summer 2022

#artificialintelligence

This is my new AI in Healthcare Highlights & Milestones Report for Summer 2022. This report includes an overview of advances made during the summer across the healthcare spectrum including important studies, regulatory clearances, fundraising, partnerships, and growth in the AI ecosystem worldwide. This summer scientists demonstrated how they successfully used AI in many areas including: to reduce sepsis deaths, to predict cardiac events, to detect breast cancer, to detect lung cancer, to detect osteoporosis, to detect Parkinson's, to monitor diabetic retinopathy, to detect heart disease, to detect bladder cancer, to enable pathology, to detect fractures, and to monitor Parkinson's using the Apple Watch. In July scientists in Germany published a large-scale study demonstrating that radiologists working with AI were more accurate at detecting breast cancer than radiologists working without AI, and vice versa: the AI was more accurate when working with a radiologist than when working independently. The study was led by Vara, a German company, in collaboration with radiologists at the Essen University Hospital in Germany and the Memorial Sloan Kettering Cancer Center in New York. Vara's AI has been used by radiologists in German breast screening centers for two years and is used in 30% of Germany's breast cancer screening centers. Vara's AI software is also used to screen for breast cancer in a hospital in Mexico and in a hospital in Greece.


Dyad Medical Secures FDA Clearance For Echo:Prio Cardiac Imaging Analysis Platform

#artificialintelligence

Dyad Medical, Inc., the developer of the cloud-based AI technology for cardiac image analysis, announced that the U.S. Food and Drug Administration (FDA) has cleared its echocardiogram application, called Echo:Prio, through the 510(k) pathway. Echo:Prio, part of the complete cardiac platform named Libby, offers fast, data-driven image analysis of echocardiogram images. It is an important decision-making support tool for index quantification of cardiac function, saving the clinician time in diagnosis and treatment decision-making. Echocardiograms are often the first step in diagnosing and developing a treatment plan for heart disease. The heart is the only organ in constant movement as it pumps blood throughout the body.


Semi-automated Extraction of Literature Data Using Machine Learning Methods

#artificialintelligence

NICEATM, other scientists within the NIEHS Division of the NTP, the DOE's Oak Ridge National Laboratory, and FDA are collaborating to automate the process of identifying high-quality developmental toxicity studies in the published scientific literature. The approach applies natural language processing and machine learning methods to identify specific data elements in the full text of scientific publications using both unsupervised and supervised approaches. Preliminary models were trained using a uterotrophic database (Kleinstreuer et al. 2016) built for the EPA Endocrine Disruptor Screening Program. The models leveraged natural language processing and multivariate machine learning models to identify papers that meet minimum criteria to be considered guideline-like studies (Herrmannova et al. 2018). Supervised and unsupervised approaches were developed to automatically extract text features that correspond to study descriptors and classify papers based on their adherence to minimum criteria derived from regulatory guideline studies.


Review of the AMLAS Methodology for Application in Healthcare

arXiv.org Artificial Intelligence

In recent years, the number of machine learning (ML) technologies gaining regulatory approval for healthcare has increased significantly, allowing them to be placed on the market. However, the regulatory frameworks applied to them were originally devised for traditional software, which has largely rule-based behaviour, compared to the data-driven and learnt behaviour of ML. As the frameworks are in the process of reformation, there is a need to proactively assure the safety of ML to prevent patient safety being compromised. The Assurance of Machine Learning for use in Autonomous Systems (AMLAS) methodology was developed by the Assuring Autonomy International Programme based on well-established concepts in system safety. This review has appraised the methodology by consulting ML manufacturers to understand whether it converges with or diverges from their current safety assurance practices, whether there are gaps and limitations in its structure, and whether it is fit for purpose when applied to the healthcare domain. Through this work we offer the view that there is clear utility for AMLAS as a safety assurance methodology when applied to healthcare machine learning technologies, although development of healthcare-specific supplementary guidance would benefit those implementing the methodology.


Review finds 'paucity of robust evidence' on impact of AI clinical outcomes

#artificialintelligence

AI-assisted tools have begun to make a mark on healthcare, with a 2020 study finding 64 FDA-approved devices and algorithms based on artificial intelligence and machine learning. Yet, the systematic review found a lack of evidence to support the technologies. "Despite the plethora of claims for the benefits of AI in enhancing clinical outcomes, there is a paucity of robust evidence. In this systematic review, we identified only a handful of RCTs comparing AI-assisted tools with standard-of-care management in various medical conditions," the authors wrote. Many of the 39 studies had limitations that affect the generalizability of their results.


Tailoring Molecules for Protein Pockets: a Transformer-based Generative Solution for Structure-based Drug Design

arXiv.org Artificial Intelligence

Structure-based drug design is drawing growing attention in computer-aided drug discovery. Compared with the virtual screening approach, where a pre-defined library of compounds is computationally screened, de novo drug design based on the structure of a target protein can provide novel drug candidates. In this paper, we present a generative solution named TamGent (Target-aware molecule generator with Transformer) that can directly generate candidate drugs from scratch for a given target, overcoming the limits imposed by existing compound libraries. Following the Transformer framework (a state-of-the-art framework in deep learning), we design a variant of Transformer encoder to process 3D geometric information of targets and pre-train the Transformer decoder on 10 million compounds from PubChem for candidate drug generation. Systematic evaluation on candidate compounds generated for targets from DrugBank shows that both binding affinity and druggability are largely improved. TamGent outperforms previous baselines in terms of both effectiveness and efficiency. The method is further verified by generating candidate compounds for the SARS-CoV-2 main protease and the oncogenic mutant KRAS G12C. The results show that our method not only re-discovers previously verified drug molecules, but also generates novel molecules with better docking scores, expanding the compound pool and potentially leading to the discovery of novel drugs.


Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain

arXiv.org Artificial Intelligence

Learning-to-rank, a machine learning technique widely used in information retrieval, has recently been applied to the problem of ligand-based virtual screening, to accelerate the early stages of new drug development. Ranking prediction models learn based on ordinal relationships, making them suitable for integrating assay data from various environments. Existing studies of rank prediction in compound screening have generally used a learning-to-rank method called RankSVM. However, they have not been compared with or validated against the gradient boosting decision tree (GBDT)-based learning-to-rank methods that have gained popularity recently. Furthermore, although the ranking metric called Normalized Discounted Cumulative Gain (NDCG) is widely used in information retrieval, it only determines whether the predictions are better than those of other models. In other words, NDCG is incapable of recognizing when a prediction model produces worse than random results. Nevertheless, NDCG is still used in the performance evaluation of compound screening using learning-to-rank. This study used the GBDT model with ranking loss functions, called lambdarank and lambdaloss, for ligand-based virtual screening; results were compared with existing RankSVM methods and GBDT models using regression. We also proposed a new ranking metric, Normalized Enrichment Discounted Cumulative Gain (NEDCG), which aims to properly evaluate the goodness of ranking predictions. Results showed that the GBDT model with learning-to-rank outperformed existing regression methods using GBDT and RankSVM on diverse datasets. Moreover, NEDCG showed that predictions by regression were comparable to random predictions in multi-assay, multi-family datasets, demonstrating its usefulness for a more direct assessment of compound screening performance.
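The limitation of NDCG described above can be seen in a small experiment. The sketch below uses the standard NDCG definition with linear gain (an assumption; an exponential gain is also common) on made-up activity labels, and compares a perfect ranking, a fully reversed ranking, and the average over random shuffles. It does not implement NEDCG, whose definition is not given in this summary.

```python
import math
import random


def dcg(relevances):
    """Discounted cumulative gain for relevance labels listed in ranked order."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical activity labels for ten compounds (higher means more active).
labels = [3, 2, 2, 1, 1, 0, 0, 0, 0, 0]

best = ndcg(sorted(labels, reverse=True))  # perfect ranking
worst = ndcg(sorted(labels))               # fully reversed ranking

# Estimate the NDCG of a random ranking by averaging over many shuffles.
rng = random.Random(0)
samples = []
for _ in range(2000):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    samples.append(ndcg(shuffled))
random_mean = sum(samples) / len(samples)

if __name__ == "__main__":
    print(f"perfect={best:.3f} random_mean={random_mean:.3f} reversed={worst:.3f}")
```

Even the reversed ranking gets a positive score, and the random baseline depends on the label distribution rather than sitting at a fixed value, so a single NDCG number does not reveal whether a model is better or worse than random.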