BioBERT-based Deep Learning and Merged ChemProt-DrugProt for Enhanced Biomedical Relation Extraction
McInnes, Bridget T., Tang, Jiawei, Mahendran, Darshini, Nguyen, Mai H.
This paper presents a methodology for enhancing relation extraction from biomedical texts, focusing specifically on chemical-gene interactions. Leveraging the BioBERT model and a multi-layer fully connected network architecture, our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy. Through extensive experimentation, we demonstrate significant performance improvements, particularly in the chemical-protein relation (CPR) groups shared between the two datasets. The findings underscore the importance of dataset merging in augmenting sample counts and improving model accuracy. Moreover, the study highlights the potential of automated information extraction in biomedical research and clinical practice.
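A rough illustration of the architecture the abstract describes: BioBERT encodes a sentence containing a chemical-gene pair, and a multi-layer fully connected head classifies the relation. The checkpoint name (dmis-lab/biobert-base-cased-v1.1), hidden size, number of classes, and the @CHEMICAL$/@GENE$ entity-masking convention are common choices assumed for this sketch, not settings taken from the paper.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

BIOBERT = "dmis-lab/biobert-base-cased-v1.1"  # widely used BioBERT release (assumed)

class BioBERTRelationClassifier(nn.Module):
    def __init__(self, num_classes: int = 6, hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(BIOBERT)
        # Multi-layer fully connected network on top of the encoder.
        self.head = nn.Sequential(
            nn.Linear(self.encoder.config.hidden_size, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token as the sentence representation
        return self.head(cls)

tokenizer = AutoTokenizer.from_pretrained(BIOBERT)
batch = tokenizer(["The @CHEMICAL$ inhibits @GENE$ expression."],
                  return_tensors="pt", padding=True, truncation=True)
logits = BioBERTRelationClassifier()(batch["input_ids"], batch["attention_mask"])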
Prescribed Fire Modeling using Knowledge-Guided Machine Learning for Land Management
Chatterjee, Somya Sharma, Lindsay, Kelly, Chatterjee, Neel, Patil, Rohan, De Callafon, Ilkay Altintas, Steinbach, Michael, Giron, Daniel, Nguyen, Mai H., Kumar, Vipin
In recent years, the increasing threat of devastating wildfires has underscored the need for effective prescribed fire management. Process-based computer simulations have traditionally been employed to plan prescribed fires for wildfire prevention. However, even simplified process models like QUIC-Fire are too compute-intensive to be used for real-time decision-making, especially when weather conditions change rapidly. Traditional machine learning (ML) methods used for fire modeling offer computational speedup but suffer from physically inconsistent predictions, bias toward the majority class due to class imbalance, biased estimates of fire spread metrics (e.g., burned area, rate of spread), and poor generalization to out-of-distribution wind conditions. This paper introduces a novel ML framework that enables rapid emulation of prescribed fires while addressing these concerns. By incorporating domain knowledge, the proposed method helps reduce physical inconsistencies in fuel density estimates in data-scarce scenarios. To overcome the majority-class bias in predictions, we leverage pre-existing source-domain data to augment the training data and learn the spread of fire more effectively. Finally, we overcome the problem of biased estimation of fire spread metrics by incorporating a hierarchical modeling structure that captures the interdependence of fuel density and burned area. Notably, the improved fire metric estimates (e.g., burned area) offered by our framework make it useful for fire managers, who often rely on such estimates when making decisions about prescribed burn management. Furthermore, our framework exhibits better generalization across diverse wind conditions and ignition patterns than other ML-based fire modeling methods.
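The abstract does not spell out its knowledge-guided formulation, but a common way to incorporate domain knowledge into an ML emulator is a composite loss that adds a physics-based penalty to the usual data-fit term. The minimal sketch below illustrates that general pattern with one assumed constraint (fire can only consume fuel, so predicted fuel density should stay between zero and the pre-burn density); the function name, penalty, and weighting are illustrative, not the paper's actual terms.

import torch

def knowledge_guided_loss(pred_density, target_density, initial_density, lam=0.1):
    # Standard data-fit term: mean squared error against the simulator output.
    data_loss = torch.mean((pred_density - target_density) ** 2)
    # Domain-knowledge penalty (assumed constraint): fuel density must be
    # non-negative and cannot exceed the pre-burn density at the same location.
    phys_loss = torch.mean(torch.relu(-pred_density)
                           + torch.relu(pred_density - initial_density))
    return data_loss + lam * phys_loss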
Left Ventricle Segmentation and Volume Estimation on Cardiac MRI using Deep Learning
Abdelmaguid, Ehab, Huang, Jolene, Kenchareddy, Sanjay, Singla, Disha, Wilke, Laura, Nguyen, Mai H., Altintas, Ilkay
In the United States, heart disease is the leading cause of death for males and females, accounting for 610,000 deaths each year [1]. Physicians use Magnetic Resonance Imaging (MRI) scans to image the heart in order to non-invasively estimate its structural and functional parameters for cardiovascular diagnosis and disease management. The end-systolic volume (ESV) and end-diastolic volume (EDV) of the left ventricle (LV), and the ejection fraction (EF), are indicators of heart disease. These measures can be derived from the segmented contours of the LV; thus, consistent and accurate segmentation of the LV from MRI images is critical to the accuracy of the ESV, EDV, and EF, and to non-invasive cardiac disease detection. In this work, various image preprocessing techniques, model configurations using the U-Net deep learning architecture, postprocessing methods, and approaches for volume estimation are investigated. An end-to-end analytics pipeline with multiple stages is provided for automated LV segmentation and volume estimation. First, image data are converted from DICOM and NIfTI formats to raw images in array format. Second, raw images are processed with multiple image preprocessing methods and cropped to include only the Region of Interest (ROI). Third, preprocessed images are segmented using U-Net models. Finally, postprocessing is applied to the segmented images to remove extraneous contours, along with intelligent slice and frame selection, followed by calculation of the ESV, EDV, and EF. This analytics pipeline is implemented and runs on a distributed computing environment with a GPU cluster at the San Diego Supercomputer Center at UCSD.
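The last pipeline stage reduces segmented masks to clinical measures. The sketch below shows the standard calculations: slice summation (Simpson's method) for LV volume, and EF (%) = (EDV - ESV) / EDV * 100. The helper names, the mask layout (binary masks stacked slices-first), and the example numbers are illustrative assumptions, not details from the paper.

import numpy as np

def lv_volume_ml(masks: np.ndarray, pixel_area_mm2: float, slice_thickness_mm: float) -> float:
    # Simpson's method: sum each slice's segmented area times the slice thickness.
    slice_areas_mm2 = masks.reshape(masks.shape[0], -1).sum(axis=1) * pixel_area_mm2
    return float(slice_areas_mm2.sum() * slice_thickness_mm / 1000.0)  # mm^3 -> mL

def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    # EF (%) = (EDV - ESV) / EDV * 100
    return (edv_ml - esv_ml) / edv_ml * 100.0

# Illustrative numbers: EDV = 120 mL and ESV = 50 mL give an EF of about 58%.
print(ejection_fraction(120.0, 50.0))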