Haitjema, Saskia
Technical Insights and Legal Considerations for Advancing Federated Learning in Bioinformatics
Malpetti, Daniele, Scutari, Marco, Gualdi, Francesco, van Setten, Jessica, van der Laan, Sander, Haitjema, Saskia, Lee, Aaron Mark, Hering, Isabelle, Mangili, Francesca
Federated learning leverages data across institutions to improve clinical discovery while complying with data-sharing restrictions and protecting patient privacy. As the evolution of biobanks in genetics and systems biology has shown, access to larger and more varied data pools leads to faster and more robust exploration and translation of results. Wider use of federated learning may have the same impact in bioinformatics, giving access to many combinations of genotypic, phenotypic and environmental information that are underrepresented in, or absent from, existing biobanks. This paper reviews the methodological, infrastructural and legal issues that academic and clinical institutions must address before implementing federated learning. Finally, we provide recommendations for the reliable use of federated learning and its effective translation into clinical practice.
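The abstract does not prescribe a particular algorithm. To illustrate the core idea that institutions exchange model parameters rather than patient records, here is a minimal sketch of federated averaging (FedAvg), one widely used aggregation scheme, applied to a toy one-variable linear model. The site data, learning rate, number of local epochs, number of rounds and all function names are assumptions chosen for illustration, not the paper's method.

```python
from typing import List, Tuple

Params = Tuple[float, float]       # (w, b) for the toy model y = w * x + b
Site = List[Tuple[float, float]]   # one institution's private (x, y) records

def local_update(params: Params, data: Site,
                 lr: float = 0.05, epochs: int = 20) -> Params:
    """Full-batch gradient descent on mean squared error at a single site.
    Only the returned parameters leave the site; the records stay local."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def federated_average(updates: List[Tuple[Params, int]]) -> Params:
    """Server step: average site parameters weighted by site sample size."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return w, b

def run_fedavg(sites: List[Site], rounds: int = 100) -> Params:
    params: Params = (0.0, 0.0)
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [(local_update(params, data), len(data)) for data in sites]
        params = federated_average(updates)
    return params

# Three hypothetical institutions whose x-values cover different ranges,
# all generated from the same relationship y = 2x + 1.
sites = [
    [(i / 10, 2 * (i / 10) + 1) for i in range(0, 10)],
    [(i / 10, 2 * (i / 10) + 1) for i in range(10, 25)],
    [(i / 10, 2 * (i / 10) + 1) for i in range(25, 30)],
]
w, b = run_fedavg(sites)
print(round(w, 2), round(b, 2))  # prints 2.0 1.0
```

No single site covers the full range of x, yet the averaged model recovers the shared relationship. Weighting by sample size lets larger sites pull the global model proportionally harder, which is the usual FedAvg choice.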
Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods
van Es, Bram, Reteig, Leon C., Tan, Sander C., Schraagen, Marijn, Hemker, Myrthe M., Arends, Sebastiaan R. S., Rios, Miguel A. R., Haitjema, Saskia
As structured data are often insufficient, labels need to be extracted from free text in electronic health records when developing models for clinical information retrieval and decision support systems. One of the most important contextual properties in clinical text is negation, which indicates the absence of findings. We aimed to improve large-scale extraction of labels by comparing three methods for negation detection in Dutch clinical notes. We used the Erasmus Medical Center Dutch Clinical Corpus to compare a rule-based method based on ContextD, a biLSTM model using MedCAT and (fine-tuned) RoBERTa-based models. We found that both the biLSTM and RoBERTa models consistently outperform the rule-based model in terms of F1 score, precision and recall. In addition, we systematically categorized the classification errors for each model, which can be used to further improve model performance in particular applications. Naively combining the three models did not improve performance. We conclude that the biLSTM and RoBERTa-based models in particular are highly accurate in detecting clinical negations, but that ultimately all three approaches can be viable depending on the use case at hand.
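To make the comparison concrete, the following is a minimal Python sketch of a trigger-and-scope negation rule, the general style of rule-based method the abstract compares against, together with the precision, recall and F1 computation used to score the models. The Dutch trigger lists, the five-token window, the toy sentences and all function names are hypothetical and far smaller than a real lexicon; this is neither ContextD nor the paper's evaluation code.

```python
import re
from typing import List, Tuple

# Hypothetical, minimal trigger lists for illustration only; NOT the ContextD lexicon.
PRE_TRIGGERS = {"geen", "niet", "zonder"}  # "no", "not", "without": negate what follows
POST_TRIGGERS = {"uitgesloten"}            # "ruled out": negates what precedes
TERMINATORS = {"maar", "echter"}           # "but", "however": close a negation scope
WINDOW = 5                                 # assumed maximum scope length in tokens

def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())

def is_negated(text: str, concept: str) -> bool:
    """True if the single-token `concept` lies inside a trigger's scope."""
    tokens = tokenize(text)
    concept = concept.lower()
    for i, tok in enumerate(tokens):
        if tok != concept:
            continue
        # Scan left for a pre-trigger, stopping at a terminator or the window edge.
        for j in range(i - 1, max(i - 1 - WINDOW, -1), -1):
            if tokens[j] in TERMINATORS:
                break
            if tokens[j] in PRE_TRIGGERS:
                return True
        # Scan right for a post-trigger under the same rules.
        for j in range(i + 1, min(i + 1 + WINDOW, len(tokens))):
            if tokens[j] in TERMINATORS:
                break
            if tokens[j] in POST_TRIGGERS:
                return True
    return False

def precision_recall_f1(gold: List[bool], pred: List[bool]) -> Tuple[float, float, float]:
    """Scores with 'negated' as the positive class."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy sentences with gold labels (invented for illustration, not corpus data).
examples = [
    ("Patiënt heeft geen koorts", "koorts", True),
    ("Geen koorts, maar wel hoesten", "hoesten", False),
    ("Pneumonie uitgesloten", "pneumonie", True),
    ("Patiënt heeft koorts", "koorts", False),
    ("Niet benauwd en geen pijn", "pijn", True),
    ("Koorts is niet aanwezig", "koorts", True),  # post-position "niet" is missed
]
gold = [label for _, _, label in examples]
pred = [is_negated(text, concept) for text, concept, _ in examples]
p, r, f = precision_recall_f1(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))  # prints 1.0 0.75 0.857
```

The last toy sentence is a negation the fixed rules miss because the trigger follows the concept and is not listed as a post-trigger. Patterns like this can be picked up from context by a learned model, which is one plausible source of recall differences between rule-based and learned approaches.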