Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data
Bellamy, David R., Kumar, Bhawesh, Wang, Cindy, Beam, Andrew
Both models demonstrate mastery of the pre-training task, but neither consistently outperforms XGBoost on downstream supervised tasks. We encourage future work to focus on joint modeling of multiple EHR data categories and to include tree-based baselines in their evaluations.

In recent years, self-supervised pre-training of masked language models (MLMs) (see Appendix A for background) has demonstrated remarkable success across a wide range of machine learning problems and has led to significant downstream improvements across diverse tasks in natural language processing (Liu et al., 2019; Devlin et al., 2019; Raffel et al., 2020). There is considerable excitement surrounding the potential of large pre-trained MLMs to achieve similar success in medical applications. For instance, existing applications of MLMs in medicine have already yielded promising results on tasks related to medical text understanding (Lee et al., 2020; Alsentzer et al., 2019; Huang et al., 2019; Yang et al., 2019; Beltagy et al., 2019). Laboratory data is abundant, routinely collected, less biased than other types of data in electronic health records (EHRs) such as billing codes (Beam et al., 2021), and directly measures a patient's physiological state, offering a valuable opportunity for creating a medical foundation model.

However, there is a large body of evidence showing that deep learning is consistently outperformed on so-called "tabular" data prediction tasks by traditional machine learning techniques such as random forests, XGBoost, and even simple regression models (Bellamy et al., 2020; Finlayson et al., 2023; Sharma, 2013). The reasons for this are only partially understood, but previous work (Grinsztajn et al., 2022) suggests that the rotational invariance of deep learning models may be harmful for tabular data. More broadly, the success of deep learning is thought to stem largely from inductive biases that can be leveraged for images, text, and graphs; these inductive biases are absent or only weakly present in tabular data. Conversely, tree-based methods are scale invariant and robust to uninformative features.

We evaluated both models on several downstream outcome prediction tasks and validated the success of pre-training with a set of intrinsic evaluations.
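To make the recommendation about tree-based baselines concrete, the following is a minimal, hypothetical sketch of an XGBoost baseline for a tabular outcome-prediction task of the kind described above. The synthetic features, labels, and hyperparameters are illustrative assumptions, not the paper's actual data or configuration.

```python
# Hypothetical sketch: an XGBoost baseline on synthetic "lab value" features.
# Everything here (feature count, labels, hyperparameters) is illustrative,
# not taken from the Labrador paper.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n_patients, n_labs = 5000, 20
X = rng.normal(size=(n_patients, n_labs))  # stand-in for standardized lab test results
# Synthetic binary outcome driven by two of the "labs" plus noise
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_patients) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Tree-based baseline: scale invariant and robust to uninformative features
model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                      eval_metric="logloss")
model.fit(X_train, y_train)

print("AUROC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

A baseline of this form is cheap to fit and, per the paper's findings, is a strong reference point that pre-trained models on tabular lab data do not consistently beat.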
arXiv.org Artificial Intelligence
Dec-9-2023