ELMV: an Ensemble-Learning Approach for Analyzing Electrical Health Records with Significant Missing Values

Liu, Lucas J., Zhang, Hongwei, Di, Jianzhong, Chen, Jin

Nov-3-2020–arXiv.org Machine Learning

Real-world Electronic Health Record (EHR) data have played an important role in improving patient care and clinician experience and in providing rich information for biomedical research [1, 2, 3]. However, many EHR datasets contain a significant proportion of missing values, which can be as high as 50%, so restricting the analysis to individuals with complete data substantially reduces the sample size even in initially large cohorts [4, 5]. On the other hand, leaving a large portion of missing information unaddressed usually causes bias and loss of efficiency, and ultimately leads to inappropriate conclusions [6]. Data imputation algorithms (e.g. the scikit-learn estimators [7]) attempt to replace missing data with meaningful values, including random values, the mean or median of rows or columns, values obtained by spatial-temporal regression, the most frequent values in the same columns, or representative values identified using k-nearest neighbors [8]. Advanced data imputation algorithms, such as Multivariate Imputation by Chained Equations (MICE) [9], have been developed to fill in missing values multiple times. Leveraging the power of GPUs and big data, deep neural network models, such as Datawig [10], can produce more accurate estimates than traditional data imputation methods [11].
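To make the trade-off described above concrete, the following is a minimal sketch, not the ELMV method itself, contrasting complete-case analysis with simple column-wise imputation (median, mean, most frequent value). The toy cohort, the column meanings, and the helper names `complete_cases` and `impute_column` are all hypothetical and chosen only for illustration; it uses only the Python standard library rather than any of the cited tools.

```python
from statistics import mean, median, mode

# Hypothetical toy cohort: each row is a patient with (age, glucose, smoker).
# None marks a missing value; all numbers are made up for illustration.
records = [
    [54, 110.0, "no"],
    [61, None, "yes"],
    [None, 95.0, "no"],
    [47, 130.0, None],
    [70, None, "no"],
]


def complete_cases(rows):
    """Keep only rows with no missing entries (complete-case analysis)."""
    return [row for row in rows if all(v is not None for v in row)]


def impute_column(rows, col, strategy):
    """Return a copy of rows with missing values in column `col` filled in.

    `strategy` is a function (e.g. mean, median, mode) that is applied to
    the observed values of that column to compute the fill value.
    """
    observed = [row[col] for row in rows if row[col] is not None]
    fill = strategy(observed)
    return [
        [fill if (i == col and v is None) else v for i, v in enumerate(row)]
        for row in rows
    ]


kept = complete_cases(records)

filled = impute_column(records, 0, median)  # age: median of observed ages
filled = impute_column(filled, 1, mean)     # glucose: mean of observed values
filled = impute_column(filled, 2, mode)     # smoker: most frequent category

print(len(records), "rows in cohort,", len(kept), "rows with complete data")
for row in filled:
    print(row)
```

In this toy cohort only one of the five rows survives complete-case filtering, while single-value imputation keeps every row at the cost of replacing each missing entry with one summary statistic, which is the kind of limitation that multiple-imputation approaches such as MICE aim to address.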

artificial intelligence, deep learning, machine learning

Country:
- Europe > France (0.04)
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Massachusetts > Suffolk County
    - Boston (0.04)
  - Kentucky > Fayette County
    - Lexington (0.04)
- Asia > China
  - Shanghai > Shanghai (0.05)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (0.93)
  - Health Care Technology > Medical Record (0.92)
  - Therapeutic Area > Endocrinology
    - Diabetes (1.00)

Technology:
- Information Technology
  - Data Science (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning > Nearest Neighbor Methods (0.54)
    - Neural Networks > Deep Learning (0.48)