Variation in prediction accuracy due to randomness in data division and fair evaluation using interval estimation
arXiv.org Artificial Intelligence
Research on machine-learning-based disease prediction has been accelerated by 1) the increasing sophistication of information and communication technology, 2) the availability of large-scale data from longitudinal and other studies, and 3) the public release of program code for building predictive models with machine learning. In particular, this research has become even more active in recent years with the advent of automated machine learning frameworks [4-6]. For example, published studies have applied MLA to data from the UK Biobank, a large longitudinal cohort study, to develop models that diagnose disease and predict its onset in advance [4, 7]. Such studies are not new: in 1988, J. W. Smith et al. applied neural networks to data collected by the National Institute of Diabetes and Digestive and Kidney Diseases from a population of Pima Indians near Phoenix, Arizona, to predict the onset of diabetes [8-11]. This dataset, known as the Pima Indians Diabetes (PID) dataset, is still the main dataset used to evaluate MLA today; in 2014, for instance, a method that combines multiple prediction models to predict disease onset was proposed and reported a very high prediction accuracy of 0.97 [12-17]. Thus, a large body of research on machine-learning models for disease prediction has been published in recent years. However, problems remain, including inadequate reporting of prediction models and a lack of external validation [18].
Sep-2-2024
- Country:
  - North America > United States
    - Minnesota > Olmsted County
      - Rochester (0.04)
    - Arizona > Maricopa County
      - Phoenix (0.24)
  - Asia > Japan
    - Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)
- Genre:
  - Research Report > Experimental Study (0.94)
- Industry:
  - Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Technology: