Reconsideration on evaluation of machine learning models in continuous monitoring using wearables
Ding, Cheng; Guo, Zhicheng; Rudin, Cynthia; Xiao, Ran; Nahab, Fadi B; Hu, Xiao
arXiv.org Artificial Intelligence
Especially through the use of the photoplethysmography (PPG) signal, wearable devices have demonstrated significant potential for providing real-time insights into an individual's health status. Because it is non-invasive and easy to integrate into wearable technology, PPG has become a cornerstone of modern health monitoring systems [5]. Analyzing wearable device signals often involves ML models of varying complexity [6, 7]. In the model development phase, continuous signals are typically cut into discrete segments, and the model's performance is evaluated at the segment level using conventional metrics such as accuracy, sensitivity, specificity, and F1 score [8]. However, relying solely on these conventional segment-level metrics does not provide a holistic assessment. This harms both consumers, who cannot select the solution best suited to their needs, and innovators, whose efforts are not guided toward genuine progress. The complex nature of continuous health monitoring with wearable devices introduces unique challenges beyond the capabilities of conventional evaluation approaches, as illustrated in Figure 1. Recognizing these challenges is imperative for equipping continuous health monitoring applications with accurate and reliable ML models, ensuring their successful translation into everyday use by millions of people and fulfilling the potential of this technology at scale. In the subsequent sections, we outline the challenges in evaluating ML models for continuous health monitoring using wearables, thoroughly review existing evaluation methods and metrics, and propose a standardized evaluation guideline.
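The conventional segment-level evaluation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes non-overlapping fixed-length windows, one binary label per segment (1 = condition present), and toy labels invented for the example. All helper names (`segment_signal`, `segment_metrics`, `count_false_alarm_episodes`) are hypothetical. The last helper is a hypothetical event-style count, included only to show one kind of information that segment-level metrics do not capture.

```python
from typing import Dict, List, Sequence


def segment_signal(signal: Sequence[float], window: int) -> List[List[float]]:
    """Cut a continuous signal into non-overlapping windows of `window` samples.

    Assumption: trailing samples that do not fill a whole window are dropped.
    """
    n_full = len(signal) // window
    return [list(signal[i * window:(i + 1) * window]) for i in range(n_full)]


def segment_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Conventional per-segment metrics for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def count_false_alarm_episodes(y_true: Sequence[int], y_pred: Sequence[int]) -> int:
    """Hypothetical illustration: count maximal runs of consecutive
    false-positive segments, i.e. separate false alerts a user would see."""
    episodes = 0
    in_run = False
    for t, p in zip(y_true, y_pred):
        is_fp = t == 0 and p == 1
        if is_fp and not in_run:
            episodes += 1  # a new run of false positives begins here
        in_run = is_fp
    return episodes


# Toy example (invented labels): 10 segments, true episode in segments 5-6.
segments = segment_signal([float(i) for i in range(105)], window=10)
y_true = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
pred_a = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # false positives scattered in time
pred_b = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]  # false positives in one contiguous block

if __name__ == "__main__":
    print(len(segments), "segments")
    print("A:", segment_metrics(y_true, pred_a),
          "false-alarm episodes:", count_false_alarm_episodes(y_true, pred_a))
    print("B:", segment_metrics(y_true, pred_b),
          "false-alarm episodes:", count_false_alarm_episodes(y_true, pred_b))
```

In this toy case both predictions have the same confusion counts (2 TP, 3 FP, 5 TN, 0 FN), so accuracy, sensitivity, specificity, and F1 are identical, yet `pred_a` would trigger three separate false alerts and `pred_b` only one. The example is meant only to make concrete why segment-level metrics alone may not give a holistic picture; it does not reproduce the evaluation guideline the paper proposes.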
Dec-4-2023