pre-processing parameter
Pre-processing matters: A segment search method for WSI classification
Wang, Jun, Cui, Yufei, Mao, Yu, Guan, Nan, Xue, Chun Jason
Pre-processing for whole slide images (WSIs) can affect classification performance in both the training and inference stages. Our study analyzes the impact of pre-processing parameters on inference and training across single- and multiple-domain datasets. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose a novel Similarity-based Simulated Annealing approach for fast parameter tuning to enhance inference performance on single-domain data. Our method raises accuracy from 0.512 to 0.847 in a single domain. We further extend our insight to training performance on multi-domain data by employing a novel Bayesian optimization method to search for optimal pre-processing parameters, resulting in a high AUC of 0.967. We highlight that better pre-processing for WSIs can contribute to further accuracy improvements in histology.
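As a rough illustration of the kind of search the abstract describes, the following is a minimal sketch of plain simulated annealing over a small grid of pre-processing parameters. It is not the authors' Similarity-based variant, whose details the abstract does not give. The parameter names (`tissue_threshold`, `patch_size`) and the `toy_score` function are hypothetical stand-ins for real pre-processing settings and a real validation-accuracy measurement.

```python
import math
import random


def simulated_annealing(score, grid, steps=200, t0=1.0, cooling=0.97, seed=0):
    """Maximize score(params) over a discrete grid of parameter values."""
    rng = random.Random(seed)
    names = list(grid)
    # Start from a random grid index for each parameter.
    idx = {n: rng.randrange(len(grid[n])) for n in names}
    current = {n: grid[n][idx[n]] for n in names}
    cur_score = score(current)
    best, best_score = dict(current), cur_score
    temp = t0
    for _ in range(steps):
        # Propose a neighbour: move one parameter one step along its grid.
        n = rng.choice(names)
        j = min(max(idx[n] + rng.choice((-1, 1)), 0), len(grid[n]) - 1)
        cand = dict(current)
        cand[n] = grid[n][j]
        cand_score = score(cand)
        delta = cand_score - cur_score
        # Accept improvements always; accept worse moves with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            idx[n] = j
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = dict(current), cur_score
        temp *= cooling
    return best, best_score


def toy_score(p):
    # Hypothetical objective standing in for validation accuracy.
    return 1.0 - abs(p["tissue_threshold"] - 0.6) - abs(p["patch_size"] - 256) / 1024


# Hypothetical search space for two pre-processing parameters.
grid = {"tissue_threshold": [0.2, 0.4, 0.6, 0.8], "patch_size": [128, 256, 512]}
best, best_score = simulated_annealing(toy_score, grid)
print(best, best_score)
```

In practice each call to `score` would run inference with the candidate parameters, which is why the number of evaluations matters.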
Tiny, always-on and fragile: Bias propagation through design choices in on-device machine learning workflows
Toussaint, Wiebke, Ding, Aaron Yi, Kawsar, Fahim, Mathur, Akhil
Billions of distributed, heterogeneous and resource-constrained IoT devices deploy on-device machine learning (ML) for private, fast and offline inference on personal data. On-device ML is highly context dependent, and sensitive to user, usage, hardware and environment attributes. This sensitivity and the propensity towards bias in ML make it important to study bias in on-device settings. Our study is one of the first investigations of bias in this emerging domain, and lays important foundations for building fairer on-device ML. We apply a software engineering lens, investigating the propagation of bias through design choices in on-device ML workflows. We first identify reliability bias as a source of unfairness and propose a measure to quantify it. We then conduct empirical experiments for a keyword spotting task to show how complex and interacting technical design choices amplify and propagate reliability bias. Our results validate that design choices made during model training, like the sample rate and input feature type, and choices made to optimize models, like light-weight architectures, the pruning learning rate and pruning sparsity, can result in disparate predictive performance across male and female groups. Based on our findings, we suggest low-effort strategies for engineers to mitigate bias in on-device ML.
Characterising the Role of Pre-Processing Parameters in Audio-based Embedded Machine Learning
When deploying machine learning (ML) models on embedded and IoT devices, performance encompasses more than an accuracy metric: inference latency, energy consumption, and model fairness are necessary to ensure reliable performance under heterogeneous and resource-constrained operating conditions. To this end, prior research has studied model-centric approaches, such as tuning the hyperparameters of the model during training and later applying model compression techniques to tailor the model to the resource needs of an embedded device. In this paper, we take a data-centric view of embedded ML and study the role that pre-processing parameters in the data pipeline can play in balancing the various performance metrics of an embedded ML system. Through an in-depth case study with audio-based keyword spotting (KWS) models, we show that pre-processing parameter tuning is a remarkable tool that model developers can adopt to trade off a model's accuracy, fairness, and system efficiency, as well as to make an embedded ML model resilient to unseen deployment conditions.
Integrated Pre-Processing for Bayesian Nonlinear System Identification with Gaussian Processes
Frigola, Roger, Rasmussen, Carl Edward
We introduce GP-FNARX: a new model for nonlinear system identification based on a nonlinear autoregressive exogenous model (NARX) with filtered regressors (F), where the nonlinear regression problem is tackled using sparse Gaussian processes (GP). We integrate data pre-processing with system identification into a fully automated procedure that goes from raw data to an identified model. Both pre-processing parameters and GP hyper-parameters are tuned by maximizing the marginal likelihood of the probabilistic model. We obtain a Bayesian model of the system's dynamics which is able to report its uncertainty in regions where the data is scarce. The automated approach, the modeling of uncertainty and its relatively low computational cost make GP-FNARX a good candidate for applications in robotics and adaptive control.
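The idea of tuning a pre-processing parameter and a GP hyper-parameter with the same objective can be sketched as follows. This is a simplified illustration under stated assumptions, not GP-FNARX itself: it uses an exact (non-sparse) GP, a single filtered input as the only regressor, a first-order low-pass filter as the pre-processing step, a coarse grid search in place of gradient-based optimization, and synthetic data. The function names and all numeric settings are hypothetical.

```python
import numpy as np


def low_pass(x, alpha):
    """First-order exponential smoothing; alpha=1 leaves x unchanged."""
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out


def log_marginal_likelihood(X, y, lengthscale, noise_var, signal_var=1.0):
    """Exact GP log marginal likelihood with a squared-exponential kernel."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = signal_var * np.exp(-0.5 * sq / lengthscale**2) + noise_var * np.eye(len(y))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))  # a = K^{-1} y
    return -0.5 * y @ a - np.sum(np.log(np.diag(L))) - 0.5 * len(y) * np.log(2 * np.pi)


# Synthetic data (assumption): the output depends on the clean input,
# but the model only observes a noisy version of that input.
rng = np.random.default_rng(0)
t = np.arange(150)
u_clean = np.sin(0.1 * t)
u_noisy = u_clean + 0.3 * rng.standard_normal(t.size)
y = np.tanh(2.0 * u_clean[:-1]) + 0.05 * rng.standard_normal(t.size - 1)

alphas = [0.1, 0.3, 1.0]        # pre-processing parameter (filter coefficient)
lengthscales = [0.3, 1.0, 3.0]  # GP hyper-parameter
results = {}
for alpha in alphas:
    # Only the regressor is filtered; the targets y stay raw so the
    # marginal likelihoods remain comparable across filter settings.
    X = low_pass(u_noisy, alpha)[:-1, None]
    for ell in lengthscales:
        results[(alpha, ell)] = log_marginal_likelihood(X, y, ell, noise_var=0.05)

best_alpha, best_ell = max(results, key=results.get)
print(best_alpha, best_ell, results[(best_alpha, best_ell)])
```

Because the filter setting and the lengthscale are scored by the same marginal likelihood, no separate validation set is needed to choose the pre-processing.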