Accuracy
Bootstrapping Statistical Inference for Off-Policy Evaluation
Hao, Botao, Ji, Xiang, Duan, Yaqi, Lu, Hao, Szepesvári, Csaba, Wang, Mengdi
Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical property is less understood. In this paper, we study the use of bootstrapping in off-policy evaluation (OPE), and in particular, we focus on the fitted Q-evaluation (FQE) that is known to be minimax-optimal in the tabular and linear-model cases. We propose a bootstrapping FQE method for inferring the distribution of the policy evaluation error and show that this method is asymptotically efficient and distributionally consistent for off-policy statistical inference. To overcome the computation limit of bootstrapping, we further adapt a subsampling procedure that improves the runtime by an order of magnitude. We numerically evaluate the bootrapping method in classical RL environments for confidence interval estimation, estimating the variance of off-policy evaluator, and estimating the correlation between multiple off-policy evaluators.
Baseline Pruning-Based Approach to Trojan Detection in Neural Networks
Bajcsy, Peter, Majurski, Michael
This paper addresses the problem of detecting trojans in neural networks (NNs) by analyzing systematically pruned NN models. Our pruning-based approach consists of three main steps. First, detect any deviations from the reference look-up tables of model file sizes and model graphs. Next, measure the accuracy of a set of systematically pruned NN models following multiple pruning schemas. Finally, classify a NN model as clean or poisoned by applying a mapping between accuracy measurements and NN model labels. This work outlines a theoretical and experimental framework for finding the optimal mapping over a large search space of pruning parameters. Based on our experiments using Round 1 and Round 2 TrojAI Challenge datasets, the approach achieves average classification accuracy of 69.73 % and 82.41% respectively with an average processing time of less than 60 s per model. For both datasets random guessing would produce 50% classification accuracy. Reference model graphs and source code are available from GitHub.
Analysis of the Effectiveness of Face-Coverings on the Death Rate of COVID-19 Using Machine Learning
Lafzi, Ali, Boodaghi, Miad, Zamani, Siavash, Mohammadshafie, Niyousha
The recent outbreak of the COVID-19 shocked humanity leading to the death of millions of people worldwide. To stave off the spread of the virus, the authorities in the US, employed different strategies including the mask mandate (MM) order issued by the states' governors. Although most of the previous studies pointed in the direction that MM can be effective in hindering the spread of viral infections, the effectiveness of MM in reducing the degree of exposure to the virus and, consequently, death rates remains indeterminate. Indeed, the extent to which the degree of exposure to COVID-19 takes part in the lethality of the virus remains unclear. In the current work, we defined a parameter called the average death ratio as the monthly average of the ratio of the number of daily deaths to the total number of daily cases. We utilized survey data provided by New York Times to quantify people's abidance to the MM order. Additionally, we implicitly addressed the extent to which people abide by the MM order that may depend on some parameters like population, income, and political inclination. Using different machine learning classification algorithms we investigated how the decrease or increase in death ratio for the counties in the US West Coast correlates with the input parameters. Our results showed a promising score as high as 0.94 with algorithms like XGBoost, Random Forest, and Naive Bayes. To verify the model, the best performing algorithms were then utilized to analyze other states (Arizona, New Jersey, New York and Texas) as test cases. The findings show an acceptable trend, further confirming usability of the chosen features for prediction of similar cases.
The Limits of Computation in Solving Equity Trade-Offs in Machine Learning and Justice System Risk Assessment
This paper explores how different ideas of racial equity in machine learning, in justice settings in particular, can present trade-offs that are difficult to solve computationally. Machine learning is often used in justice settings to create risk assessments, which are used to determine interventions, resources, and punitive actions. Overall aspects and performance of these machine learning-based tools, such as distributions of scores, outcome rates by levels, and the frequency of false positives and true positives, can be problematic when examined by racial group. Models that produce different distributions of scores or produce a different relationship between level and outcome are problematic when those scores and levels are directly linked to the restriction of individual liberty and to the broader context of racial inequity. While computation can help highlight these aspects, data and computation are unlikely to solve them. This paper explores where values and mission might have to fill the spaces computation leaves.
Model Rectification via Unknown Unknowns Extraction from Deployment Samples
Abrahao, Bruno, Wang, Zheng, Ahmed, Haider, Zhu, Yuchen
Model deficiency that results from incomplete training data is a form of structural blindness that leads to costly errors, oftentimes with high confidence. During the training of classification tasks, underrepresented class-conditional distributions that a given hypothesis space can recognize results in a mismatch between the model and the target space. To mitigate the consequences of this discrepancy, we propose Random Test Sampling and Cross-Validation (RTSCV) as a general algorithmic framework that aims to perform a post-training model rectification at deployment time in a supervised way. RTSCV extracts unknown unknowns (u.u.s), i.e., examples from the class-conditional distributions that a classifier is oblivious to, and works in combination with a diverse family of modern prediction models. RTSCV augments the training set with a sample of the test set (or deployment data) and uses this redefined class layout to discover u.u.s via cross-validation, without relying on active learning or budgeted queries to an oracle. We contribute a theoretical analysis that establishes performance guarantees based on the design bases of modern classifiers. Our experimental evaluation demonstrates RTSCV's effectiveness, using 7 benchmark tabular and computer vision datasets, by reducing a performance gap as large as 41% from the respective pre-rectification models. Last we show that RTSCV consistently outperforms state-of-the-art approaches.
An Update of a Progressively Expanded Database for Automated Lung Sound Analysis
Hsu, Fu-Shun, Huang, Shang-Ran, Huang, Chien-Wen, Cheng, Yuan-Ren, Chen, Chun-Chieh, Hsiao, Jack, Chen, Chung-Wei, Lai, Feipei
A continuous real-time respiratory sound automated analysis system is needed in clinical practice. Previously, we established an open access lung sound database, HF_Lung_V1, and automated lung sound analysis algorithms capable of detecting inhalation, exhalation, continuous adventitious sounds (CASs) and discontinuous adventitious sounds (DASs). In this study, HF-Lung-V1 has been further expanded to HF-Lung-V2 with 1.45 times of increase in audio files. The convolutional neural network (CNN)-bidirectional gated recurrent unit (BiGRU) model was separately trained with training datasets of HF_Lung_V1 (V1_Train) and HF_Lung_V2 (V2_Train), and then were used for the performance comparisons of segment detection and event detection on both test datasets of HF_Lung_V1 (V1_Test) and HF_Lung_V2 (V2_Test). The performance of segment detection was measured by accuracy, predictive positive value (PPV), sensitivity, specificity, F1 score, receiver operating characteristic (ROC) curve and area under the curve (AUC), whereas that of event detection was evaluated with PPV, sensitivity, and F1 score. Results indicate that the model performance trained by V2_Train showed improvement on both V1_Test and V2_Test in inhalation, CASs and DASs, particularly in CASs, as well as on V1_Test in exhalation.
Mitigating belief projection in explainable artificial intelligence via Bayesian Teaching
Yang, Scott Cheng-Hsin, Vong, Wai Keen, Sojitra, Ravi B., Folke, Tomas, Shafto, Patrick
State-of-the-art deep-learning systems use decision rules that are challenging for humans to model. Explainable AI (XAI) attempts to improve human understanding but rarely accounts for how people typically reason about unfamiliar agents. We propose explicitly modeling the human explainee via Bayesian Teaching, which evaluates explanations by how much they shift explainees' inferences toward a desired goal. We assess Bayesian Teaching in a binary image classification task across a variety of contexts. Absent intervention, participants predict that the AI's classifications will match their own, but explanations generated by Bayesian Teaching improve their ability to predict the AI's judgements by moving them away from this prior belief. Bayesian Teaching further allows each case to be broken down into sub-examples (here saliency maps). These sub-examples complement whole examples by improving error detection for familiar categories, whereas whole examples help predict correct AI judgements of unfamiliar cases.
A Practical Model-based Segmentation Approach for Accurate Activation Detection in Single-Subject functional Magnetic Resonance Imaging Studies
Chen, Wei-Chen, Maitra, Ranjan
Functional Magnetic Resonance Imaging (fMRI) maps cerebral activation in response to stimuli but this activation is often difficult to detect, especially in low-signal contexts and single-subject studies. Accurate activation detection can be guided by the fact that very few voxels are, in reality, truly activated and that activated voxels are spatially localized, but it is challenging to incorporate both these facts. We provide a computationally feasible and methodologically sound model-based approach, implemented in the R package MixfMRI, that bounds the a priori expected proportion of activated voxels while also incorporating spatial context. Results on simulation experiments for different levels of activation detection difficulty are uniformly encouraging. The value of the methodology in low-signal and single-subject fMRI studies is illustrated on a sports imagination experiment. Concurrently, we also extend the potential use of fMRI as a clinical tool to, for example, detect awareness and improve treatment in individual patients in persistent vegetative state, such as traumatic brain injury survivors.
BinaryCoP: Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices
Fasfous, Nael, Vemparala, Manoj-Rohit, Frickenstein, Alexander, Frickenstein, Lukas, Stechele, Walter
Face masks have long been used in many areas of everyday life to protect against the inhalation of hazardous fumes and particles. They also offer an effective solution in healthcare for bi-directional protection against air-borne diseases. Wearing and positioning the mask correctly is essential for its function. Convolutional neural networks (CNNs) offer an excellent solution for face recognition and classification of correct mask wearing and positioning. In the context of the ongoing COVID-19 pandemic, such algorithms can be used at entrances to corporate buildings, airports, shopping areas, and other indoor locations, to mitigate the spread of the virus. These application scenarios impose major challenges to the underlying compute platform. The inference hardware must be cheap, small and energy efficient, while providing sufficient memory and compute power to execute accurate CNNs at a reasonably low latency. To maintain data privacy of the public, all processing must remain on the edge-device, without any communication with cloud servers. To address these challenges, we present a low-power binary neural network classifier for correct facial-mask wear and positioning. The classification task is implemented on an embedded FPGA, performing high-throughput binary operations. Classification can take place at up to ~6400 frames-per-second, easily enabling multi-camera, speed-gate settings or statistics collection in crowd settings. When deployed on a single entrance or gate, the idle power consumption is reduced to 1.6W, improving the battery-life of the device. We achieve an accuracy of up to 98% for four wearing positions of the MaskedFace-Net dataset. To maintain equivalent classification accuracy for all face structures, skin-tones, hair types, and mask types, the algorithms are tested for their ability to generalize the relevant features over all subjects using the Grad-CAM approach.
Removing biased data to improve fairness and accuracy
Verma, Sahil, Ernst, Michael, Just, Rene
Machine learning systems are often trained using data collected from historical decisions. If past decisions were biased, then automated systems that learn from historical data will also be biased. We propose a black-box approach to identify and remove biased training data. Machine learning models trained on such debiased data (a subset of the original training data) have low individual discrimination, often 0%. These models also have greater accuracy and lower statistical disparity than models trained on the full historical data. We evaluated our methodology in experiments using 6 real-world datasets. Our approach outperformed seven previous approaches in terms of individual discrimination and accuracy.