Accuracy
Sample selection from a given dataset to validate machine learning models
With the development of automatic diagnostics based on statistical predictive models, coming from any supervised machine learning (ML) algorithms, important issues about model validation have been raised. For example in the industrial nondestructive testing field (e.g. for aeronautic or nuclear industry), generalized automated inspection (that will allow large gain in terms of efficiency and economy) has to provide high guarantees in terms of performance. In this case, it is necessary to be able to select a validation data basis that will not be used for the training nor the selection of the ML model [3, 7]. This validation data basis (also referred as verification data in the literature) has not to be communicated to the ML developers because it will serve to realize an independent evaluation of the provided ML model (applying a cross validation method is then not possible). This validation sample is typically used to provide prediction residuals (which can be finely analyzed), as well as average ML model quality measures (as the mean square error in a regression problem or the misclassification rate in a classification problem). In this paper, we address the particular question about the way to select a "good" validation basis from a dataset useful to specify a ML model. We use indifferently the term "validation" and "test" for the basis (also called sample) because we restrict our problem to the distinction between a learning sample (which includes the ML fitting and selection phases) and a test sample. An important question is the number and the location of these test points.
TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains
Awad, George, Butt, Asad A., Curtis, Keith, Fiscus, Jonathan, Godil, Afzal, Lee, Yooyoung, Delgado, Andrew, Zhang, Jesse, Godard, Eliot, Chocot, Baptiste, Diduch, Lukas, Liu, Jeffrey, Smeaton, Alan F., Graham, Yvette, Jones, Gareth J. F., Kraaij, Wessel, Quenot, Georges
The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last twenty years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2020 represented a continuation of four tasks and the addition of two new tasks. In total, 29 teams from various research organizations worldwide completed one or more of the following six tasks: 1. Ad-hoc Video Search (AVS), 2. Instance Search (INS), 3. Disaster Scene Description and Indexing (DSDI), 4. Video to Text Description (VTT), 5. Activities in Extended Video (ActEV), 6. Video Summarization (VSUM). This paper is an introduction to the evaluation framework, tasks, data, and measures used in the evaluation campaign.
Weakly Supervised Multi-task Learning for Concept-based Explainability
Belém, Catarina, Balayan, Vladimir, Saleiro, Pedro, Bizarro, Pedro
In ML-aided decision-making tasks, such as fraud detection or medical diagnosis, the human-in-the-loop, usually a domain-expert without technical ML knowledge, prefers high-level concept-based explanations instead of low-level explanations based on model features. To obtain faithful concept-based explanations, we leverage multi-task learning to train a neural network that jointly learns to predict a decision task based on the predictions of a precedent explainability task (i.e., multi-label concepts). There are two main challenges to overcome: concept label scarcity and the joint learning. To address both, we propose to: i) use expert rules to generate a large dataset of noisy concept labels, and ii) apply two distinct multi-task learning strategies combining noisy and golden labels. We compare these strategies with a fully supervised approach in a real-world fraud detection application with few golden labels available for the explainability task. With improvements of 9.26% and of 417.8% at the explainability and decision tasks, respectively, our results show it is possible to improve performance at both tasks by combining labels of heterogeneous quality. Figure 1: Weakly supervised multi-task learning strategies for concept-based explainability: (A) baseline strategy resorts exclusively to golden explainability labels; (B) two-stage learning strategy (1) uses noisy explainability labels to pre-train a base model and (2) fine-tuning either using purely golden batches or hybrid ones; (C) hybrid learning strategy only uses hybrid batches of golden and noisy explainability labels. The AI black-box paradigm has led to a growing demand for model explanations (Ribeiro et al., 2016; Lundberg & Lee, 2017). It concerns the generation of high-level concept-based explanations (e.g., "Suspicious payment") rather than low-level explanations based on model features (e.g., "MCC 7801"). Concept-based explainability can be implemented using a multi-task learning approach (Kim et al., 2018; Melis & Jaakkola, 2018; Ghorbani et al., 2019; Koh et al., 2020).
Model-based metrics: Sample-efficient estimates of predictive model subpopulation performance
Miller, Andrew C., Gatys, Leon A., Futoma, Joseph, Fox, Emily B.
Machine learning models $-$ now commonly developed to screen, diagnose, or predict health conditions $-$ are evaluated with a variety of performance metrics. An important first step in assessing the practical utility of a model is to evaluate its average performance over an entire population of interest. In many settings, it is also critical that the model makes good predictions within predefined subpopulations. For instance, showing that a model is fair or equitable requires evaluating the model's performance in different demographic subgroups. However, subpopulation performance metrics are typically computed using only data from that subgroup, resulting in higher variance estimates for smaller groups. We devise a procedure to measure subpopulation performance that can be more sample-efficient than the typical subsample estimates. We propose using an evaluation model $-$ a model that describes the conditional distribution of the predictive model score $-$ to form model-based metric (MBM) estimates. Our procedure incorporates model checking and validation, and we propose a computationally efficient approximation of the traditional nonparametric bootstrap to form confidence intervals. We evaluate MBMs on two main tasks: a semi-synthetic setting where ground truth metrics are available and a real-world hospital readmission prediction task. We find that MBMs consistently produce more accurate and lower variance estimates of model performance for small subpopulations.
Machine Learning Approaches for Inferring Liver Diseases and Detecting Blood Donors from Medical Diagnosis
Mostafa, Fahad B., Hasan, Md Easin
For a medical diagnosis, health professionals use different kinds of pathological ways to make a decision for medical reports in terms of patients medical condition. In the modern era, because of the advantage of computers and technologies, one can collect data and visualize many hidden outcomes from them. Statistical machine learning algorithms based on specific problems can assist one to make decisions. Machine learning data driven algorithms can be used to validate existing methods and help researchers to suggest potential new decisions. In this paper, multiple imputation by chained equations was applied to deal with missing data, and Principal Component Analysis to reduce the dimensionality. To reveal significant findings, data visualizations were implemented. We presented and compared many binary classifier machine learning algorithms (Artificial Neural Network, Random Forest, Support Vector Machine) which were used to classify blood donors and non-blood donors with hepatitis, fibrosis and cirrhosis diseases. From the data published in UCI-MLR [1], all mentioned techniques were applied to find one better method to classify blood donors and non-blood donors (hepatitis, fibrosis, and cirrhosis) that can help health professionals in a laboratory to make better decisions. Our proposed ML-method showed better accuracy score (e.g. 98.23% for SVM). Thus, it improved the quality of classification.
Customizable Reference Runtime Monitoring of Neural Networks using Resolution Boxes
Wu, Changshun, Falcone, Yliès, Bensalem, Saddek
We present an approach for the runtime verification of classification systems via data abstraction. Data abstraction relies on the notion of box with a resolution. Boxbased abstraction consists in representing a set of values by its minimal and maximal values in each dimension. We augment boxes with a notion of resolution; this allows to define the notion of clustering coverage, which is intuitively a quantitative metric over boxes that indicates the quality of the abstraction. This allows studying the effect of different clustering parameters on the constructed boxes and estimating an interval of sub-optimal parameters. Moreover, we show how to automatically construct monitors that make use of both the correct and incorrect behaviors of a classification system. This allows checking the size of the monitor abstractions and analysing the separability of the network. Monitors are obtained by combining the sub-monitors of each class of the system placed at some selected layers. Our experiments demonstrate the effectiveness of our clustering coverage estimation and show how to assess the effectiveness and precision of monitors according to the selected clustering parameter and the chosen monitored layers.
Quantum Causal Inference in the Presence of Hidden Common Causes: an Entropic Approach
Javidian, Mohammad Ali, Aggarwal, Vaneet, Jacob, Zubin
Quantum causality is an emerging field of study which has the potential to greatly advance our understanding of quantum systems. One of the most important problems in quantum causality is linked to this prominent aphorism that states correlation does not mean causation. A direct generalization of the existing causal inference techniques to the quantum domain is not possible due to superposition and entanglement. We put forth a new theoretical framework for merging quantum information science and causal inference by exploiting entropic principles. For this purpose, we leverage the concept of conditional density matrices to develop a scalable algorithmic approach for inferring causality in the presence of latent confounders (common causes) in quantum systems. We apply our proposed framework to an experimentally relevant scenario of identifying message senders on quantum noisy links, where it is validated that the input before noise as a latent confounder is the cause of the noisy outputs. We also demonstrate that the proposed approach outperforms the results of classical causal inference even when the variables are classical by exploiting quantum dependence between variables through density matrices rather than joint probability distributions. Thus, the proposed approach unifies classical and quantum causal inference in a principled way. This successful inference on a synthetic quantum dataset can lay the foundations of identifying originators of malicious activity on future multi-node quantum networks.
Explainable Artificial Intelligence Reveals Novel Insight into Tumor Microenvironment Conditions Linked with Better Prognosis in Patients with Breast Cancer
Chakraborty, Debaditya, Ivan, Cristina, Amero, Paola, Khan, Maliha, Rodriguez-Aguayo, Cristian, Başağaoğlu, Hakan, Lopez-Berestein, Gabriel
We investigated the data-driven relationship between features in the tumor microenvironment (TME) and the overall and 5-year survival in triple-negative breast cancer (TNBC) and non-TNBC (NTNBC) patients by using Explainable Artificial Intelligence (XAI) models. We used clinical information from patients with invasive breast carcinoma from The Cancer Genome Atlas and from two studies from the cbioPortal, the PanCanAtlas project and the GDAC Firehose study. In this study, we used a normalized RNA sequencing data-driven cohort from 1,015 breast cancer patients, alive or deceased, from the UCSC Xena data set and performed integrated deconvolution with the EPIC method to estimate the percentage of seven different immune and stromal cells from RNA sequencing data. Novel insights derived from our XAI model showed that CD4+ T cells and B cells are more critical than other TME features for enhanced prognosis for both TNBC and NTNBC patients. Our XAI model revealed the critical inflection points (i.e., threshold fractions) of CD4+ T cells and B cells above or below which 5-year survival rates improve. Subsequently, we ascertained the conditional probabilities of $\geq$ 5-year survival in both TNBC and NTNBC patients under specific conditions inferred from the inflection points. In particular, the XAI models revealed that a B-cell fraction exceeding 0.018 in the TME could ensure 100% 5-year survival for NTNBC patients. The findings from this research could lead to more accurate clinical predictions and enhanced immunotherapies and to the design of innovative strategies to reprogram the TME of breast cancer patients.
Selecting a number of voters for a voting ensemble
For a voting ensemble that selects an odd-sized subset of the ensemble classifiers at random for each example, applies them to the example, and returns the majority vote, we show that any number of voters may minimize the error rate over an out-of-sample distribution. The optimal number of voters depends on the out-of-sample distribution of the number of classifiers in error. To select a number of voters to use, estimating that distribution then inferring error rates for numbers of voters gives lower-variance estimates than directly estimating those error rates.
Knowledge Triggering, Extraction and Storage via Human-Robot Verbal Interaction
Grassi, Lucrezia, Recchiuto, Carmine Tommaso, Sgorbissa, Antonio
This article describes a novel approach to expand in run-time the knowledge base of an Artificial Conversational Agent. A technique for automatic knowledge extraction from the user's sentence and four methods to insert the new acquired concepts in the knowledge base have been developed and integrated into a system that has already been tested for knowledge-based conversation between a social humanoid robot and residents of care homes. The run-time addition of new knowledge allows overcoming some limitations that affect most robots and chatbots: the incapability of engaging the user for a long time due to the restricted number of conversation topics. The insertion in the knowledge base of new concepts recognized in the user's sentence is expected to result in a wider range of topics that can be covered during an interaction, making the conversation less repetitive. Two experiments are presented to assess the performance of the knowledge extraction technique, and the efficiency of the developed insertion methods when adding several concepts in the Ontology.