Accuracy
Joint Optimization of AI Fairness and Utility: A Human-Centered Approach
Zhang, Yunfeng, Bellamy, Rachel K. E., Varshney, Kush R.
Today, AI is increasingly being used in many high-stakes decision-making applications in which fairness is an important concern. Already, there are many examples of AI being biased and making questionable and unfair decisions. The AI research community has proposed many methods to measure and mitigate unwanted biases, but few of them involve inputs from human policy makers. We argue that because different fairness criteria sometimes cannot be simultaneously satisfied, and because achieving fairness often requires sacrificing other objectives such as model accuracy, it is key to acquire and adhere to human policy makers' preferences on how to make the tradeoff among these objectives. In this paper, we propose a framework and some exemplar methods for eliciting such preferences and for optimizing an AI model according to these preferences.
A Generalized Flow for B2B Sales Predictive Modeling: An Azure Machine Learning Approach
Predicting the outcome of sales opportunities is core to successful business management and revenue forecasting. Conventionally, this prediction has relied mostly on subjective human evaluations in the process of business-to-business (B2B) sales decision making. Here, we propose a practical Machine Learning (ML) workflow to empower B2B sales outcome (win/lose) prediction within a cloud-based computing platform: Microsoft Azure Machine Learning Service (Azure ML). This workflow consists of two pipelines: 1) an ML pipeline that trains probabilistic predictive models in parallel on the closed sales opportunities data, enhanced with an extensive feature engineering procedure, for automated selection and parameterization of an optimal ML model and 2) a Prediction pipeline that uses the optimal ML model to estimate the likelihood of winning new sales opportunities as well as predicting their outcome using optimized decision boundaries. The performance of the proposed workflow was evaluated on a real sales dataset of a B2B consulting firm. In B2B commerce, companies compete to win high-valued sales opportunities to maximize their profitability. In this regard, a key factor for maintaining a successful B2B business is the task of determining the outcome of sales opportunities.
Bootstrapping a DQN Replay Memory with Synthetic Experiences
von Pilchau, Wenzel Baron Pilar, Stein, Anthony, Hähner, Jörg
An important component of many Deep Reinforcement Learning algorithms is the Experience Replay, which serves as a storage mechanism or memory of the experiences the agent has made. These experiences are used for training and help the agent to stably find the perfect trajectory through the problem space. The classic Experience Replay, however, only makes use of the experiences it actually made, although the stored samples bear great potential in the form of knowledge about the problem that can be extracted. We present an algorithm that creates synthetic experiences in a nondeterministic discrete environment to assist the learner. The Interpolated Experience Replay is evaluated on the FrozenLake environment, and we show that it can help the agent learn faster and even better than the classic version.
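As a rough illustration of what "synthetic experiences" could mean in a nondeterministic discrete environment, the sketch below averages the rewards observed for each state-action pair and emits one synthetic transition per observed successor state. This is an assumption made for illustration only: the abstract does not specify the interpolation rule, and the function name `synthesize_experiences` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def synthesize_experiences(real_transitions):
    """Build synthetic (state, action, avg_reward, next_state) tuples.

    real_transitions: iterable of (state, action, reward, next_state).
    Assumed rule (not taken from the abstract): the reward of a synthetic
    experience is the mean of all rewards seen for that (state, action).
    """
    rewards = defaultdict(list)     # (state, action) -> observed rewards
    successors = defaultdict(set)   # (state, action) -> observed next states
    for state, action, reward, next_state in real_transitions:
        rewards[(state, action)].append(reward)
        successors[(state, action)].add(next_state)

    synthetic = []
    for (state, action), seen in rewards.items():
        avg_reward = sum(seen) / len(seen)
        # One synthetic sample per successor state that was actually observed.
        for next_state in sorted(successors[(state, action)]):
            synthetic.append((state, action, avg_reward, next_state))
    return synthetic
```

Such synthetic tuples could then be stored alongside the real ones in the replay memory and sampled during training.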
End-to-End Models for the Analysis of System 1 and System 2 Interactions based on Eye-Tracking Data
Rossi, Alessandro, Ermini, Sara, Bernabini, Dario, Zanca, Dario, Todisco, Marino, Genovese, Alessandro, Rizzo, Antonio
While theories postulating a dual cognitive system take hold, quantitative confirmations are still needed to understand and identify interactions between the two systems or conflict events. Eye movements are among the most direct markers of an individual's attentive load and may serve as an important proxy of information. In this work we propose a computational method, within a modified visual version of the well-known Stroop test, for the identification of different tasks and potential conflict events between the two systems through the collection and processing of data related to eye movements. A statistical analysis shows that the selected variables can characterize the variation of attentive load within different scenarios. Moreover, we show that Machine Learning techniques make it possible to distinguish between different tasks with good classification accuracy and to investigate the gaze dynamics in more depth.
A neural network model that learns differences in diagnosis strategies among radiologists has an improved area under the curve for aneurysm status classification in magnetic resonance angiography image series
Tachibana, Yasuhiko, Nishimori, Masataka, Kitamura, Naoyuki, Umehara, Kensuke, Ota, Junko, Obata, Takayuki, Higashi, Tatsuya
Purpose: To construct a neural network model that can learn the different diagnosing strategies of radiologists to better classify aneurysm status in magnetic resonance angiography images. Materials and methods: This retrospective study included 3423 time-of-flight brain magnetic resonance angiography image series (subjects: male 1843 [mean age, 50.2 +/- 11.7 years], female 1580 [50.8 +/- 11.3 years]) recorded from November 2017 through January 2019. The image series were read independently for aneurysm status by one of four board-certified radiologists, who were assisted by an established deep learning-based computer-assisted diagnosis (CAD) system. The constructed neural networks were trained to classify the aneurysm status of zero to five aneurysm-suspicious areas suggested by the CAD system for each image series, and any additional aneurysm areas added by the radiologists, and this classification was compared with the judgment of the annotating radiologist. Image series were randomly allocated to training and testing data in an 8:2 ratio. The accuracy of the classification was compared by receiver operating characteristic analysis between the control model that accepted only image data as input and the proposed model that additionally accepted the information of who the annotating radiologist was. The DeLong test was used to compare areas under the curves (P < 0.05 was considered significant). Results: The area under the curve was larger in the proposed model (0.845) than in the control model (0.793), and the difference was significant (P < 0.0001). Conclusion: The proposed model improved classification accuracy by learning the diagnosis strategies of individual annotating radiologists.
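For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it this way in plain Python; the function name and the toy data are illustrative and are not taken from the study, and the DeLong significance test is not reproduced here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: sequence of 0/1 ground-truth values.
    scores: sequence of classifier scores, higher means "more positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```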
FAE: A Fairness-Aware Ensemble Framework
Iosifidis, Vasileios, Fetahu, Besnik, Ntoutsi, Eirini
Automated decision making based on big data and machine learning (ML) algorithms can result in discriminatory decisions against certain protected groups defined upon personal data such as gender, race, or sexual orientation. Such algorithms, designed to discover patterns in big data, might not only pick up any encoded societal biases in the training data, but even worse, they might reinforce such biases, resulting in more severe discrimination. The majority of fairness-aware machine learning approaches proposed thus far focus solely on the pre-, in- or post-processing steps of the machine learning process, that is, input data, learning algorithms or derived models, respectively. However, the fairness problem cannot be isolated to a single step of the ML process. Rather, discrimination is often a result of complex interactions between big data and algorithms, and therefore, a more holistic approach is required. The proposed FAE (Fairness-Aware Ensemble) framework combines fairness-related interventions at both pre- and post-processing steps of the data analysis process. In the pre-processing step, we tackle the problems of under-representation of the protected group (group imbalance) and of class imbalance by generating balanced training samples. In the post-processing step, we tackle the problem of class overlapping by shifting the decision boundary in the direction of fairness.
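To make "shifting the decision boundary in the direction of fairness" concrete, here is a minimal sketch that lowers the score threshold for the protected group until its positive-prediction rate is within a tolerance of the other group's rate. The fairness criterion (a gap in positive rates), the step size, and all names are assumptions chosen for illustration; the abstract does not state which criterion or procedure FAE uses.

```python
def positive_rate(scores, threshold):
    """Fraction of scores classified as positive at the given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def shift_threshold(protected_scores, other_scores, base=0.5, step=0.01, tol=0.05):
    """Lower the protected group's threshold until the rate gap is <= tol.

    Both score lists are assumed to be non-empty.
    """
    target = positive_rate(other_scores, base)
    threshold = base
    while threshold > 0 and target - positive_rate(protected_scores, threshold) > tol:
        # Round to avoid accumulating floating-point drift across steps.
        threshold = round(threshold - step, 10)
    return threshold
```

With this sketch, the non-protected group keeps the base threshold while the protected group is classified with the shifted one.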
Detection of Obstructive Sleep Apnoea Using Features Extracted from Segmented Time-Series ECG Signals Using a One Dimensional Convolutional Neural Network
Thompson, Steven, Fergus, Paul, Chalmers, Carl, Reilly, Denis
The study in this paper presents a one-dimensional convolutional neural network (1DCNN) model designed for the automated detection of obstructive sleep apnoea (OSA) from single-channel electrocardiogram (ECG) signals. The system provides mechanisms in clinical practice that help diagnose patients suffering from OSA. Using state-of-the-art 1DCNNs, a model is constructed using convolutional and max pooling layers and a fully connected Multilayer Perceptron (MLP) consisting of a hidden layer and a SoftMax output for classification. The 1DCNN extracts prominent features, which are used to train the MLP. The model is trained using segmented ECG signals grouped into 5 unique datasets of set window sizes. A total of 6514 minutes of apnoea was recorded. Evaluation of the model is performed using a set of standard metrics, which show that the proposed model achieves high classification results in both training and validation using our windowing strategy, particularly W 500 (Sensitivity 0.9705, Specificity 0.9725, F1_Score 0.9717, Kappa_Score 0.9430, Log_Loss 0.0836, ROCAUC 0.9945). This demonstrates that the model can identify the presence of apnoea with a high degree of accuracy.
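The reported sensitivity, specificity, F1 and kappa scores are all derived from a binary confusion matrix. The sketch below shows the standard definitions in plain Python; the function name and the example counts are made up for illustration and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```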
Combating False Negatives in Adversarial Imitation Learning
Zolna, Konrad, Saharia, Chitwan, Boussioux, Leonard, Hui, David Yu-Tung, Chevalier-Boisvert, Maxime, Bahdanau, Dzmitry, Bengio, Yoshua
In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Even though the task is successfully accomplished in some of the agent's trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning and consequently lead to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the 'False Negatives' (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.
Dialogue-based simulation for cultural awareness training
Adewole, Sodiq, Gharavi, Erfaneh, Shpringer, Benjamin, Bolger, Martin, Sharma, Vaibhav, Yang, Sung Ming, Brown, Donald E.
Existing simulations designed for cultural and interpersonal skill training rely on pre-defined responses with a menu option selection interface. Using a multiple-choice interface and restricting trainees' responses may limit the trainees' ability to apply the lessons in real-life situations. These systems also use a simplistic evaluation model, in which trainees' selected options are marked as either correct or incorrect. This model may not capture sufficient information to drive an adaptive feedback mechanism that improves trainees' cultural awareness. This paper describes the design of a dialogue-based simulation for cultural awareness training. The simulation is built around a disaster management scenario involving a joint coalition between the US and the Chinese armies. Trainees are able to engage in realistic dialogue with the Chinese agent. Their responses, at different points, are evaluated by different multi-label classification models. Having been trained on our dataset, the models score the trainees' responses for awareness of Chinese culture. Trainees also receive feedback on the cultural appropriateness of their responses. The results of this work show the following: i) a feature-based evaluation model improves the design, modeling and computation of dialogue-based training simulation systems; ii) output from current automatic speech recognition (ASR) systems gave end results comparable to those from manual transcription; iii) a multi-label classification model trained as a cultural expert gave results comparable to the scores assigned by human annotators.
Multi-stream Faster RCNN for Mitosis Counting in Breast Cancer Images
Mitotic count is a commonly used method to assess the level of progression of breast cancer, which is now the fourth most prevalent cancer. Unfortunately, counting mitoses is a tedious and subjective task with poor reproducibility, especially for non-experts. Fortunately, because machines can read and compare more data with greater efficiency, automated analysis could become the next technique for counting mitoses. Furthermore, technological advancements in medicine have led to an increase in the image data available for training. In this work, we propose a network constructed using an approach similar to one that has been used for image fraud detection, with the segmented image map as the second stream input to Faster RCNN. This region-based detection model combines a fully convolutional Region Proposal Network to generate proposals and a classification network to classify each of these proposals as containing mitosis or not. Features from both streams are fused in the bilinear pooling layer to maintain the spatial concurrence of each. After training this model on the ICPR 2014 MITOSIS contest dataset, we obtained an F-measure of 0.507, higher than both the winner's score and scores from recent tests on the same data. Our method is clinically applicable, taking only around five minutes per ten full High Power Field slides when tested on a Quadro P6000 cloud GPU.