Accuracy
Bias Mitigation of Face Recognition Models Through Calibration
Salvador, Tiago, Cairns, Stephanie, Voleti, Vikram, Marshall, Noah, Oberman, Adam
Face recognition models suffer from bias: for example, the probability of a false positive (incorrect face match) strongly depends on sensitive attributes like ethnicity. As a result, these models may disproportionately and negatively impact minority groups when used in law enforcement. In this work, we introduce the Bias Mitigation Calibration (BMC) method, which (i) increases model accuracy (improving the state-of-the-art), (ii) produces fairly-calibrated probabilities, (iii) significantly reduces the gap in the false positive rates, and (iv) does not require knowledge of the sensitive attribute.
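The "gap in the false positive rates" can be made concrete with a small evaluation helper. The sketch below is not the BMC method. It only measures, at one global decision threshold, how the false positive rate on impostor pairs (pairs of different people) differs across groups. The group names, scores, and threshold are hypothetical.

```python
from typing import Dict, List


def false_positive_rate(impostor_scores: List[float], threshold: float) -> float:
    """Fraction of impostor pairs wrongly accepted as a match (score >= threshold)."""
    if not impostor_scores:
        raise ValueError("need at least one impostor score")
    accepted = sum(1 for s in impostor_scores if s >= threshold)
    return accepted / len(impostor_scores)


def fpr_gap(scores_by_group: Dict[str, List[float]], threshold: float) -> float:
    """Largest FPR difference between any two groups at a single global threshold."""
    rates = [false_positive_rate(s, threshold) for s in scores_by_group.values()]
    return max(rates) - min(rates)


# Hypothetical impostor similarity scores, grouped by a sensitive attribute.
scores = {
    "group_a": [0.10, 0.20, 0.35, 0.40, 0.55],
    "group_b": [0.30, 0.45, 0.60, 0.65, 0.70],
}
threshold = 0.5  # hypothetical operating point

for group, s in scores.items():
    print(group, false_positive_rate(s, threshold))
print("FPR gap:", fpr_gap(scores, threshold))
```

Computing the gap requires group labels at evaluation time, even though the abstract states that BMC itself does not need the sensitive attribute.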
Ethical AI, Monetizing False Negatives and Growing Total Addressable Market
What if I told you that companies that don't embrace Ethical AI are leaving significant amounts of "Money on the Table"; that they are not only missing out on potentially profitable customers, but that over time they are eroding their Total Addressable Market (TAM)? Do I have your attention now? After I published the blog "The Ethical AI Application Pyramid", a question from Karrie Sullivan, coupled with a mentoring session with the startup unfog.ai, led to this realization: "If your AI model doesn't take into consideration the ultimate outcomes of the AI model's False Negatives, then confirmation bias in the AI model could set in and eventually the company's Total Addressable Market (TAM) could shrink to a point where the business might no longer be viable." Yes, not only is Ethical AI the right thing to do from a cultural and societal perspective, but there are direct bottom-line financial ramifications if your AI models are not learning and adapting from their False Negatives.
Confusion Matrix In Cyber Security
In today's article I'm going to explain intrusion detection systems (IDS) in cyber security, the confusion matrix, how it is used in IDS, and how it affects cyber security, with an example. So let's get started on this amazing topic. In today's technological world, where everything is being digitalized, nearly everything is online. Along with this, the most important things are data and data security. Everything we do on the internet (what we search, what we post, what we buy, which sites we visit) is stored on data center servers. All of this data must be secured against hackers and any kind of data loss.
Cybersecurity: When we talk about the confusion matrix
Confusion Matrix
The confusion matrix is a table that summarizes the number of true and false predictions made by a classifier, broken down by class. It is used to evaluate a classification model by calculating performance indicators such as accuracy, precision, recall, and F1 score. If you are working with an imbalanced dataset, base your evaluation on the confusion matrix rather than on accuracy alone, because a model that always predicts the majority class can still reach high accuracy. Here are the basic terms that will help us identify the metrics we are looking for.
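The metrics named above can be computed directly from the four cells of a binary confusion matrix. The sketch below assumes labels where 1 means "intrusion" and 0 means "normal traffic"; the data are hypothetical and deliberately imbalanced to show why accuracy alone can mislead.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN for binary 0/1 labels (1 = positive, e.g. intrusion)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1
        elif t == 0 and p == 1:
            counts["FP"] += 1  # false alarm
        elif t == 1 and p == 0:
            counts["FN"] += 1  # missed intrusion
        else:
            counts["TN"] += 1
    return counts


def metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall and F1; undefined ratios are reported as 0."""
    total = sum(c.values())
    accuracy = (c["TP"] + c["TN"]) / total
    precision = c["TP"] / (c["TP"] + c["FP"]) if c["TP"] + c["FP"] else 0.0
    recall = c["TP"] / (c["TP"] + c["FN"]) if c["TP"] + c["FN"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical traffic: 5 intrusions among 100 connections, and a lazy model
# that labels everything as normal.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
c = confusion_counts(y_true, y_pred)
print(c)           # 95 true negatives, 5 missed intrusions
print(metrics(c))  # accuracy is 0.95, yet recall is 0.0
```

The usage example is the imbalanced-data argument in miniature: 95% accuracy while every intrusion is missed.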
Virtual Screening of Pharmaceutical Compounds with hERG Inhibitory Activity (Cardiotoxicity) using Ensemble Learning
Sarkar, Aditya, Bhavsar, Arnav
In silico prediction of cardiotoxicity with high sensitivity and specificity for potential drug molecules can be of immense value. Hence, building machine learning classification models, based on features extracted from the molecular structure of drugs, that can efficiently predict cardiotoxicity is critical. In this paper, we consider the application of various machine learning approaches and then propose an ensemble classifier for the prediction of molecular activity on a Drug Discovery Hackathon (DDH) (1st reference) dataset. We used only 2-D descriptors computed from SMILES notation for our prediction. Our ensemble classifier uses 5 classifiers (2 Random Forest Classifiers, 2 Support Vector Machines and a Dense Neural Network) and uses the Max-Voting technique and the Weighted-Average technique for the final decision.
Introduction
It is well known that drug discovery is complex, long-drawn, and requires interdisciplinary expertise to discover new molecules. Drug safety is an important issue in the process of drug discovery. Failures in clinical trials in the 2000s were mainly due to efficacy and safety issues (approx. 30%) (Kola, I. and Landis, J., 2004). One important aspect of drug safety is drug toxicity.
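The two decision rules mentioned above, Max-Voting and Weighted-Average, can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: each of the five models is assumed to output a probability of hERG inhibition per molecule, hard labels are obtained by thresholding at 0.5, and the probabilities and weights are hypothetical.

```python
from typing import List, Sequence


def max_voting(labels_per_model: Sequence[Sequence[int]]) -> List[int]:
    """Hard majority vote: each model casts one 0/1 vote per molecule."""
    n_models = len(labels_per_model)
    result = []
    for i in range(len(labels_per_model[0])):
        votes = sum(model[i] for model in labels_per_model)
        result.append(1 if 2 * votes > n_models else 0)  # strict majority
    return result


def weighted_average(probs_per_model: Sequence[Sequence[float]],
                     weights: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Average the models' probabilities with per-model weights, then threshold."""
    total_w = sum(weights)
    result = []
    for i in range(len(probs_per_model[0])):
        p = sum(w * model[i] for w, model in zip(weights, probs_per_model)) / total_w
        result.append(1 if p >= threshold else 0)
    return result


# Hypothetical probabilities of hERG inhibition for 3 molecules.
probs = [
    [0.90, 0.55, 0.20],  # Random Forest 1
    [0.80, 0.60, 0.30],  # Random Forest 2
    [0.70, 0.52, 0.60],  # SVM 1
    [0.60, 0.40, 0.10],  # SVM 2
    [0.95, 0.10, 0.40],  # Dense Neural Network
]
weights = [1, 1, 1, 1, 2]  # hypothetical: trust the DNN twice as much

labels = [[1 if p >= 0.5 else 0 for p in model] for model in probs]
print("max voting:      ", max_voting(labels))
print("weighted average:", weighted_average(probs, weights))
```

The second molecule shows how the rules can disagree: three models narrowly vote "inhibitor", but the confident DNN pulls the weighted average below 0.5. With five voters, a strict majority never ties.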
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition
Meng, Zhong, Wu, Yu, Kanda, Naoyuki, Lu, Liang, Chen, Xie, Ye, Guoli, Sun, Eric, Li, Jinyu, Gong, Yifan
Integrating external language models (LMs) into end-to-end (E2E) models remains a challenging task for domain-adaptive speech recognition. Recently, internal language model estimation (ILME)-based LM fusion has shown significant word error rate (WER) reduction from Shallow Fusion by subtracting a weighted internal LM score from an interpolation of E2E model and external LM scores during beam search. However, on different test sets, the optimal LM interpolation weights vary over a wide range and have to be tuned extensively on well-matched validation sets. In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weight tuning during inference. Besides MWER training with Shallow Fusion (MWER-SF), we propose a novel MWER training with ILME (MWER-ILME) where the ILME-based fusion is conducted to generate N-best hypotheses and their posteriors. An additional gradient is induced when the internal LM is engaged in the MWER-ILME loss computation. During inference, LM weights pre-determined in MWER training enable robust LM integration on test sets from different domains. In experiments with transformer transducers trained on 30K hours of data, MWER-ILME achieves on average 8.8% and 5.8% relative WER reductions from MWER and MWER-SF training, respectively, on 6 different test sets.
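The fusion rule and the N-best loss described above can be sketched as follows. The ILME score is the E2E log-probability plus a weighted external-LM log-probability minus a weighted internal-LM log-probability; setting the internal-LM weight to zero gives Shallow Fusion. For the loss, the sketch assumes the common MWER form, the expected number of word errors under posteriors renormalized over the N-best list, minus the list's average error count. The paper's exact normalization may differ, and the weight names and scores are hypothetical.

```python
import math
from typing import List, Sequence


def ilme_fused_score(log_p_e2e: float, log_p_elm: float, log_p_ilm: float,
                     lambda_elm: float, lambda_ilm: float) -> float:
    """E2E score + weighted external LM score - weighted internal LM score.
    lambda_ilm = 0 reduces this to plain Shallow Fusion."""
    return log_p_e2e + lambda_elm * log_p_elm - lambda_ilm * log_p_ilm


def nbest_posteriors(scores: Sequence[float]) -> List[float]:
    """Renormalize fused log-scores over the N-best list (stable softmax)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mwer_loss(scores: Sequence[float], word_errors: Sequence[int]) -> float:
    """Expected word errors relative to the N-best average (assumed MWER form)."""
    post = nbest_posteriors(scores)
    mean_err = sum(word_errors) / len(word_errors)
    return sum(p * (w - mean_err) for p, w in zip(post, word_errors))


# Hypothetical 3-best list: (E2E, external LM, internal LM) log-probabilities.
hyps = [(-2.0, -4.0, -3.5), (-2.5, -3.0, -3.0), (-3.0, -5.0, -4.0)]
errors = [1, 0, 2]  # word errors of each hypothesis against the reference
fused = [ilme_fused_score(e, x, i, lambda_elm=0.5, lambda_ilm=0.3) for e, x, i in hyps]
print("fused scores:", fused)
print("MWER loss:", mwer_loss(fused, errors))
```

Because the fused scores (and therefore the posteriors) depend on the internal LM term, the loss in MWER-ILME also depends on it, which is where the abstract's "additional gradient" comes from.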
Spatially relaxed inference on high-dimensional linear models
Chevalier, Jérôme-Alexis, Nguyen, Tuan-Binh, Thirion, Bertrand, Salmon, Joseph
We consider the inference problem for high-dimensional linear models, when covariates have an underlying spatial organization reflected in their correlation. A typical example of such a setting is high-resolution imaging, in which neighboring pixels are usually very similar. Accurate point estimation and confidence interval estimation are not possible in this context, with many more covariates than samples and, furthermore, high correlation between covariates. This calls for a reformulation of the statistical inference problem that takes into account the underlying spatial structure: if covariates are locally correlated, it is acceptable to detect them up to a given spatial uncertainty. We thus propose to rely on the $\delta$-FWER, that is, the probability of making a false discovery at a distance greater than $\delta$ from any true positive. With this target measure in mind, we study the properties of ensembled clustered inference algorithms, which combine three techniques: spatially constrained clustering, statistical inference, and ensembling to aggregate several clustered inference solutions. We show that ensembled clustered inference algorithms control the $\delta$-FWER under standard assumptions for $\delta$ equal to the largest cluster diameter. We complement the theoretical analysis with empirical results, demonstrating accurate $\delta$-FWER control and decent power achieved by such inference algorithms.
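The $\delta$-FWER definition can be checked empirically: a run commits a "$\delta$-false discovery" if some detected covariate lies farther than $\delta$ from every truly active covariate, and the $\delta$-FWER is the probability of that event. The sketch below assumes covariates indexed on a 1-D grid (so distance is $|i - j|$; images would need a 2-D or 3-D distance), and the support and detections are hypothetical.

```python
from typing import Iterable, Sequence, Set


def has_delta_false_discovery(detected: Iterable[int], support: Set[int], delta: int) -> bool:
    """True if some detection is farther than delta from every true positive.
    Distance is |i - j| on a 1-D grid (an assumption made for illustration)."""
    for i in detected:
        if all(abs(i - j) > delta for j in support):
            return True
    return False


def empirical_delta_fwer(runs: Sequence[Sequence[int]], support: Set[int], delta: int) -> float:
    """Fraction of runs (e.g. simulated datasets) with at least one delta-false discovery."""
    hits = sum(has_delta_false_discovery(r, support, delta) for r in runs)
    return hits / len(runs)


support = {10, 11, 12}               # hypothetical truly active covariates
runs = [[11, 13], [14], [15], []]    # hypothetical detections from 4 runs
print(empirical_delta_fwer(runs, support, delta=2))  # 0.25
print(empirical_delta_fwer(runs, support, delta=0))  # 0.75
```

With $\delta = 0$ the measure reduces to the ordinary FWER, and the near-misses at indices 13 and 14 count as errors. With $\delta = 2$ only the detection at 15 does, which is the spatial tolerance the abstract argues for.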
Ethical AI: Demographic Bias in Facial Recognition Technology
There is a tremendous amount of misleading and inaccurate reporting on the topic of demographic bias in biometric identification systems, especially regarding facial recognition technology. Part of the problem is that there isn't one thing that is "facial recognition technology". At the core of any biometric system is a matching algorithm. The definitive resource on the topic of demographic bias in biometrics is the NIST Face Recognition Vendor Test (FRVT) Part 3 Demographic Effects report. Warning: this 82-page report is not an easy read and you really should read parts 1 and 2 first to get the context.
AI based monitoring and decision making solutions
Based on our approach, we implemented PCA, Isolation Forest and Autoencoder based anomaly detection models to distinguish anomalies from normal service behavior. Because depending on a single model carries a risk of many false positives, we deployed two models in production and combined their inferences for decision-making. Time-series forecasting of call data was achieved by training an LSTM model on historical volume data and forecasting for the desired time in the future. Notably, both solutions run in real time, which required a high level of optimization to accommodate the high frequency of incoming data. We deployed the models using the Kubernetes and OKD deployment frameworks, coupled with NVIDIA GPUs for high-performance model training.
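The text does not specify how the two models' outputs are combined, so the sketch below assumes the simplest reading: a point is escalated only when both detectors flag it (a logical AND), which trades some recall for fewer false positives. To keep the example self-contained, the PCA and Isolation Forest models are replaced by two simple statistical stand-ins, a z-score detector and a median-absolute-deviation (MAD) detector. The thresholds and call volumes are hypothetical.

```python
import statistics
from typing import List, Sequence


def zscore_flags(x: Sequence[float], threshold: float = 2.5) -> List[bool]:
    """Stand-in detector 1: flag points far from the mean in standard deviations."""
    mu = statistics.mean(x)
    sd = statistics.pstdev(x)
    return [sd > 0 and abs(v - mu) / sd > threshold for v in x]


def mad_flags(x: Sequence[float], threshold: float = 3.5) -> List[bool]:
    """Stand-in detector 2: robust modified z-score based on the median."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    # 0.6745 rescales MAD so the score is comparable to a z-score for Gaussian data.
    return [mad > 0 and 0.6745 * abs(v - med) / mad > threshold for v in x]


def combined_flags(a: Sequence[bool], b: Sequence[bool]) -> List[bool]:
    """Escalate only when both detectors agree (assumed combination rule)."""
    return [fa and fb for fa, fb in zip(a, b)]


# Hypothetical call volumes per interval, with one obvious spike.
volumes = [100, 102, 98, 101, 99, 100, 103, 97, 100, 250]
flags = combined_flags(zscore_flags(volumes), mad_flags(volumes))
print([i for i, f in enumerate(flags) if f])  # → [9]
```

An OR rule would instead maximize recall at the cost of more alerts; which rule fits depends on how expensive a false alarm is compared with a missed anomaly.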
Heart Sound Classification Considering Additive Noise and Convolutional Distortion
Azam, Farhat Binte, Ansari, Md. Istiaq, Mclane, Ian, Hasan, Taufiq
Cardiac auscultation is an essential point-of-care method used for the early diagnosis of heart diseases. Automatic analysis of heart sounds for abnormality detection is faced with the challenges of additive noise and sensor-dependent degradation. This paper aims to develop methods to address the cardiac abnormality detection problem when both types of distortions are present in the cardiac auscultation sound. We first mathematically analyze the effect of additive and convolutional noise on short-term filterbank-based features and a Convolutional Neural Network (CNN) layer. Based on the analysis, we propose a combination of linear and logarithmic spectrogram-image features. These 2D features are provided as input to a residual CNN (ResNet) for heart sound abnormality detection. Experimental validation is performed on an open-access heart sound abnormality detection dataset involving noisy recordings obtained from multiple stethoscope sensors. The proposed method achieves significantly improved results compared to the conventional approaches, with an area under the ROC (receiver operating characteristics) curve (AUC) of 91.36%, F-1 score of 84.09%, and Macc (mean of sensitivity and specificity) of 85.08%. We also show that the proposed method achieves the best mean accuracy across different source domains, including stethoscope and noise variability, demonstrating its effectiveness in different recording conditions. The proposed combination of linear and logarithmic features along with the ResNet classifier effectively minimizes the impact of background noise and sensor variability for classifying phonocardiogram (PCG) signals. The proposed method paves the way towards developing computer-aided cardiac auscultation systems in noisy environments using low-cost stethoscopes.
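One standard observation helps explain why linear and logarithmic features behave differently under the two distortions. If the stethoscope's impulse response is short compared with the analysis window, convolutional distortion is approximately a per-frequency gain in the spectrogram. A log turns that gain into an additive per-bin offset, which subtracting each bin's mean over time removes. Additive noise does not separate this way. The sketch below demonstrates both facts; it is not the paper's analysis or feature pipeline, and the spectra and channel gains are hypothetical.

```python
import math
from typing import List

Spectrogram = List[List[float]]  # frames x frequency bins, magnitudes > 0


def apply_channel(spec: Spectrogram, gains: List[float]) -> Spectrogram:
    """Model sensor coloration as a per-bin gain (multiplicative in frequency)."""
    return [[x * g for x, g in zip(frame, gains)] for frame in spec]


def log_spec(spec: Spectrogram) -> Spectrogram:
    return [[math.log(x) for x in frame] for frame in spec]


def mean_normalize(spec: Spectrogram) -> Spectrogram:
    """Subtract each bin's mean over time; a constant per-bin offset cancels."""
    n = len(spec)
    means = [sum(frame[k] for frame in spec) / n for k in range(len(spec[0]))]
    return [[v - m for v, m in zip(frame, means)] for frame in spec]


def max_abs_diff(a: Spectrogram, b: Spectrogram) -> float:
    return max(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))


clean = [[1.0, 0.5, 0.25], [2.0, 0.8, 0.1], [1.5, 0.6, 0.2]]  # hypothetical
gains = [0.9, 0.5, 2.0]  # hypothetical stethoscope frequency response

ref = mean_normalize(log_spec(clean))
conv = mean_normalize(log_spec(apply_channel(clean, gains)))
noisy = mean_normalize(log_spec([[x + 0.3 for x in frame] for frame in clean]))

print(f"channel distortion after normalization: {max_abs_diff(ref, conv):.1e}")
print(f"additive noise after normalization:     {max_abs_diff(ref, noisy):.3f}")
```

The first difference is at floating-point round-off level, while the second stays large, because $\log(x + n) - \log(x)$ varies from frame to frame.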