 Accuracy


Beyond triplet loss: One shot learning experiments with quadruplet loss

#artificialintelligence

This article is a follow-up to my previous article about one-shot learning, Siamese networks, and triplet loss with Keras. "One Shot Learning" and "Mining" are described there, so if you're not familiar with these concepts yet, I highly recommend you read that first. A friend of mine says that, to make significant progress in machine learning, one should read research papers in the field. While browsing research papers, I found one, "Beyond triplet loss: a deep quadruplet network for person re-identification", that seemed to offer an improvement over my previous work, so I decided to try to recreate what the authors did for my particular case. This article explores the paper and implements some of its concepts with Keras.
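To make the difference between the two losses concrete, here is a minimal sketch in plain Python. It is not the article's Keras code: it assumes fixed embedding vectors and a squared Euclidean distance (the paper learns its own metric), and the function names and margin values are hypothetical. The quadruplet loss keeps the triplet term and adds a second term that compares the anchor-positive distance with the distance between two negatives of different classes.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=1.0, margin2=0.5):
    """Triplet term plus a term pushing the positive pair closer
    than a pair of negatives taken from two different classes.

    The margins are illustrative defaults, not values from the paper.
    """
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative1)
    d_nn = squared_distance(negative1, negative2)
    first_term = max(0.0, d_ap - d_an + margin1)
    second_term = max(0.0, d_ap - d_nn + margin2)
    return first_term + second_term
```

With these toy embeddings the triplet term is already satisfied, but the two negatives sit as close to each other as the anchor is to the positive, so only the second term contributes to the loss.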


Essential Math for Data Science: Integrals And Area Under The Curve - KDnuggets

#artificialintelligence

Calculus is a branch of mathematics that gives tools to study the rate of change of functions through two main areas: derivatives and integrals. In the context of machine learning and data science, you might use integrals to calculate the area under the curve (for instance, to evaluate the performance of a model with the ROC curve, or to calculate probability from densities). In this article, you'll learn about integrals and the area under the curve using the practical data science example of the area under the ROC curve used to compare the performances of two machine learning models. Building from this example, you'll see the notion of the area under the curve and integrals from a mathematical point of view (from my book Essential Math for Data Science). Let's say that you would like to predict the quality of wines from several of their chemical properties. You want to do a binary classification of the quality (distinguishing very good wines from not very good ones). You'll develop methods allowing you to evaluate your models on imbalanced data with the area under the Receiver Operating Characteristic (ROC) curve.
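As a rough illustration of the link between integrals and the area under the ROC curve, the sketch below approximates the area with the trapezoidal rule. The function name and the ROC points are made up for illustration and do not come from the article's wine dataset.

```python
def trapezoid_area(xs, ys):
    """Approximate the integral of y over x with the trapezoidal rule.

    xs must be sorted in increasing order; xs and ys have equal length.
    """
    area = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        mean_height = (ys[i] + ys[i - 1]) / 2.0
        area += width * mean_height
    return area


# Hypothetical ROC points: false positive rate (x) and true positive rate (y).
false_positive_rate = [0.0, 0.5, 1.0]
true_positive_rate = [0.0, 1.0, 1.0]
auc = trapezoid_area(false_positive_rate, true_positive_rate)
```

A classifier that guesses at random traces the diagonal from (0, 0) to (1, 1), whose area is 0.5, so an area closer to 1 indicates better ranking of positives over negatives.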


Teaching the Machine to Explain Itself using Domain Knowledge

arXiv.org Artificial Intelligence

Machine Learning (ML) has been increasingly used to help humans make better and faster decisions. However, non-technical humans-in-the-loop struggle to comprehend the rationale behind model predictions, hindering trust in algorithmic decision-making systems. Considerable research work on AI explainability attempts to win back trust in AI systems by developing explanation methods, but there is still no major breakthrough. At the same time, popular explanation methods (e.g., LIME and SHAP) produce explanations that are very hard to understand for non-data-scientist personas. To address this, we present JOEL, a neural network-based framework to jointly learn a decision-making task and associated explanations that convey domain knowledge. JOEL is tailored to human-in-the-loop domain experts who lack deep technical ML knowledge, providing high-level insights about the model's predictions that very much resemble the experts' own reasoning. Moreover, we collect the domain feedback from a pool of certified experts and use it to improve the model (human teaching), hence promoting seamless and better-suited explanations. Lastly, we resort to semantic mappings between legacy expert systems and domain taxonomies to automatically annotate a bootstrap training set, overcoming the absence of concept-based human annotations. We validate JOEL empirically on a real-world fraud detection dataset. We show that JOEL can generalize the explanations from the bootstrap dataset. Furthermore, the results indicate that human teaching can further improve the explanation prediction quality by approximately $13.57\%$.


Viral epitope profiling of COVID-19 patients reveals cross-reactivity and correlates of severity

Science

Among the coronaviruses that infect humans, four cause mild common colds, whereas three others, including the currently circulating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), result in severe infections. Shrock et al. used a technology known as VirScan to probe the antibody repertoires of hundreds of coronavirus disease 2019 (COVID-19) patients and pre–COVID-19 era controls. They identified hundreds of antibody targets, including several antibody epitopes shared by the mild and severe coronaviruses and many specific to SARS-CoV-2. A machine-learning model accurately classified patients infected with SARS-CoV-2 and guided the design of an assay for rapid SARS-CoV-2 antibody detection. The study also looked at how the antibody response and viral exposure history differ in patients with diverging outcomes, which could inform the production of improved vaccine and antibody therapies. Science, this issue p. [eabd4250][1]

### INTRODUCTION

A systematic characterization of the humoral response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epitopes has yet to be performed. This analysis is important for understanding the immunogenicity of the viral proteome and the basis for cross-reactivity with the common-cold coronaviruses. Coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, is notable for its variable course, with some individuals remaining asymptomatic whereas others experience fever, respiratory distress, or even death. A comprehensive investigation of the antibody response in individuals with severe versus mild COVID-19, as well as an examination of past viral exposure history, is needed.

### RATIONALE

An understanding of humoral responses to SARS-CoV-2 is critical for improving diagnostics and vaccines and gaining insight into variable clinical outcomes. To this end, we used VirScan, a high-throughput method to analyze epitopes of antiviral antibodies in human sera.
We supplemented the original VirScan library with additional libraries of peptides spanning the proteomes of SARS-CoV-2 and all other human coronaviruses. These libraries enabled us to precisely map epitope locations and investigate cross-reactivity between SARS-CoV-2 and other coronavirus strains. The original VirScan library allowed us to simultaneously investigate antibody responses to prior infections and viral exposure history.

### RESULTS

We screened sera from 232 COVID-19 patients and 190 pre–COVID-19 era controls against the original VirScan and supplemental coronavirus libraries, assaying more than $10^8$ antibody repertoire–peptide interactions. We identified epitopes ranging from “private” (recognized by antibodies in only a small number of individuals) to “public” (recognized by antibodies in many individuals) and detected SARS-CoV-2–specific epitopes as well as those that cross-react with common-cold coronaviruses. Several of these cross-reacting antibodies are present in pre–COVID-19 era samples. We developed a machine learning model that predicted SARS-CoV-2 exposure history with 99% sensitivity and 98% specificity from VirScan data. We used the most discriminatory SARS-CoV-2 peptides to produce a Luminex-based serological assay, which performed similarly to gold-standard enzyme-linked immunosorbent assays. We stratified the COVID-19 patient samples by disease severity and found that patients who had required hospitalization exhibited stronger and broader antibody responses to SARS-CoV-2 but weaker overall responses to past infections compared with those who did not need hospitalization. Further, the hospitalized group had higher seroprevalence rates for cytomegalovirus and herpes simplex virus 1. These findings may be influenced by differences in demographic compositions between the two groups, but they raise hypotheses that may be tested in future studies.
Using alanine scanning mutagenesis, we precisely mapped 823 distinct epitopes across the entire SARS-CoV-2 proteome, 10 of which are likely targets of neutralizing antibodies. One cross-reactive antibody epitope in S2 has been previously suggested to be neutralizing and, as it exists in pre–COVID-19 era samples, could affect the severity of COVID-19.

### CONCLUSION

We present a highly detailed view of the epitope landscape within the SARS-CoV-2 proteome. This knowledge may be used to produce diagnostics with improved specificity and can provide a stepping stone to the isolation and functional dissection of both neutralizing antibodies and antibodies that might exacerbate patient outcomes through antibody-dependent enhancement or immune distraction. Our study reveals notable correlations between COVID-19 severity and both viral exposure history and overall strength of the antibody response to past infections. These findings are likely influenced by demographic covariates, but they generate hypotheses that may be tested with larger patient cohorts matched for age, gender, race, and other demographic variables.

Figure: SARS-CoV-2 epitope mapping. VirScan detects antibodies against SARS-CoV-2 in COVID-19 patients with severe and mild disease. Heatmap color represents the strength of the antibody response in each sample (columns) to each protein (rows, left) or peptide (rows, right). VirScan reveals the precise positions of epitopes, which can be mapped onto the structure of the spike protein (S). Examination of SARS-CoV-2 and seasonal coronavirus sequence conservation explains epitope cross-reactivity. A, Ala; D, Asp; E, Glu; F, Phe; I, Ile; K, Lys; L, Leu; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; Y, Tyr.

Understanding humoral responses to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is critical for improving diagnostics, therapeutics, and vaccines.
Deep serological profiling of 232 coronavirus disease 2019 (COVID-19) patients and 190 pre–COVID-19 era controls using VirScan revealed more than 800 epitopes in the SARS-CoV-2 proteome, including 10 epitopes likely recognized by neutralizing antibodies. Preexisting antibodies in controls recognized SARS-CoV-2 ORF1, whereas only COVID-19 patient antibodies primarily recognized spike protein and nucleoprotein. A machine learning model trained on VirScan data predicted SARS-CoV-2 exposure history with 99% sensitivity and 98% specificity; a rapid Luminex-based diagnostic was developed from the most discriminatory SARS-CoV-2 peptides. Individuals with more severe COVID-19 exhibited stronger and broader SARS-CoV-2 responses, weaker antibody responses to prior infections, and higher incidence of cytomegalovirus and herpes simplex virus 1, possibly influenced by demographic covariates. Among hospitalized patients, males produce stronger SARS-CoV-2 antibody responses than females. [1]: /lookup/doi/10.1126/science.abd4250
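For readers less familiar with the two metrics quoted for the machine learning model, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are invented round numbers chosen only so the arithmetic reproduces 99% and 98%; they are not the study's data, and the function names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly exposed samples that the model flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of unexposed samples that the model labels as negative."""
    return true_negatives / (true_negatives + false_positives)


# Illustrative counts only (NOT taken from the paper):
# 100 exposed samples, 99 detected; 100 controls, 98 correctly cleared.
example_sensitivity = sensitivity(true_positives=99, false_negatives=1)
example_specificity = specificity(true_negatives=98, false_positives=2)
```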


Practical Machine Learning Tutorial: Part.4 (Model Evaluation-2)

#artificialintelligence

In this part, we will elaborate on more model evaluation metrics, specifically for multi-class classification problems. Learning curves will be discussed as a tool for deciding how to trade off bias and variance when selecting model parameters. ROC curves for all classes in a specific model will be shown to see how the false and true positive rates vary through the modeling process. Finally, we will select the best model and examine its performance on blind well data (data that was not involved in any of the processes up to now). This post is the fourth and final part of the series, following part1, part2, and part3.


Concordia University coronavirus 'outbreak' attributed to more than 50 'false positives'

Los Angeles Times

Concordia University in Irvine will discontinue its use of antigen testing for asymptomatic students and employees, after more than 50 false positives prompted unwarranted concern about a possible major coronavirus outbreak. As of Wednesday, university officials said there were six active cases -- four students and two employees -- on campus as opposed to the more than 60 infections reported two days ago. Testing in another six cases has not been confirmed, and 55 students and employees have been confirmed as negative for the virus, they said. Campus officials had canceled athletic practices and urged against out-of-state travel for Thanksgiving because of the erroneous test results, which were preliminary pending confirmation from an outside lab. The university previously had been posting only confirmed test results on its COVID-19 dashboard, but made an exception for the unconfirmed numbers because of the indication of a "potential outbreak."


PSD2 Explainable AI Model for Credit Scoring

arXiv.org Artificial Intelligence

The aim of this paper is to develop and test advanced analytical methods to improve the prediction accuracy of Credit Risk Models while preserving model interpretability. In particular, the project focuses on applying an explainable machine learning model to PSD2-related databases. The input data were obtained solely from synthetic account transactions generated from a pool of Italian commercial banks. Among all the models tested, CatBoost has shown the highest performance. The algorithm implementation produces a GINI of 0.45 after tuning the hyper-parameters combined with their inherent class-weight resampling method. The SHAP package is used to provide a global and local interpretation of the model predictions to formulate a human-comprehensible approach to understanding the decision-making algorithm. The 20 most important features are selected using the Shapley values to present a fully human-understandable model that reveals how the attributes of an individual are related to its model prediction.
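To put the reported GINI in context, the sketch below converts between the Gini coefficient and the area under the ROC curve using the relation Gini = 2 * AUC - 1, which is the usual convention in credit scoring. That the paper uses this convention is an assumption, and the function names are hypothetical.

```python
def gini_from_auc(auc):
    """Gini coefficient under the common convention Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


def auc_from_gini(gini):
    """Inverse of gini_from_auc."""
    return (gini + 1.0) / 2.0


# Under this assumed convention, the reported GINI of 0.45
# corresponds to an AUC of 0.725.
equivalent_auc = auc_from_gini(0.45)
```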


Artificial Intelligence for COVID-19 Detection -- A state-of-the-art review

arXiv.org Artificial Intelligence

The emergence of COVID-19 has necessitated many efforts by the scientific community for its proper management. An urgent clinical reaction is required in the face of the unending devastation being caused by the pandemic. These efforts include technological innovations for improvement in screening, treatment, vaccine development, contact tracing, and survival prediction. The use of Deep Learning (DL) and Artificial Intelligence (AI) can be sought in all of the above-mentioned spheres. This paper aims to review the role of Deep Learning and Artificial Intelligence in various aspects of overall COVID-19 management, particularly COVID-19 detection and classification. The DL models are developed to analyze clinical modalities like CT scans and X-ray images of patients and predict their pathological condition. A DL model aims to detect COVID-19 pneumonia and to classify and distinguish between COVID-19, Community-Acquired Pneumonia (CAP), viral and bacterial pneumonia, and normal conditions. Furthermore, sophisticated models can be built to segment the affected area in the lungs and quantify the infection volume for a better understanding of the extent of damage. Many models have been developed either independently or with the help of pre-trained models like VGG19, ResNet50, and AlexNet, leveraging the concept of transfer learning. Apart from model development, data preprocessing and augmentation are also performed to cope with the challenge of insufficient data samples often encountered in medical applications. It can be concluded that DL and AI can be effectively implemented to withstand the challenges posed by the global emergency.


RRCN: A Reinforced Random Convolutional Network based Reciprocal Recommendation Approach for Online Dating

arXiv.org Artificial Intelligence

Recently, reciprocal recommendation, especially for online dating applications, has attracted more and more research attention. Unlike conventional recommendation problems, reciprocal recommendation aims to best match users' mutual preferences simultaneously. Intuitively, the mutual preferences might be affected by a few key attributes that users like or dislike. Meanwhile, the interactions between users' attributes and their key attributes are also important for key attribute selection. Motivated by these observations, in this paper we propose a novel reinforced random convolutional network (RRCN) approach for the reciprocal recommendation task. In particular, we propose a novel random CNN component that can randomly convolve non-adjacent features to capture their interaction information and learn feature embeddings of key attributes to make the final recommendation. Moreover, we design a reinforcement learning based strategy that integrates with the random CNN component to select salient attributes to form the candidate set of key attributes. We evaluate the proposed RRCN against a number of baselines and state-of-the-art approaches on two real-world datasets, and the promising results demonstrate the superiority of RRCN over the compared approaches on a number of evaluation criteria.


Classification supporting COVID-19 diagnostics based on patient survey data

arXiv.org Artificial Intelligence

Distinguishing COVID-19 from other flu-like illnesses can be difficult due to ambiguous symptoms and doctors' still-limited experience with the disease. At the same time, it is crucial to filter out those sick patients who do not need to be tested for SARS-CoV-2 infection, especially in the event of an overwhelming increase in cases. As part of the presented research, logistic regression and XGBoost classifiers that allow for effective screening of patients for COVID-19 were generated. Each of the methods was tuned to achieve an assumed acceptable threshold of negative predictive value during classification. Additionally, an explanation of the obtained classification models was presented. The explanation enables users to understand the basis of the decision made by the model. The obtained classification models provided the basis for the DECODE service (decode.polsl.pl), which can serve as support in screening patients for COVID-19. Moreover, the data set constituting the basis for the analyses performed is made available to the research community. This data set, consisting of more than 3,000 examples, is based on questionnaires collected at a hospital in Poland.
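One way to read "tuned to achieve an assumed acceptable threshold of negative predictive values" is as a search over the classifier's decision threshold. The sketch below is only an illustration of that idea, not the authors' procedure: the labels, scores, target value, and function names are all hypothetical. It picks the highest score threshold at which the negative predictive value still meets the target, so that as many patients as possible are screened out.

```python
def negative_predictive_value(labels, scores, threshold):
    """NPV = TN / (TN + FN), predicting negative when score < threshold.

    labels: 1 for infected, 0 for not infected.
    Returns None when no sample is predicted negative.
    """
    true_negatives = 0
    false_negatives = 0
    for label, score in zip(labels, scores):
        if score < threshold:  # predicted negative
            if label == 0:
                true_negatives += 1
            else:
                false_negatives += 1
    predicted_negative = true_negatives + false_negatives
    if predicted_negative == 0:
        return None
    return true_negatives / predicted_negative


def highest_threshold_meeting_npv(labels, scores, target_npv):
    """Largest candidate threshold whose NPV is at least target_npv."""
    best = None
    for candidate in sorted(set(scores)):
        value = negative_predictive_value(labels, scores, candidate)
        if value is not None and value >= target_npv:
            best = candidate  # later candidates are larger
    return best


# Made-up labels and model scores for illustration only.
labels = [0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 0.9]
chosen = highest_threshold_meeting_npv(labels, scores, target_npv=0.95)
```

In practice the threshold would be chosen on held-out data, since a value tuned and evaluated on the same sample tends to look better than it is.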