 Performance Analysis


Learning a metacognition for object perception

arXiv.org Artificial Intelligence

Beyond representing the external world, humans also represent their own cognitive processes. In the context of perception, this metacognition helps us identify unreliable percepts, such as when we recognize that we are seeing an illusion. Here we propose MetaGen, a model for the unsupervised learning of metacognition. In MetaGen, metacognition is expressed as a generative model of how a perceptual system produces noisy percepts. Using basic principles of how the world works (such as object permanence, part of infants' core knowledge), MetaGen jointly infers the objects in the world causing the percepts and a representation of its own perceptual system. MetaGen can then use this metacognition to infer which objects are actually present in the world. On simulated data, we find that MetaGen quickly learns a metacognition and improves overall accuracy, outperforming models that lack a metacognition.


Driver Behavior Extraction from Videos in Naturalistic Driving Datasets with 3D ConvNets

arXiv.org Artificial Intelligence

Naturalistic driving data (NDD) is an important source of information for understanding crash causation and human factors and for further developing crash avoidance countermeasures. Videos recorded while driving are often included in such datasets. While NDD often contains a large amount of video data, only a small portion of it can be annotated by human coders and used for research, leaving most of the video data unused. In this paper, we explore a computer vision method to automatically extract the information we need from videos. More specifically, we developed a 3D ConvNet algorithm to automatically extract cell-phone-related behaviors from videos. The experiments show that our method can extract chunks from videos, most of which (~79%) contain the automatically labeled cell phone behaviors. In conjunction with human review of the extracted chunks, this approach can find cell-phone-related driver behaviors much more efficiently than simply viewing video.


Towards constraining warm dark matter with stellar streams through neural simulation-based inference

arXiv.org Machine Learning

A statistical analysis of the observed perturbations in the density of stellar streams can in principle set stringent constraints on the mass function of dark matter subhaloes, which in turn can be used to constrain the mass of the dark matter particle. However, the likelihood of a stellar density with respect to the stream and subhalo parameters involves solving an intractable inverse problem which rests on the integration of all possible forward realisations implicitly defined by the simulation model. In order to infer the subhalo abundance, previous analyses have relied on Approximate Bayesian Computation (ABC) together with domain-motivated but handcrafted summary statistics. Here, we introduce a likelihood-free Bayesian inference pipeline based on Amortised Approximate Likelihood Ratios (AALR), which automatically learns a mapping between the data and the simulator parameters and obviates the need to handcraft a possibly insufficient summary statistic. We apply the method to the simplified case where stellar streams are only perturbed by dark matter subhaloes, thus neglecting baryonic substructures, and describe several diagnostics that demonstrate the effectiveness of the new method and the statistical quality of the learned estimator.


Beyond triplet loss: One shot learning experiments with quadruplet loss

#artificialintelligence

This article is a follow-up to my previous article about One Shot learning, Siamese networks and Triplet Loss with Keras. "One Shot Learning" and "Mining" are described there, so if you're not familiar with these concepts yet, I highly recommend you read that first. A friend of mine says that, to make significant progress in machine learning, one should read research papers in the field. While browsing research papers, I found one, "Beyond triplet loss: a deep quadruplet network for person re-identification", that looked like a way to improve on my previous work, so I decided to try to recreate what the authors did, adapted to my particular case. This article is about exploring the paper and implementing some of its concepts with Keras.
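As a rough illustration of the idea, here is a minimal sketch of a quadruplet loss in plain Python rather than Keras. It assumes squared Euclidean distances between embeddings (the paper uses a learned metric), and the function names and the margin values `margin1` and `margin2` are illustrative assumptions, not taken from the article. The first term is the familiar triplet term; the second pushes the anchor-positive distance below the distance between two negatives from different classes.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=1.0, margin2=0.5):
    """Hinge-style quadruplet loss for a single quadruplet.

    anchor and positive share a class; negative1 and negative2 come from
    two other, mutually different classes. Margins are assumed values.
    """
    ap = squared_distance(anchor, positive)
    an = squared_distance(anchor, negative1)
    nn = squared_distance(negative1, negative2)
    triplet_term = max(ap - an + margin1, 0.0)  # classic triplet constraint
    extra_term = max(ap - nn + margin2, 0.0)    # additional negative-pair constraint
    return triplet_term + extra_term


# Well-separated negatives: both constraints are satisfied, so the loss is zero.
easy = quadruplet_loss([0, 0], [0, 1], [3, 0], [0, 3])

# Negatives as close as the positive: both hinge terms are active.
hard = quadruplet_loss([0, 0], [0, 1], [1, 0], [1, 1])
```

In a Keras model the same expression would be written with tensor operations and averaged over a batch; the sketch only shows the arithmetic for one quadruplet.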


Essential Math for Data Science: Integrals And Area Under The Curve - KDnuggets

#artificialintelligence

Calculus is a branch of mathematics that gives tools to study the rate of change of functions through two main areas: derivatives and integrals. In the context of machine learning and data science, you might use integrals to calculate the area under the curve (for instance, to evaluate the performance of a model with the ROC curve, or to calculate probability from densities). In this article, you'll learn about integrals and the area under the curve using the practical data science example of the area under the ROC curve used to compare the performances of two machine learning models. Building from this example, you'll see the notion of the area under the curve and integrals from a mathematical point of view (from my book Essential Math for Data Science). Let's say that you would like to predict the quality of wines from several of their chemical properties. You want to do a binary classification of the quality (distinguishing very good wines from not very good ones). You'll develop methods allowing you to evaluate your models considering imbalanced data with the area under the Receiver Operating Characteristic (ROC) curve.
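To make the link between integrals and the area under the ROC curve concrete, here is a minimal sketch that approximates the integral with the trapezoidal rule. The ROC points below are made up for illustration and are not from the wine example; `trapezoid_auc` is a hypothetical helper name.

```python
def trapezoid_auc(fpr, tpr):
    """Approximate the area under a curve given as (fpr, tpr) points.

    The points must be sorted by increasing false positive rate. Each pair
    of neighbouring points contributes the area of one trapezoid.
    """
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2
        area += width * mean_height
    return area


# Illustrative ROC points (assumed values), from (0, 0) to (1, 1).
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
model_auc = trapezoid_auc(fpr, tpr)          # 0.75

# A random classifier follows the diagonal, which encloses half the unit square.
random_auc = trapezoid_auc([0.0, 1.0], [0.0, 1.0])  # 0.5
```

Comparing two models then comes down to comparing two such areas: the closer the value is to 1, the better the model separates the classes.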


Teaching the Machine to Explain Itself using Domain Knowledge

arXiv.org Artificial Intelligence

Machine Learning (ML) has been increasingly used to help humans make better and faster decisions. However, non-technical humans-in-the-loop struggle to comprehend the rationale behind model predictions, hindering trust in algorithmic decision-making systems. Considerable research on AI explainability attempts to win back trust in AI systems by developing explanation methods, but there is still no major breakthrough. At the same time, popular explanation methods (e.g., LIME and SHAP) produce explanations that are very hard to understand for people who are not data scientists. To address this, we present JOEL, a neural network-based framework to jointly learn a decision-making task and associated explanations that convey domain knowledge. JOEL is tailored to human-in-the-loop domain experts who lack deep technical ML knowledge, providing high-level insights about the model's predictions that closely resemble the experts' own reasoning. Moreover, we collect domain feedback from a pool of certified experts and use it to improve the model (human teaching), hence promoting seamless and better suited explanations. Lastly, we resort to semantic mappings between legacy expert systems and domain taxonomies to automatically annotate a bootstrap training set, overcoming the absence of concept-based human annotations. We validate JOEL empirically on a real-world fraud detection dataset. We show that JOEL can generalize the explanations from the bootstrap dataset. Furthermore, the results indicate that human teaching can further improve the quality of the explanation predictions by approximately 13.57%.


Viral epitope profiling of COVID-19 patients reveals cross-reactivity and correlates of severity

Science

Among the coronaviruses that infect humans, four cause mild common colds, whereas three others, including the currently circulating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), result in severe infections. Shrock et al. used a technology known as VirScan to probe the antibody repertoires of hundreds of coronavirus disease 2019 (COVID-19) patients and pre-COVID-19 era controls. They identified hundreds of antibody targets, including several antibody epitopes shared by the mild and severe coronaviruses and many specific to SARS-CoV-2. A machine-learning model accurately classified patients infected with SARS-CoV-2 and guided the design of an assay for rapid SARS-CoV-2 antibody detection. The study also looked at how the antibody response and viral exposure history differ in patients with diverging outcomes, which could inform the production of improved vaccine and antibody therapies. Science, this issue p. eabd4250 (doi: 10.1126/science.abd4250)

### INTRODUCTION

A systematic characterization of the humoral response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epitopes has yet to be performed. This analysis is important for understanding the immunogenicity of the viral proteome and the basis for cross-reactivity with the common-cold coronaviruses. Coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, is notable for its variable course, with some individuals remaining asymptomatic whereas others experience fever, respiratory distress, or even death. A comprehensive investigation of the antibody response in individuals with severe versus mild COVID-19, as well as an examination of past viral exposure history, is needed.

### RATIONALE

An understanding of humoral responses to SARS-CoV-2 is critical for improving diagnostics and vaccines and gaining insight into variable clinical outcomes. To this end, we used VirScan, a high-throughput method to analyze epitopes of antiviral antibodies in human sera. We supplemented the original VirScan library with additional libraries of peptides spanning the proteomes of SARS-CoV-2 and all other human coronaviruses. These libraries enabled us to precisely map epitope locations and investigate cross-reactivity between SARS-CoV-2 and other coronavirus strains. The original VirScan library allowed us to simultaneously investigate antibody responses to prior infections and viral exposure history.

### RESULTS

We screened sera from 232 COVID-19 patients and 190 pre-COVID-19 era controls against the original VirScan and supplemental coronavirus libraries, assaying more than 10^8 antibody repertoire-peptide interactions. We identified epitopes ranging from "private" (recognized by antibodies in only a small number of individuals) to "public" (recognized by antibodies in many individuals) and detected SARS-CoV-2-specific epitopes as well as those that cross-react with common-cold coronaviruses. Several of these cross-reacting antibodies are present in pre-COVID-19 era samples. We developed a machine learning model that predicted SARS-CoV-2 exposure history with 99% sensitivity and 98% specificity from VirScan data. We used the most discriminatory SARS-CoV-2 peptides to produce a Luminex-based serological assay, which performed similarly to gold-standard enzyme-linked immunosorbent assays. We stratified the COVID-19 patient samples by disease severity and found that patients who had required hospitalization exhibited stronger and broader antibody responses to SARS-CoV-2 but weaker overall responses to past infections compared with those who did not need hospitalization. Further, the hospitalized group had higher seroprevalence rates for cytomegalovirus and herpes simplex virus 1. These findings may be influenced by differences in demographic compositions between the two groups, but they raise hypotheses that may be tested in future studies. Using alanine scanning mutagenesis, we precisely mapped 823 distinct epitopes across the entire SARS-CoV-2 proteome, 10 of which are likely targets of neutralizing antibodies. One cross-reactive antibody epitope in S2 has been previously suggested to be neutralizing and, as it exists in pre-COVID-19 era samples, could affect the severity of COVID-19.

### CONCLUSION

We present a highly detailed view of the epitope landscape within the SARS-CoV-2 proteome. This knowledge may be used to produce diagnostics with improved specificity and can provide a stepping stone to the isolation and functional dissection of both neutralizing antibodies and antibodies that might exacerbate patient outcomes through antibody-dependent enhancement or immune distraction. Our study reveals notable correlations between COVID-19 severity and both viral exposure history and overall strength of the antibody response to past infections. These findings are likely influenced by demographic covariates, but they generate hypotheses that may be tested with larger patient cohorts matched for age, gender, race, and other demographic variables.

Figure: SARS-CoV-2 epitope mapping. VirScan detects antibodies against SARS-CoV-2 in COVID-19 patients with severe and mild disease. Heatmap color represents the strength of the antibody response in each sample (columns) to each protein (rows, left) or peptide (rows, right). VirScan reveals the precise positions of epitopes, which can be mapped onto the structure of the spike protein (S). Examination of SARS-CoV-2 and seasonal coronavirus sequence conservation explains epitope cross-reactivity. A, Ala; D, Asp; E, Glu; F, Phe; I, Ile; K, Lys; L, Leu; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; Y, Tyr.

Understanding humoral responses to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is critical for improving diagnostics, therapeutics, and vaccines. Deep serological profiling of 232 coronavirus disease 2019 (COVID-19) patients and 190 pre-COVID-19 era controls using VirScan revealed more than 800 epitopes in the SARS-CoV-2 proteome, including 10 epitopes likely recognized by neutralizing antibodies. Preexisting antibodies in controls recognized SARS-CoV-2 ORF1, whereas only COVID-19 patient antibodies primarily recognized spike protein and nucleoprotein. A machine learning model trained on VirScan data predicted SARS-CoV-2 exposure history with 99% sensitivity and 98% specificity; a rapid Luminex-based diagnostic was developed from the most discriminatory SARS-CoV-2 peptides. Individuals with more severe COVID-19 exhibited stronger and broader SARS-CoV-2 responses, weaker antibody responses to prior infections, and higher incidence of cytomegalovirus and herpes simplex virus 1, possibly influenced by demographic covariates. Among hospitalized patients, males produce stronger SARS-CoV-2 antibody responses than females.
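For readers less familiar with the two classifier metrics quoted above, the following minimal sketch shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical round numbers chosen to reproduce 99% and 98%; they are not the study's actual counts, and the function names are illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly exposed samples that the classifier flags as exposed."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of unexposed samples that the classifier correctly clears."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts (assumed for illustration only): 100 exposed samples
# and 100 unexposed control samples.
tp, fn = 99, 1   # exposed: correctly flagged vs. missed
tn, fp = 98, 2   # controls: correctly cleared vs. wrongly flagged

sens = sensitivity(tp, fn)   # 0.99
spec = specificity(tn, fp)   # 0.98
```

High sensitivity means few infections are missed, while high specificity means few controls are wrongly flagged; a diagnostic needs both.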


Practical Machine Learning Tutorial: Part.4 (Model Evaluation-2)

#artificialintelligence

In this part, we will elaborate on more model evaluation metrics, specifically for multi-class classification problems. Learning curves will be discussed as a tool for deciding how to trade off bias and variance when selecting model parameters. ROC curves for all classes in a specific model will be shown to see how the false and true positive rates vary through the modeling process. Finally, we will select the best model and examine its performance on blind well data (data that was not involved in any of the processes up to now). This post is the fourth and final part of the series, following part 1, part 2, and part 3.


Concordia University coronavirus 'outbreak' attributed to more than 50 'false positives'

Los Angeles Times

Concordia University in Irvine will discontinue its use of antigen testing for asymptomatic students and employees, after more than 50 false positives prompted unwarranted concern about a possible major coronavirus outbreak. As of Wednesday, university officials said there were six active cases -- four students and two employees -- on campus as opposed to the more than 60 infections reported two days ago. Testing in another six cases has not been confirmed, and 55 students and employees have been confirmed as negative for the virus, they said. Campus officials had canceled athletic practices and urged against out-of-state travel for Thanksgiving because of the erroneous test results, which were preliminary pending confirmation from an outside lab. The university previously had been posting only confirmed test results on its COVID-19 dashboard, but made an exception for the unconfirmed numbers because of the indication of a "potential outbreak."


PSD2 Explainable AI Model for Credit Scoring

arXiv.org Artificial Intelligence

The aim of this paper is to develop and test advanced analytical methods to improve the prediction accuracy of Credit Risk Models while preserving model interpretability. In particular, the project focuses on applying an explainable machine learning model to PSD2-related databases. The input data were obtained solely from synthetic account transactions generated from a pool of Italian commercial banks. Among all the models tested, CatBoost has shown the highest performance. The algorithm implementation produces a Gini of 0.45 after tuning the hyper-parameters combined with their inherent class-weight resampling method. The SHAP package is used to provide a global and local interpretation of the model predictions to formulate a human-comprehensible approach to understanding the decision-maker algorithm. The 20 most important features are selected using the Shapley values to present a fully human-understandable model that reveals how the attributes of an individual are related to its model prediction.
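To put the reported Gini in context, the sketch below converts between Gini and ROC AUC. It assumes the convention common in credit scoring, Gini = 2 * AUC - 1; the abstract does not state which definition the authors use, so the resulting AUC is an inference rather than a reported result. The function names are illustrative.

```python
def gini_from_auc(auc):
    """Gini under the common credit-scoring convention (assumed here)."""
    return 2.0 * auc - 1.0


def auc_from_gini(gini):
    """Inverse of gini_from_auc."""
    return (gini + 1.0) / 2.0


# Under this assumption, the reported Gini of 0.45 corresponds to an AUC
# of approximately 0.725.
implied_auc = auc_from_gini(0.45)

# Reference points: a random model (AUC 0.5) has Gini 0,
# and a perfect model (AUC 1.0) has Gini 1.
random_gini = gini_from_auc(0.5)
perfect_gini = gini_from_auc(1.0)
```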