Goto

Collaborating Authors

 Accuracy


Simulation Assisted Likelihood-free Anomaly Detection

arXiv.org Machine Learning

Given the lack of evidence for new particle discoveries at the Large Hadron Collider (LHC), it is critical to broaden the search program. A variety of model-independent searches have been proposed, adding sensitivity to unexpected signals. There are generally two types of such searches: those that rely heavily on simulations and those that are entirely based on (unlabeled) data. This paper introduces a hybrid method that makes the best of both approaches. For potential signals that are resonant in one known feature, this new method first learns a parameterized reweighting function to morph a given simulation to match the data in sidebands. This function is then interpolated into the signal region and then the reweighted background-only simulation can be used for supervised learning as well as for background estimation. The background estimation from the reweighted simulation allows for non-trivial correlations between features used for classification and the resonant feature. A dijet search with jet substructure is used to illustrate the new method. Future applications of Simulation Assisted Likelihood-free Anomaly Detection (SALAD) include a variety of final states and potential combinations with other model-independent approaches.


Anomaly Detection with Density Estimation

arXiv.org Machine Learning

We leverage recent breakthroughs in neural density estimation to propose a new unsupervised anomaly detection technique (ANODE). By estimating the probability density of the data in a signal region and in sidebands, and interpolating the latter into the signal region, a likelihood ratio of data vs. background can be constructed. This likelihood ratio is broadly sensitive to overdensities in the data that could be due to localized anomalies. In addition, a unique potential benefit of the ANODE method is that the background can be directly estimated using the learned densities. Finally, ANODE is robust against systematic differences between signal region and sidebands, giving it broader applicability than other methods. We demonstrate the power of this new approach using the LHC Olympics 2020 R\&D Dataset. We show how ANODE can enhance the significance of a dijet bump hunt by up to a factor of 7 with a 10\% accuracy on the background prediction. While the LHC is used as the recurring example, the methods developed here have a much broader applicability to anomaly detection in physics and beyond.


HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset

arXiv.org Machine Learning

Mosquitoes are the only known vector of malaria, which leads to hundreds of thousands of deaths each year. Understanding the number and location of potential mosquito vectors is of paramount importance to aid the reduction of malaria transmission cases. In recent years, deep learning has become widely used for bioacoustic classification tasks. In order to enable further research applications in this field, we release a new dataset of mosquito audio recordings. With over a thousand contributors, we obtained 195,434 labels of two second duration, of which approximately 10 percent signify mosquito events. We present an example use of the dataset, in which we train a convolutional neural network on log-Mel features, showcasing the information content of the labels. We hope this will become a vital resource for those researching all aspects of malaria, and add to the existing audio datasets for bioacoustic detection and signal processing.


Fairness in Learning-Based Sequential Decision Algorithms: A Survey

arXiv.org Artificial Intelligence

Algorithmic fairness in decision-making has been studied extensively in static settings where one-shot decisions are made on tasks such as classification. However, in practice most decision-making processes are of a sequential nature, where decisions made in the past may have an impact on future data. This is particularly the case when decisions affect the individuals or users generating the data used for future decisions. In this survey, we review existing literature on the fairness of data-driven sequential decision-making. We will focus on two types of sequential decisions: (1) past decisions have no impact on the underlying user population and thus no impact on future data; (2) past decisions have an impact on the underlying user population and therefore the future data, which can then impact future decisions. In each case the impact of various fairness interventions on the underlying population is examined.


Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift

arXiv.org Artificial Intelligence

Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security consisting in the detection of hostile action based on the unusual nature of events observed on the Information System.In our previous work (presented at C\&ESAR 2018 and FIC 2019), we have associated deep neural networks auto-encoders for anomaly detection and graph-based events correlation to address major limitations in UEBA systems. This resulted in reduced false positive and false negative rates, improved alert explainability, while maintaining real-time performances and scalability. However, we did not address the natural evolution of behaviours through time, also known as concept drift. To maintain effective detection capabilities, an anomaly-based detection system must be continually trained, which opens a door to an adversary that can conduct the so-called "frog-boiling" attack by progressively distilling unnoticed attack traces inside the behavioural models until the complete attack is considered normal. In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise. We also present preliminary work on adversarial AI conducting deception attack, which, in term, will be used to help assess and improve the defense system. These defensive and offensive AI implement joint, continual and active learning, in a step that is necessary in assessing, validating and certifying AI-based defensive solutions.


AI Could Make Up For Lack Of Radiologists In Fight Against Breast Cancer, But It Isn't Ready Yet

#artificialintelligence

Recently a team of researchers from Imperial College London and Google Health created a computer vision model intended to diagnose cases of breast cancer from X-rays. As CNN reports, the model was reportedly trained on X-rays of over 29,000 women, and when pitted against six radiologists the model managed to outperform the assessments of the doctors. Currently, the NHS uses the combined decisions of two doctors in order to diagnose breast cancer from X-rays. If the two doctors end up disagreeing, a third will be brought in to consult on the images. While the doctors had access to the medical records of the patients, the AI device only had the mammograms to base its decisions on.



A Multichannel Deep Neural Network Model Analyzing Multiscale Functional Brain Connectome Data for Attention Deficit Hyperactivity Disorder Detection

#artificialintelligence

To develop a multichannel deep neural network (mcDNN) classification model based on multiscale brain functional connectome data and demonstrate the value of this model by using attention deficit hyperactivity disorder (ADHD) detection as an example. In this retrospective case-control study, existing data from the Neuro Bureau ADHD-200 dataset consisting of 973 participants were used. Multiscale functional brain connectomes based on both anatomic and functional criteria were constructed. The mcDNN model used the multiscale brain connectome data and personal characteristic data (PCD) as joint features to detect ADHD and identify the most predictive brain connectome features for ADHD diagnosis. The mcDNN model was compared with single-channel deep neural network (scDNN) models and the classification performance was evaluated through cross-validation and hold-out validation with the metrics of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC).


Artificial Intelligence Makes Bad Medicine Even Worse

#artificialintelligence

Google researchers made headlines early this month for a study that claimed their artificial intelligence system could outperform human experts at finding breast cancers on mammograms. It sounded like a big win, and yet another example of how AI will soon transform health care: More cancers found! A better, cheaper way to provide high-quality medical care! Hold on to your exclamation points. Machine-enabled health care may bring us many benefits in the years to come, but those will be contingent on the ways in which it's used.


Bayesian Semi-supervised learning under nonparanormality

arXiv.org Machine Learning

Semi-supervised learning is a classification method which makes use of both labeled data and unlabeled data for training. In this paper, we propose a semi-supervised learning algorithm using a Bayesian semi-supervised model. We make a general assumption that the observations will follow two multivariate normal distributions depending on their true labels after the same unknown transformation. We use B-splines to put a prior on the transformation function for each component. To use unlabeled data in a semi-supervised setting, we assume the labels are missing at random. The posterior distributions can then be described using our assumptions, which we compute by the Gibbs sampling technique. The proposed method is then compared with several other available methods through an extensive simulation study. Finally we apply the proposed method in real data contexts for diagnosing breast cancer and classify radar returns. We conclude that the proposed method has better prediction accuracy in a wide variety of cases.