 Performance Analysis


A New Defense Against Adversarial Images: Turning a Weakness into a Strength

arXiv.org Machine Learning

Natural images are virtually surrounded by low-density misclassified regions that can be efficiently discovered by gradient-guided search --- enabling the generation of adversarial images. While many techniques for detecting these attacks have been proposed, they are easily bypassed when the adversary has full knowledge of the detection mechanism and adapts the attack strategy accordingly. In this paper, we adopt a novel perspective and regard the omnipresence of adversarial perturbations as a strength rather than a weakness. We postulate that if an image has been tampered with, these adversarial directions either become harder to find with gradient methods or have substantially higher density than for natural images. We develop a practical test for this signature characteristic to successfully detect adversarial attacks, achieving unprecedented accuracy under the white-box setting where the adversary is given full knowledge of our detection mechanism.


Migration through Machine Learning Lens -- Predicting Sexual and Reproductive Health Vulnerability of Young Migrants

arXiv.org Machine Learning

In this paper, we discuss the initial findings and results of our experiment to predict sexual and reproductive health vulnerabilities of migrants in a data-constrained environment. Notwithstanding the limited research and data about migrants and migration cities, we propose a solution that simultaneously focuses on gathering data from migrants, raising migrants' awareness to reduce mishaps, and setting up a mechanism to present insights to the key stakeholders in migration to act upon. We have designed a webapp for the stakeholders involved in migration: migrants, who would participate in the data gathering process and can also use the app to get safety and awareness tips based on analysis of the data received; public health workers, who would have access to the database of migrants on the app; and policy makers, who would gain a greater understanding of the ground reality and of the patterns of migration through machine-learned analysis. Finally, we have experimented with different machine learning models on an artificially curated dataset. We have shown, through experiments, how machine learning can assist in predicting which migrants are at risk and can also help identify the critical factors that make migration dangerous for migrants. The results for identifying vulnerable migrants through machine learning algorithms are statistically significant at an alpha of 0.05.


Conditional Learning of Fair Representations

arXiv.org Artificial Intelligence

We propose a novel algorithm for learning fair representations that can simultaneously mitigate two notions of disparity among different demographic subgroups. Two key components underpinning the design of our algorithm are balanced error rate and conditional alignment of representations. In settings that have historically had discrimination, we are interested in defining fairness with respect to a protected group, the group which has historically been disadvantaged. Among many recent attempts to achieve algorithmic fairness (Dwork et al., 2012; Hardt et al., 2016; Zemel et al., 2013; Zafar et al., 2015), learning fair representations has attracted increasing attention. However, it has long been empirically observed (Calders et al., 2009) and recently been proved (Zhao ... Part of this work was done when Han Zhao was visiting the Vector Institute, Toronto. In this work, we provide an affirmative answer to the above question by proposing an algorithm to align the conditional distributions (on the target variable) of representations across different demographic subgroups.


How I scored in the top 1% of Kaggle's Titanic Machine Learning Challenge

#artificialintelligence

You don't need to reinvent the wheel; you need to know how to use the wheel to make your car better. The Titanic challenge hosted by Kaggle is a competition in which the goal is to predict the survival or the death of a given passenger based on a set of variables describing them, such as age, sex, or passenger class on the boat. I have been playing with the Titanic dataset for a while. As I'm writing this post, I am ranked 113th out of 11002 participants. You must be wondering how I managed to achieve this.


Using AI, Genes and Game Theory on Antimicrobial Resistance

#artificialintelligence

Antimicrobial resistance (AMR) is the ability of microorganisms such as bacteria, viruses, fungi and certain parasites to withstand drugs such as antibiotics, antifungals, and antivirals that would otherwise destroy them. AMR is a worldwide public health threat that is projected to rise. Globally, by 2050, over 10 million deaths per year will be due to antimicrobial resistance, according to projections from a report by Wellcome Trust and the UK government. For antibiotic resistance alone, each year over two million people in the U.S. are affected, and 23,000 die, according to figures from the U.S. Centers for Disease Control and Prevention (CDC). Researchers at Washington State University have combined game theory with artificial intelligence (AI) to create a tool that can identify antibiotic-resistant genes in bacteria, and published their study in Scientific Reports on October 9, 2019.


Proper Balancing for Cross Validation

#artificialintelligence

Here we plot the precision results of balancing, with under-sampling, only the train set of each CV fold before fitting the model on it and making predictions on the CV fold's test set. [Plot omitted.] Here we plot the precision results of balancing, with over-sampling, only the train set of each CV fold before fitting the model on it and making predictions on the CV fold's test set. [Plot omitted.] It is clear that balancing has not so far helped in getting good test results. However, this is out of scope for this article (:-)), and the goal of this article is achieved: to make the model produce, on each CV fold's test set, evaluation metric scores similar to those it would produce on an unseen one, for the case where the train data are balanced.
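
The idea of balancing only the train portion of each CV fold can be sketched in a few lines. This is a minimal, standard-library illustration and not the article's own code: the function names (`k_fold_indices`, `undersample`, `balanced_cv_splits`) are hypothetical, and random under-sampling to the minority-class count is assumed as the balancing method. The test portion of every fold is left untouched, so scores computed on it reflect the original class distribution.

```python
import random
from collections import defaultdict


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and yield (train, test) index lists per fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, starting at position i.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def undersample(train_idx, labels, seed=0):
    """Randomly drop train samples until every class present has the same count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, n_min))
    return balanced


def balanced_cv_splits(labels, n_folds=5, seed=0):
    """Yield (balanced_train, untouched_test) index lists for each fold."""
    splits = k_fold_indices(len(labels), n_folds, seed)
    for fold_no, (train, test) in enumerate(splits):
        # Balance the train part only; the test part keeps its natural skew.
        yield undersample(train, labels, seed + fold_no), test
```

A model would then be fitted on the samples at the `balanced_train` indices and scored on the samples at the `untouched_test` indices. Balancing the whole dataset before splitting would instead let the resampling leak into the test folds.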


RPA: Strengthen and Simplify Your Cyber Security Operations

#artificialintelligence

Robotic process automation (RPA) uses machine learning (ML) and artificial intelligence (AI) to create a virtual workforce able to handle repeatable tasks that would otherwise require a human worker. By using RPA, companies can perform repetitive tasks faster, for longer, and with a reduced error rate, allowing the workforce to focus on essential duties and responsibilities. In other words, companies have employees working like robots, performing jobs without thinking, so why not have robots behaving like people for these tasks? Cybersecurity personnel and cybercriminals are in a constant state of war; automation, and specifically RPA, can help protect against malicious cyber intruders. Identification and prevention of zero-day attacks (attacks that exploit a vulnerability on the same day it is discovered) and elimination of any system weaknesses are the end goals of internal security teams.


More Powerful Selective Kernel Tests for Feature Selection

arXiv.org Machine Learning

Refining one's hypotheses in the light of data is a commonplace scientific practice; however, this approach introduces selection bias and can lead to specious statistical analysis. One approach to addressing this phenomenon is to condition on the selection procedure, i.e., how we have used the data to generate our hypotheses, which prevents information from being used again after selection. Many selective inference (a.k.a. post-selection inference) algorithms take this approach but "over-condition" for the sake of tractability. While this practice obtains well-calibrated $p$-values, it can incur a major loss in power. In our work, we extend two recent proposals for selecting features using the Maximum Mean Discrepancy and Hilbert Schmidt Independence Criterion to condition on the minimal conditioning event. We show how recent advances in multiscale bootstrap make conditioning on the minimal selection event possible and demonstrate our proposal over a range of synthetic and real-world experiments. Our results show that our proposed test is indeed more powerful in most scenarios.


Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training

arXiv.org Machine Learning

Adversarial training is the standard approach for training models that are robust against adversarial examples. However, especially for complex datasets, adversarial training incurs a significant loss in accuracy and is known to generalize poorly to stronger attacks, e.g., larger perturbations or other threat models. In this paper, we introduce confidence-calibrated adversarial training (CCAT), where the key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples. We show that CCAT better preserves the accuracy of normal training, while robustness against adversarial examples is achieved via confidence thresholding. Most importantly, in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models not encountered during training. We also discuss our extensive work to design strong adaptive attacks against CCAT and standard adversarial training, which is of independent interest. We present experimental results on MNIST, SVHN and CIFAR10.
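
The two ingredients named in the abstract, a confidence target that decays with perturbation distance and rejection by confidence thresholding, can be illustrated with a small sketch. The abstract does not specify the decay schedule, so the power-law form, the `rho` exponent, and the function names `decayed_target` and `predict_or_reject` below are assumptions made for illustration only and should not be read as the paper's exact formulation.

```python
def decayed_target(true_label, num_classes, distance, epsilon, rho=10.0):
    """Blend a one-hot target with the uniform distribution.

    Assumed schedule: the weight on the true label is
    (1 - min(1, distance / epsilon)) ** rho, so confidence falls from 1
    at distance 0 to the uniform level once distance reaches epsilon.
    """
    lam = (1.0 - min(1.0, distance / epsilon)) ** rho
    uniform = 1.0 / num_classes
    return [
        lam * (1.0 if k == true_label else 0.0) + (1.0 - lam) * uniform
        for k in range(num_classes)
    ]


def predict_or_reject(probs, threshold):
    """Return the predicted class index, or None if confidence is too low."""
    confidence = max(probs)
    if confidence < threshold:
        return None  # reject: the input is treated as a possible adversarial example
    return probs.index(confidence)
```

Under this sketch, a model trained to match `decayed_target` on perturbed inputs would output near-uniform probabilities far from the clean example, and `predict_or_reject` would then decline to classify those inputs.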