Andriushchenko, Maksym
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Mosbach, Marius, Andriushchenko, Maksym, Klakow, Dietrich
Fine-tuning pre-trained transformer-based language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks. Despite the strong empirical performance of fine-tuned models, fine-tuning is an unstable process: training the same model with multiple random seeds can result in a large variance of the task performance. Previous literature (Devlin et al., 2019; Lee et al., 2020; Dodge et al., 2020) identified two potential reasons for the observed instability: catastrophic forgetting and the small size of the fine-tuning datasets. In this paper, we show that both hypotheses fail to explain the fine-tuning instability. We analyze BERT, RoBERTa, and ALBERT, fine-tuned on three commonly used datasets from the GLUE benchmark, and show that the observed instability is caused by optimization difficulties that lead to vanishing gradients. Additionally, we show that the remaining variance of the downstream task performance can be attributed to differences in generalization, where fine-tuned models with the same training loss exhibit noticeably different test performance. Based on our analysis, we present a simple but strong baseline that makes fine-tuning BERT-based models significantly more stable than the previously proposed approaches.
Pre-trained transformer-based masked language models such as BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), and ALBERT (Lan et al., 2020) have had a dramatic impact on the NLP landscape in recent years. The standard recipe for using such models typically involves training a pre-trained model for a few epochs on a supervised downstream dataset, which is known as fine-tuning.
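For concreteness, the fine-tuning recipe described above can be sketched in a few lines of PyTorch/Transformers code. This is only an illustrative sketch in the spirit of the paper's suggested baseline (a small learning rate with bias-corrected AdamW and more epochs than the usual three); the model name, toy data, and hyperparameters are assumptions, not the paper's official configuration.

```python
# Minimal fine-tuning sketch; model name, data, and hyperparameters are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Small learning rate with bias correction (PyTorch's AdamW applies bias correction).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# Toy stand-in for a GLUE-style dataset.
texts = ["a toy positive example", "a toy negative example"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
for epoch in range(20):  # training longer than the usual 3 epochs
    optimizer.zero_grad()
    out = model(**batch, labels=labels)
    out.loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
```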
Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks
Croce, Francesco, Andriushchenko, Maksym, Singh, Naman D., Flammarion, Nicolas, Hein, Matthias
A large body of research has focused on adversarial attacks that are allowed to modify all input features with small $l_2$- or $l_\infty$-norms. In this paper we instead focus on query-efficient sparse attacks in the black-box setting. Our versatile framework, Sparse-RS, based on random search, achieves state-of-the-art success rate and query efficiency for different sparse attack models such as $l_0$-bounded perturbations (outperforming established white-box methods), adversarial patches, and adversarial framing. We show the effectiveness of Sparse-RS on different datasets, considering problems from image recognition and malware detection and multiple variations of sparse threat models, including targeted and universal perturbations. In particular, Sparse-RS can be used for realistic attacks such as universal adversarial patch attacks without requiring a substitute model. The code of our framework is available at https://github.com/fra31/sparse-rs.
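For intuition, the core of a random-search-based sparse ($l_0$) attack can be sketched as follows. This is a heavily simplified stand-in for Sparse-RS, which uses a more refined sampling schedule and also covers patches and framings; all names and parameters below are illustrative.

```python
# Simplified random-search l0 attack sketch (not the actual Sparse-RS algorithm).
import numpy as np

def random_search_l0_attack(loss_fn, x, k=50, n_queries=1000, rng=None):
    """loss_fn(x) returns a margin-style loss that is > 0 once x is misclassified.
    x: image in [0, 1] of shape (H, W, C); k: number of pixels allowed to change."""
    rng = rng or np.random.default_rng(0)
    h, w, c = x.shape
    # Start from k randomly chosen pixels set to random extreme (0/1) values.
    idx = rng.choice(h * w, size=k, replace=False)
    vals = rng.integers(0, 2, size=(k, c)).astype(x.dtype)
    best_x = x.copy().reshape(-1, c)
    best_x[idx] = vals
    best_loss = loss_fn(best_x.reshape(h, w, c))
    for _ in range(n_queries):
        cand_x = best_x.copy()
        # Randomly move one perturbed pixel to a new location with a new value
        # (collisions between perturbed pixels are ignored for simplicity).
        i = rng.integers(k)
        new_pos = rng.integers(h * w)
        cand_x[idx[i]] = x.reshape(-1, c)[idx[i]]   # restore the old pixel
        cand_x[new_pos] = rng.integers(0, 2, size=c)
        cand_loss = loss_fn(cand_x.reshape(h, w, c))
        if cand_loss > best_loss:                    # greedy acceptance
            best_x, best_loss = cand_x, cand_loss
            idx[i] = new_pos
        if best_loss > 0:                            # success: label flipped
            break
    return best_x.reshape(h, w, c)
```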
Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks
Andriushchenko, Maksym, Hein, Matthias
The problem of adversarial samples has been studied extensively for neural networks. However, for boosting, in particular boosted decision trees and decision stumps, there are almost no results, even though boosted decision trees, such as XGBoost, are quite popular due to their interpretability and good prediction performance. We show in this paper that for boosted decision stumps the exact min-max optimal robust loss and test error for an $l_\infty$-attack can be computed in $O(n\,T\log T)$, where $T$ is the number of decision stumps and $n$ the number of data points, and that an optimal update of the ensemble can be computed in $O(n^2\,T\log T)$. For boosted trees, where we do not compute the exact robust loss, we show how to optimize an upper bound on it. To the best of our knowledge, these are the first algorithms directly optimizing provable robustness guarantees in the area of boosting. We make the code of all our experiments publicly available at https://github.com/max-andr/provably-robust-boosting.
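The key observation that makes exact certification cheap for stumps is that each stump depends on a single coordinate, so an $l_\infty$-adversary can be handled coordinate by coordinate. The sketch below illustrates this on a toy stump ensemble; the stump representation and function names are illustrative assumptions, not the paper's implementation.

```python
# Exact l_inf-robust margin for a toy decision-stump ensemble (illustrative sketch).
import numpy as np

def certified_margin(stumps, x, y, eps):
    """stumps: list of (coord, threshold, w_left, w_right); the prediction is
    sign(sum of stump outputs); y in {-1, +1}. Returns min over ||delta||_inf <= eps
    of y * f(x + delta). This is exact for stumps: each coordinate can be
    perturbed independently, so the minimum decomposes per coordinate."""
    coords = {}
    for (j, t, wl, wr) in stumps:
        coords.setdefault(j, []).append((t, wl, wr))
    worst = 0.0
    for j, group in coords.items():
        # The summed contribution of stumps on coordinate j is piecewise constant
        # in x_j, so it suffices to check the interval endpoints and the points
        # at / just above each threshold inside the interval.
        candidates = [x[j] - eps, x[j] + eps]
        for (t, _, _) in group:
            if x[j] - eps <= t <= x[j] + eps:
                candidates += [t, np.nextafter(t, np.inf)]
        def signed_sum(v):
            return sum(y * (wl if v <= t else wr) for (t, wl, wr) in group)
        worst += min(signed_sum(v) for v in candidates)
    return worst  # > 0 means the prediction is certified correct on the eps-ball

# Example: one stump on coordinate 0 with threshold 0.5.
stumps = [(0, 0.5, -1.0, +1.0)]
print(certified_margin(stumps, x=np.array([0.6]), y=+1, eps=0.2))  # -1.0: the adversary can cross the threshold
```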
Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem
Hein, Matthias, Andriushchenko, Maksym, Bitterwolf, Julian
Classifiers used in the wild, in particular in safety-critical systems, should not only have good generalization properties but should also know when they don't know, in particular by making low-confidence predictions far away from the training data. We show that ReLU-type neural networks, which yield a piecewise linear classifier function, fail in this regard, as they almost always produce high-confidence predictions far away from the training data. For bounded domains like images, we propose a new robust optimization technique, similar to adversarial training, which enforces low-confidence predictions far away from the training data. We show that this technique is surprisingly effective in reducing the confidence of predictions far away from the training data, while maintaining high-confidence predictions and similar test error on the original classification task compared to standard training.
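The basic training idea, enforcing low confidence away from the data, can be sketched as an auxiliary penalty on noise inputs. The sketch below omits the paper's inner adversarial maximization over the noise points, and the model, noise distribution, and weighting are illustrative assumptions.

```python
# Sketch: cross-entropy on real data plus a low-confidence penalty on noise inputs.
import torch
import torch.nn.functional as F

def low_confidence_loss(model, x, y, lam=1.0):
    # Standard classification loss on the in-distribution batch.
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    # Uniform-noise inputs stand in for points "far away" from the training data.
    noise = torch.rand_like(x)
    noise_logits = model(noise)
    # Penalize the maximal log-probability on noise, pushing the predictive
    # distribution there toward the maximally uncertain (uniform) one.
    conf = noise_logits.log_softmax(dim=1).max(dim=1).values
    return ce + lam * conf.mean()
```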
Logit Pairing Methods Can Fool Gradient-Based Attacks
Mosbach, Marius, Andriushchenko, Maksym, Trost, Thomas, Hein, Matthias, Klakow, Dietrich
Recently, several logit regularization methods have been proposed in [Kannan et al., 2018] to improve the adversarial robustness of classifiers. We show that the proposed computationally fast methods, Clean Logit Pairing (CLP) and Logit Squeezing (LSQ), merely make the gradient-based optimization problem of crafting adversarial examples harder, without providing actual robustness. For Adversarial Logit Pairing (ALP) we find that it can indeed provide robustness against adversarial examples, and we study it in different settings. In particular, we show that ALP may provide additional robustness when combined with adversarial training. However, the increase is much smaller than claimed by [Kannan et al., 2018]. Finally, our results suggest that evaluation against an iterative PGD attack depends heavily on the attack parameters used and may lead to false conclusions regarding robustness.
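For reference, the regularizers under study roughly take the following form, following the definitions of Kannan et al. (2018); the loss weights, the exact norm used, and the way adversarial examples are generated are illustrative assumptions in this sketch.

```python
# Rough sketches of Logit Squeezing (LSQ) and Adversarial Logit Pairing (ALP) losses.
import torch
import torch.nn.functional as F

def logit_squeezing_loss(model, x, y, lam=0.05):
    logits = model(x)
    # LSQ: cross-entropy plus a penalty on the logit norms of clean inputs.
    return F.cross_entropy(logits, y) + lam * logits.norm(dim=1).mean()

def adversarial_logit_pairing_loss(model, x, x_adv, y, lam=0.5):
    logits, logits_adv = model(x), model(x_adv)
    # ALP: adversarial-training loss plus pairing of clean and adversarial
    # logits for the same underlying input (x_adv produced by, e.g., PGD).
    ce = F.cross_entropy(logits_adv, y)
    pairing = ((logits - logits_adv) ** 2).mean()
    return ce + lam * pairing
```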
Provable Robustness of ReLU networks via Maximization of Linear Regions
Croce, Francesco, Andriushchenko, Maksym, Hein, Matthias
It has been shown that neural network classifiers are not robust. This raises concerns about their usage in safety-critical systems. We propose in this paper a regularization scheme for ReLU networks which provably improves the robustness of the classifier by maximizing the linear regions of the classifier as well as the distance to the decision boundary. Our techniques even allow us to find the minimal adversarial perturbation for a fraction of test points for large networks. In the experiments we show that our approach improves upon adversarial training both in terms of lower and upper bounds on the robustness, and is comparable to or better than the state of the art in terms of test error and robustness.
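To make the geometric idea concrete: within one of its linear regions a ReLU network is affine, $f(x) = Vx + a$, so the distance of $x$ to the local decision boundary has a closed form. The expression below is only a sketch of the kind of quantity being pushed up; the paper's actual regularizer combines such terms for both decision-boundary and region-boundary hyperplanes and is not reproduced here.

```latex
% Sketch of the local distance to the decision boundary inside a linear region,
% assuming f(x) = Vx + a there, with predicted class c and rows v_k of V:
\[
  d_{\mathcal{D}}(x) \;=\; \min_{j \neq c}\;
  \frac{(v_c - v_j)^\top x + (a_c - a_j)}{\lVert v_c - v_j \rVert_q},
\]
% where \|.\|_q is the norm dual to the l_p-norm used to measure perturbations;
% the expression is valid as long as the closest boundary point stays in the region.
```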
Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation
Hein, Matthias, Andriushchenko, Maksym
Recent work has shown that state-of-the-art classifiers are quite brittle, in the sense that a small adversarial change to an input that was originally classified correctly with high confidence leads to a wrong classification, again with high confidence. This raises concerns that such classifiers are vulnerable to attacks and calls into question their usage in safety-critical systems. We show in this paper, for the first time, formal guarantees on the robustness of a classifier by giving instance-specific lower bounds on the norm of the input manipulation required to change the classifier decision. Based on this analysis we propose the Cross-Lipschitz regularization functional. We show that using this form of regularization in kernel methods and neural networks improves the robustness of the classifier without any loss in prediction performance.
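The flavor of the instance-specific guarantee can be sketched as follows; this is stated only up to the radius restriction and constants used in the paper and is meant as an illustration rather than the exact theorem.

```latex
% Sketch: if x is classified as c and delta changes the decision while staying
% in the ball B_p(x, R), then roughly
\[
  \lVert \delta \rVert_p \;\ge\; \min_{j \neq c}\;
  \frac{f_c(x) - f_j(x)}{\max_{y \in B_p(x, R)} \lVert \nabla f_c(y) - \nabla f_j(y) \rVert_q},
\]
% with \|.\|_q the dual norm of \|.\|_p. A large class-score gap and a small
% "cross-Lipschitz" constant of the class differences thus certify robustness,
% which motivates penalizing \|\nabla f_l(x_i) - \nabla f_m(x_i)\| during training.
```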