Computable Lipschitz Bounds for Deep Neural Networks

Pintore, Moreno, Després, Bruno

arXiv.org Machine Learning

Deriving sharp and computable upper bounds on the Lipschitz constant of deep neural networks is crucial to formally guarantee the robustness of neural-network-based models. We analyse three existing upper bounds written for the $l^2$ norm. We highlight the importance of working with the $l^1$ and $l^\infty$ norms and propose two novel bounds for both feed-forward fully-connected neural networks and convolutional neural networks. We treat the technical difficulties related to convolutional neural networks with two different methods, called explicit and implicit. Several numerical tests empirically confirm the theoretical results, quantify the relationships between the presented bounds, and establish the improved accuracy of the new bounds. Four numerical tests are studied: two where the output is derived from an analytical closed form, one with random matrices, and one with convolutional neural networks trained on the MNIST dataset. We observe that one of our bounds is optimal in the sense that it is exact for the first test, with the simplest analytical form, and is sharper than the other bounds on the remaining tests.
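For reference, the classical baseline that such bounds are measured against is the product of the layers' operator norms, which upper-bounds the Lipschitz constant of a feed-forward network with 1-Lipschitz activations such as ReLU. A minimal NumPy sketch of this baseline for the $l^1$, $l^2$ and $l^\infty$ norms (our illustration of the standard baseline, not the paper's novel bounds):

```python
import numpy as np

def naive_lipschitz_upper_bound(weights, norm="l2"):
    """Product of layer-wise operator norms: a classical upper bound on the
    Lipschitz constant of a feed-forward network with 1-Lipschitz activations."""
    bound = 1.0
    for W in weights:
        if norm == "l2":      # spectral norm = largest singular value
            bound *= np.linalg.norm(W, 2)
        elif norm == "l1":    # induced l1 norm = max absolute column sum
            bound *= np.abs(W).sum(axis=0).max()
        elif norm == "linf":  # induced l-infinity norm = max absolute row sum
            bound *= np.abs(W).sum(axis=1).max()
        else:
            raise ValueError(f"unknown norm: {norm}")
    return bound

# Example: a random 3-layer fully-connected network (32 -> 64 -> 64 -> 1)
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((64, 32)),
      rng.standard_normal((64, 64)),
      rng.standard_normal((1, 64))]
for n in ("l1", "l2", "linf"):
    print(n, naive_lipschitz_upper_bound(Ws, n))
```

This product bound is cheap to compute but typically very loose, which is exactly the gap that sharper, still-computable bounds like those in the paper aim to close.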


The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification

Grishina, Anastasiia, Hort, Max, Moonen, Leon

arXiv.org Artificial Intelligence

The use of modern Natural Language Processing (NLP) techniques has been shown to be beneficial for software engineering tasks, such as vulnerability detection and type inference. However, training deep NLP models requires significant computational resources. This paper explores techniques that aim to make the best use of resources and available information in these models. We propose a generic approach, EarlyBIRD, to build composite representations of code from the early layers of a pre-trained transformer model. We empirically investigate the viability of this approach on the CodeBERT model by comparing the performance of 12 strategies for creating composite representations with the standard practice of only using the last encoder layer. Our evaluation on four datasets shows that several early layer combinations yield better performance on defect detection, and some combinations improve multi-class classification. More specifically, we obtain a +2 average improvement in detection accuracy on Devign with only 3 out of 12 layers of CodeBERT and a 3.3x speed-up in fine-tuning. These findings show that early layers can be used to obtain better results using the same resources, as well as to reduce resource usage during fine-tuning and inference.
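To make the general recipe concrete, the sketch below extracts per-layer [CLS] embeddings from CodeBERT with the HuggingFace transformers library and averages the first few encoder layers into a composite representation. This is a minimal illustration of the idea, not necessarily one of the paper's 12 specific combination strategies:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base",
                                  output_hidden_states=True)

code = "int divide(int a, int b) { return a / b; }"
inputs = tokenizer(code, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: embedding output + 12 encoder layers,
# each of shape [1, seq_len, 768]
hidden_states = outputs.hidden_states

# One possible composite: average the [CLS] token over the first k encoder
# layers (k=3 here, matching the "3 out of 12 layers" setting in the abstract)
k = 3
cls_per_layer = torch.stack([hidden_states[i][:, 0, :] for i in range(1, k + 1)])
composite = cls_per_layer.mean(dim=0)  # [1, 768] vector fed to a classifier head
print(composite.shape)
```

Because only the first k encoder layers need to run, both fine-tuning and inference can stop early, which is where the reported speed-up comes from.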


Detecting unusual input to neural networks

Martin, Jörg, Elster, Clemens

arXiv.org Machine Learning

Evaluating a neural network on an input that differs markedly from the training data might cause erratic and flawed predictions. We study a method that judges the unusualness of an input by evaluating its informative content compared to the learned parameters. This technique can be used to judge whether a network is suitable for processing a certain input and to raise a red flag that unexpected behavior might lie ahead. We compare our approach to several uncertainty-evaluation methods from the literature across a range of datasets and scenarios. Specifically, we introduce a simple, effective method that allows one to directly compare the outputs of such metrics for single input points, even when these metrics live on different scales.
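The abstract does not spell out how scores on different scales are made comparable, so the following is purely our illustration of one generic approach: map each raw score to its empirical percentile within a reference set of scores computed on training data, so that any metric lands on a common [0, 1] scale.

```python
import numpy as np

def percentile_rank(reference_scores, score):
    """Map a raw unusualness score onto [0, 1] as its empirical percentile
    among scores computed on a reference (e.g. training) set. Metrics that
    live on different scales become directly comparable after this step."""
    reference_scores = np.sort(np.asarray(reference_scores))
    return np.searchsorted(reference_scores, score, side="right") / len(reference_scores)

# Two hypothetical metrics on very different scales
rng = np.random.default_rng(1)
entropy_ref = rng.gamma(2.0, 0.5, size=10_000)       # e.g. predictive entropy
mahalanobis_ref = rng.gamma(5.0, 40.0, size=10_000)  # e.g. feature-space distance

x_entropy, x_mahalanobis = 3.1, 950.0
print(percentile_rank(entropy_ref, x_entropy))        # both now live on [0, 1]
print(percentile_rank(mahalanobis_ref, x_mahalanobis))
```

A point whose normalized score sits near 1.0 under several metrics is unusual under all of them, which is the kind of per-input comparison the abstract describes.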


Learning Effective Representations from Clinical Notes

Dubois, Sebastien, Romano, Nathanael, Kale, David C., Shah, Nigam, Jung, Kenneth

arXiv.org Machine Learning

Clinical notes are a rich source of information about patient state. However, using them effectively presents many challenges. In this work we present two methods for summarizing clinical notes into patient-level representations. The resulting representations are evaluated on a range of prediction tasks and cohort sizes. The new representations offer significant predictive performance gains over the common baselines of Bag of Words and topic model representations across all tested tasks and cohort sizes.
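For context, the Bag of Words baseline the paper compares against can be reproduced in a few lines of scikit-learn. The note text and labels below are toy stand-ins for de-identified clinical data, not data from the paper:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for de-identified note text and a binary outcome label
notes = [
    "pt presents with chest pain and shortness of breath",
    "routine follow up, wound healing well, no complaints",
    "elevated troponin, admitted for acute coronary syndrome",
    "annual physical, labs within normal limits",
]
labels = [1, 0, 1, 0]

# Bag of Words: each patient's notes become a sparse vector of token counts,
# which a linear model then maps to the prediction target
baseline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(notes, labels)
print(baseline.predict(["chest pain radiating to left arm"]))
```

Richer patient-level representations, like those proposed in the paper, aim to beat this kind of baseline especially when labeled cohorts are small.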