A Reparameterization-Invariant Flatness Measure for Deep Neural Networks

arXiv.org Machine Learning

The performance of deep neural networks is often attributed to their automated, task-related feature construction. It remains an open question, though, why this leads to solutions with good generalization, even in cases where the number of parameters is larger than the number of samples. Back in the 90s, Hochreiter and Schmidhuber observed that flatness of the loss surface around a local minimum correlates with low generalization error. For several flatness measures, this correlation has been empirically validated. However, it has recently been shown that existing measures of flatness cannot theoretically be related to generalization due to a lack of invariance with respect to reparameterizations. We propose a natural modification of existing flatness measures that results in invariance to reparameterization.
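The lack of invariance mentioned above can be seen in a toy setting. The sketch below assumes the trace of the loss Hessian as the flatness measure (one common choice; the modified, invariant measure proposed in the paper is not reproduced here) and uses a hypothetical two-parameter linear "network" f(x) = w2 · w1 · x. Rescaling (w1, w2) to (α·w1, w2/α) leaves the function, and hence the loss, unchanged, yet the Hessian trace changes.

```python
import numpy as np

def loss(w, x, y):
    """Mean squared error of the toy two-layer linear model f(x) = w[1] * w[0] * x."""
    return np.mean((w[1] * w[0] * x - y) ** 2)

def hessian_trace(f, w, h=1e-4):
    """Trace of the Hessian of f at w, estimated with central finite differences."""
    tr = 0.0
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        tr += (f(w + e) - 2.0 * f(w) + f(w - e)) / h**2
    return tr

x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x                      # targets generated by w1 * w2 = 2
w_a = np.array([1.0, 2.0])       # one minimum
w_b = np.array([10.0, 0.2])      # same function after rescaling by alpha = 10
f = lambda w: loss(w, x, y)
print(f(w_a), f(w_b))                            # both (numerically) zero
print(hessian_trace(f, w_a), hessian_trace(f, w_b))  # very different "sharpness"
```

Analytically the trace here is 2·mean(x²)·(w1² + w2²), so it can be made arbitrarily large along the rescaling orbit while the network computes exactly the same function.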


Square Attack: a query-efficient black-box adversarial attack via random search

arXiv.org Machine Learning

We propose the Square Attack, a new score-based black-box $l_2$ and $l_\infty$ adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. The Square Attack is based on a randomized search scheme where we select localized square-shaped updates at random positions so that the $l_\infty$- or $l_2$-norm of the perturbation is approximately equal to the maximal budget at each step. Our method is algorithmically transparent, robust to the choice of hyperparameters, and significantly more query-efficient than the more complex state-of-the-art methods. In particular, on ImageNet we improve the average query efficiency for various deep networks by a factor of at least $2$ and up to $7$ compared to the recent state-of-the-art $l_\infty$-attack of Meunier et al., while having a higher success rate. The Square Attack can even be competitive with gradient-based white-box attacks in terms of success rate. Moreover, we show its utility by breaking a recently proposed defense based on randomization. The code of our attack is available at https://github.com/max-andr/square-attack
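The random search described above can be sketched roughly as follows for the $l_\infty$ case. This is a simplified illustration, not the released implementation at the URL above. It assumes inputs of shape (C, H, W) in [0, 1], a `query` function that returns class scores only, a margin loss with greedy acceptance, a vertical-stripe initialization, and a fixed square size controlled by a hypothetical constant `p` (the fraction of pixels covered), whereas a practical attack would shrink the square over time.

```python
import numpy as np

def margin_loss(scores, y):
    """Score of the true class minus the best other class; negative means misclassified."""
    return scores[y] - np.delete(scores, y).max()

def square_attack_linf(query, x, y, eps, n_iters=200, p=0.1, seed=0):
    """Simplified l_inf random search with square-shaped updates (sketch, gradient-free)."""
    rng = np.random.default_rng(seed)
    c, h, w = x.shape
    # Initialization: vertical stripes of +-eps, one sign per (channel, column).
    delta = eps * rng.choice([-1.0, 1.0], size=(c, 1, w))
    x_adv = np.clip(x + delta, 0.0, 1.0)
    best = margin_loss(query(x_adv), y)
    # Square side so that it covers roughly a fraction p of the pixels.
    s = min(h, w, max(1, int(round(np.sqrt(p * h * w)))))
    for _ in range(n_iters):
        if best < 0:
            break  # already misclassified
        r = rng.integers(0, h - s + 1)
        col = rng.integers(0, w - s + 1)
        cand = x_adv.copy()
        # Set the square to x +- eps, with one random sign per channel.
        signs = eps * rng.choice([-1.0, 1.0], size=(c, 1, 1))
        cand[:, r:r + s, col:col + s] = x[:, r:r + s, col:col + s] + signs
        cand = np.clip(cand, 0.0, 1.0)  # stay in the image domain (still within eps of x)
        loss = margin_loss(query(cand), y)
        if loss < best:  # keep the update only if the margin decreases
            x_adv, best = cand, loss
    return x_adv, best

# Usage with a toy linear classifier standing in for a deep network.
rng = np.random.default_rng(1)
C, H, W, K = 3, 8, 8, 4
Wm = rng.standard_normal((K, C * H * W))
query = lambda z: Wm @ z.ravel()
x = rng.uniform(0.0, 1.0, size=(C, H, W))
y = int(np.argmax(query(x)))
x_adv, best = square_attack_linf(query, x, y, eps=0.05, n_iters=100)
```

Each iteration costs exactly one query, and every candidate stays inside the $l_\infty$ ball by construction, so no projection step beyond clipping is needed.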


Barcodes as summary of objective function's topology

arXiv.org Machine Learning

We apply the canonical forms (barcodes) of gradient Morse complexes to explore the topology of loss surfaces. We present a new algorithm for calculating the barcodes of minima of an objective function. Our experiments confirm two principal observations: 1) the barcodes of minima are located in a small, low portion of the range of loss values of neural networks, and 2) increasing the neural network's depth lowers the minima's barcodes. This has natural implications for the neural network's learning and generalization ability.
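To make the notion of a "barcode of minima" concrete, here is the standard 0-dimensional persistence computation for a function sampled on a 1-D grid. This is a textbook union-find sweep over sublevel sets, not the authors' algorithm for high-dimensional loss surfaces: each local minimum opens a bar at its value, and when two basins merge at a saddle, the younger basin (the one with the higher minimum) closes its bar at the saddle value. The function name is hypothetical.

```python
def minima_barcodes(f):
    """Bars (birth, death) of minima for the sublevel-set filtration of a sampled 1-D function."""
    n = len(f)
    order = sorted(range(n), key=lambda i: f[i])
    parent = [-1] * n  # -1 marks grid points not yet in the sublevel set
    birth = {}
    bars = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = f[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the higher minimum dies here.
                if birth[ri] < birth[rj]:
                    ri, rj = rj, ri
                if f[i] > birth[ri]:  # skip zero-length bars of non-minima
                    bars.append((birth[ri], f[i]))
                parent[ri] = rj
    # The global minimum's basin never dies.
    bars.append((birth[find(order[0])], float("inf")))
    return sorted(bars)

print(minima_barcodes([3, 1, 4, 0, 5, 2, 6]))  # [(0, inf), (1, 4), (2, 5)]
```

Short bars correspond to shallow minima that are easily escaped; the paper's observation that bars sit in a low portion of the loss range would correspond, in this picture, to deaths occurring at small loss values.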


Learning Likelihoods with Conditional Normalizing Flows

arXiv.org Machine Learning

Such behavior is desirable in multivariate structured prediction tasks, where handcrafted per-pixel loss-based methods inadequately capture strong correlations between output dimensions. Conditional normalizing flows (CNFs) are efficient in sampling and inference, can be trained with a likelihood-based objective, and, being generative flows, do not suffer from mode collapse or training instabilities. We provide an effective method to train continuous CNFs for binary problems; in particular, we apply these CNFs to super-resolution and vessel segmentation tasks, demonstrating competitive performance on standard benchmark datasets in terms of likelihood and conventional metrics.

When the output y is high-dimensional, this is a particularly challenging task, and the practitioner is left with many design choices. Do we factorize the conditional? If not, do we model correlations with, say, a conditional random field (Prince, 2012)? Do we use a unimodal distribution? How fat should the tails be? Do we use an explicit likelihood at all, or use implicit methods (Mohamed & Rezende, 2015) such as a GAN (Goodfellow et al., 2014)? Do we quantize the output?
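The likelihood-based objective rests on the change-of-variables formula. Below is a minimal sketch with a single conditional affine layer, y = μ(x) + σ(x) ⊙ z with z ~ N(0, I), where μ and log σ are hypothetical linear functions of the input x (weights `W_mu`, `W_logs`). Real CNFs stack many invertible, more expressive layers; this only shows why both exact log-likelihood evaluation and sampling are cheap.

```python
import numpy as np

def conditional_affine_logpdf(y, x, W_mu, W_logs):
    """log p(y | x) for a one-layer conditional affine flow (illustrative)."""
    mu = W_mu @ x
    log_s = W_logs @ x
    z = (y - mu) * np.exp(-log_s)  # inverse pass: map the output y back to the base space
    log_base = -0.5 * np.sum(z**2) - 0.5 * z.size * np.log(2.0 * np.pi)
    # Change of variables: log|det dz/dy| = -sum(log_s).
    return log_base - np.sum(log_s)

def conditional_affine_sample(x, W_mu, W_logs, rng):
    """Forward pass: draw z from the base density and map it to y given x."""
    z = rng.standard_normal(W_mu.shape[0])
    return W_mu @ x + np.exp(W_logs @ x) * z

rng = np.random.default_rng(0)
d_y, d_x = 3, 2
W_mu = rng.standard_normal((d_y, d_x))
W_logs = 0.1 * rng.standard_normal((d_y, d_x))
x = rng.standard_normal(d_x)
y = conditional_affine_sample(x, W_mu, W_logs, rng)
print(conditional_affine_logpdf(y, x, W_mu, W_logs))
```

Training would maximize this log-likelihood over (x, y) pairs; with a single affine layer the model reduces to a conditional diagonal Gaussian, which is exactly the kind of factorized choice the questions above ask about.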


On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks

arXiv.org Machine Learning

The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the \emph{classical} central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the \emph{generalized} CLT, which suggests that the GN converges to a \emph{heavy-tailed} $\alpha$-stable random vector, where the \emph{tail-index} $\alpha$ determines the heavy-tailedness of the distribution. Accordingly, we propose to analyze SGD as a discretization of an SDE driven by a Lévy motion. Such SDEs can incur `jumps', which force the SDE and its discretization to \emph{transition} from narrow minima to wider minima, as proven by existing metastability theory and the extensions that we proved recently. In this study, under the $\alpha$-stable GN assumption, we further establish an explicit connection between the convergence rate of SGD to a local minimum and the tail-index $\alpha$. To validate the $\alpha$-stable assumption, we conduct experiments on common deep learning scenarios and show that in all settings, the GN is highly non-Gaussian and has heavy tails. We investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima.
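Validating the $\alpha$-stable assumption requires estimating $\alpha$ from noise samples. One simple estimator, assumed here for illustration (it may not match the authors' exact procedure), uses stability directly: for i.i.d. symmetric $\alpha$-stable samples, a sum of $K_1$ of them has the same law as $K_1^{1/\alpha}$ times a single sample, so $\mathbb{E}\log|Y| - \mathbb{E}\log|X| = \tfrac{1}{\alpha}\log K_1$. The block size `k1` is a free parameter.

```python
import numpy as np

def estimate_tail_index(x, k1):
    """Estimate alpha for symmetric alpha-stable samples from block sums of length k1."""
    x = np.asarray(x, dtype=float).ravel()
    k2 = x.size // k1
    x = x[: k1 * k2]  # drop the remainder so blocks are complete
    # Sum consecutive blocks; for SaS data these scale like k1 ** (1 / alpha).
    y = x.reshape(k2, k1).sum(axis=1)
    inv_alpha = (np.mean(np.log(np.abs(y))) - np.mean(np.log(np.abs(x)))) / np.log(k1)
    return 1.0 / inv_alpha

rng = np.random.default_rng(0)
print(estimate_tail_index(rng.standard_normal(100_000), 100))  # close to 2 (Gaussian)
print(estimate_tail_index(rng.standard_cauchy(100_000), 100))  # close to 1 (Cauchy)
```

Applied to per-coordinate stochastic gradient noise (minibatch gradient minus full gradient), an estimate well below 2 would indicate the heavy-tailed behavior described above.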


Belief and plausibility measures for D numbers

arXiv.org Artificial Intelligence

As a generalization of Dempster-Shafer theory, D number theory provides a framework for dealing with uncertain information that exhibits non-exclusiveness and incompleteness. However, some basic concepts in D number theory are not well defined. In this note, belief and plausibility measures for D numbers are proposed, and their basic properties are established.

Keywords: Belief measure, Plausibility measure, D numbers, Dempster-Shafer theory

1. Introduction

Dempster-Shafer evidence theory (DST) [1, 2] is one of the most popular theories for dealing with uncertain information and has been widely used in various fields [3-5]. However, it is limited by some hypotheses and constraints that are hard to satisfy in some situations [6-9]. There are two main aspects.
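For reference, the classical DST belief and plausibility measures, which the D-number measures generalize, can be computed from a mass function as below. This is only the DST baseline: the D-number definitions in this note may differ (for example, D numbers allow masses that sum to less than 1 and non-exclusive elements). Representing a mass function as a dict from frozensets to masses is an assumption of this sketch.

```python
def belief(m, a):
    """Bel(A): total mass of focal elements contained in A (classical DST)."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m, a):
    """Pl(A): total mass of focal elements that intersect A (classical DST)."""
    return sum(v for b, v in m.items() if b & a)

m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.3,
    frozenset({"b", "c"}): 0.2,
}
print(belief(m, frozenset({"a"})), plausibility(m, frozenset({"a"})))
```

For a normalized DST mass function on frame Θ, the two measures are dual: Pl(A) = 1 - Bel(Θ \ A), and Bel(A) ≤ Pl(A) for every A.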


Learning Perceptual Inference by Contrasting

arXiv.org Artificial Intelligence

"Thinking in pictures" [1], i.e., spatial-temporal reasoning, which is effortless and instantaneous for humans, is believed to be a significant ability for performing logical induction and a crucial factor in the intellectual history of technology development. Modern Artificial Intelligence (AI), fueled by massive datasets, deeper models, and powerful computation, has reached a stage where (super-)human-level performance is observed in certain specific tasks. However, current AI still lags far behind in "thinking in pictures." In this work, we study how to improve machines' reasoning ability on one challenging task of this kind: Raven's Progressive Matrices (RPM). Specifically, we borrow the idea of "contrast effects" from the fields of psychology, cognition, and education to design and train a permutation-invariant model. Inspired by cognitive studies, we equip our model with a simple inference module that is jointly trained with the perception backbone. Combining all the elements, we propose the Contrastive Perceptual Inference network (CoPINet) and empirically demonstrate that CoPINet sets a new state of the art for permutation-invariant models on two major datasets. We conclude that spatial-temporal reasoning depends on envisaging the possibilities consistent with the relations between objects and can be solved from pixel-level inputs.
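One way a "contrast" operation can keep a model permutation-invariant is sketched below. It is a generic illustration, not CoPINet's actual contrast module: each candidate answer's embedding is compared against the mean embedding of all candidates, and every candidate is then scored with the same shared weights. Because the mean is symmetric in the candidates, permuting the candidates simply permutes the scores, so the chosen answer does not depend on the order in which candidates are presented. The embedding shapes and the linear scorer `w` are assumptions.

```python
import numpy as np

def contrast(features):
    """Subtract the mean over candidates (axis 0) from each candidate embedding."""
    return features - features.mean(axis=0, keepdims=True)

def score_candidates(features, w):
    """Score every candidate from its contrasted embedding with one shared weight vector."""
    return contrast(features) @ w

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 5))  # 8 candidate answers, 5-dim embeddings
w = rng.standard_normal(5)
choice = int(np.argmax(score_candidates(features, w)))
```

The design choice is that no candidate is scored in isolation: what matters is how it differs from the other options, which mirrors the contrast effects borrowed from psychology.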


Quadratic Q-network for Learning Continuous Control for Autonomous Vehicles

arXiv.org Artificial Intelligence

Reinforcement Learning algorithms have recently been proposed to learn time-sequential control policies in the field of autonomous driving. Directly applying Reinforcement Learning algorithms with a discrete action space yields unsatisfactory results at the operational level of driving, where continuous control actions are required. In addition, the design of neural networks often fails to incorporate domain knowledge of the target problem, such as classical control theory in our case. In this paper, we propose a hybrid model that combines Q-learning with a classic PID (proportional-integral-derivative) controller for handling continuous vehicle control problems in dynamic driving environments. In particular, instead of using a large neural network as the Q-function approximator, we design a quadratic Q-function over actions with multiple simple neural networks for finding optimal values within a continuous space. We also build an action network based on the domain knowledge of the control mechanism of a PID controller to guide the agent to explore optimal actions more efficiently. We test our proposed approach in simulation in two common but challenging driving situations: the lane change scenario and the ramp merge scenario. Results show that the autonomous vehicle agent can successfully learn smooth and efficient driving behavior in both situations.
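A quadratic Q-function makes the maximization over a continuous action tractable. The sketch below assumes a scalar action (for example, acceleration) and the form Q(s, a) = V(s) - P(s)·(a - μ(s))², with P(s) > 0, in the spirit of normalized-advantage-style parameterizations; the paper's exact parameterization may differ. The three "simple networks" are stood in for by hypothetical linear heads.

```python
import numpy as np

def heads(state, params):
    """Hypothetical heads standing in for the simple networks: V(s), P(s) > 0, mu(s)."""
    v = params["w_v"] @ state
    p = np.logaddexp(0.0, params["w_p"] @ state)  # softplus keeps P(s) positive
    mu = np.tanh(params["w_mu"] @ state)           # action center in [-1, 1]
    return v, p, mu

def quadratic_q(a, v, p, mu):
    """Q(s, a) = V(s) - P(s) * (a - mu(s))**2 for a scalar action a."""
    return v - p * (a - mu) ** 2

def greedy_action(p, mu, a_min, a_max):
    # A concave quadratic in a scalar action peaks at mu; clipping gives the
    # maximizer on [a_min, a_max] without any numerical search.
    return float(np.clip(mu, a_min, a_max))

rng = np.random.default_rng(0)
params = {k: rng.standard_normal(4) for k in ("w_v", "w_p", "w_mu")}
state = rng.standard_normal(4)
v, p, mu = heads(state, params)
a_star = greedy_action(p, mu, -1.0, 1.0)
```

A convenient consequence is that max over a of Q(s, a) equals V(s) whenever μ(s) lies inside the action bounds, so the Q-learning target r + γ·max Q(s', a') is cheap to compute.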


Heuristic Strategies in Uncertain Approval Voting Environments

arXiv.org Artificial Intelligence

In many collective decision-making situations, agents vote to choose an alternative that best represents the preferences of the group. Agents may manipulate the vote to achieve a better outcome by voting in a way that does not reflect their true preferences. In real-world voting scenarios, people often do not have complete information about other voters' preferences, and it can be computationally complex to identify a strategy that will maximize their expected utility. In such situations, it is often assumed that voters will vote truthfully rather than expending the effort to strategize. However, being truthful is just one possible heuristic that may be used. In this paper, we examine the effectiveness of heuristics in single-winner and multi-winner approval voting scenarios with missing votes. In particular, we look at heuristics where a voter ignores information about other voting profiles and makes their decisions based solely on how much they like each candidate. In a behavioral experiment, we show that people vote truthfully in some situations and prioritize high-utility candidates in others. We examine when these behaviors maximize expected utility and show how the structure of the voting environment affects both how well each heuristic performs and how humans employ these heuristics.
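The expected utility of a ballot under missing votes can be computed by enumeration in small cases. The toy model below is an illustration under loudly stated assumptions, not the paper's experimental setup: single-winner approval voting, each missing voter casts each of the 2^m possible approval ballots with equal probability, and ties go to the lowest-indexed candidate. The two utility-only heuristics (`mean_threshold_ballot`, `top_only_ballot`) are hypothetical examples of rules that ignore other voters' profiles.

```python
from itertools import product

def winner(scores):
    # Most approvals wins; ties are broken toward the lowest index (assumption).
    return scores.index(max(scores))

def expected_utility(ballot, known_scores, n_missing, utils):
    """Average utility of the winner over all equally likely ballots of the missing voters."""
    m = len(utils)
    ballots = list(product([0, 1], repeat=m))  # every possible approval ballot
    total, count = 0.0, 0
    for missing in product(ballots, repeat=n_missing):
        scores = [known_scores[c] + ballot[c] + sum(b[c] for b in missing) for c in range(m)]
        total += utils[winner(scores)]
        count += 1
    return total / count

def mean_threshold_ballot(utils):
    # Approve every candidate the voter likes more than their average candidate.
    mean = sum(utils) / len(utils)
    return tuple(int(u > mean) for u in utils)

def top_only_ballot(utils):
    # Approve only the single favourite candidate.
    best = max(range(len(utils)), key=lambda c: utils[c])
    return tuple(int(c == best) for c in range(len(utils)))

utils = [10, 6, 0]
known = [1, 2, 2]
for heuristic in (mean_threshold_ballot, top_only_ballot):
    b = heuristic(utils)
    print(heuristic.__name__, b, expected_utility(b, known, 1, utils))
```

Enumeration grows as 2^(m·n_missing), so this only suits tiny instances, but it makes explicit how the same heuristic can do well or poorly depending on the known scores and the number of missing votes.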


DeepAlign: Alignment-based Process Anomaly Correction using Recurrent Neural Networks

arXiv.org Artificial Intelligence

In this paper, we propose DeepAlign, a novel approach to multi-perspective process anomaly correction based on recurrent neural networks and bidirectional beam search. At the core of the DeepAlign algorithm are two recurrent neural networks trained to predict the next event: one reads sequences of process executions from left to right, and the other reads them from right to left. By combining the predictive capabilities of both neural networks, we show that it is possible to calculate sequence alignments, which are used to detect and correct anomalies. DeepAlign uses case-level and event-level attributes to closely model the decisions within a process. We evaluate the performance of our approach on an extensive corpus of 30 realistic synthetic event logs and compare it to three state-of-the-art conformance checking methods. DeepAlign produces better corrections than the other methods, reaching an overall accuracy of 98.45% across all datasets, whereas the best comparable state-of-the-art method reaches 70.19%.
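To show what a sequence alignment between an observed trace and a corrected trace looks like, here is a generic conformance-checking style alignment with unit costs: a synchronous move (same event in both) costs 0, a log-only move (an extra event in the observed trace) costs 1, and a model-only move (an event missing from the observed trace) costs 1. This dynamic program is an assumption for illustration; in DeepAlign the corrected trace itself comes from the bidirectional RNN beam search, which is not reproduced here.

```python
def align(log_trace, model_trace):
    """Minimal-cost alignment: sync moves cost 0, log-only and model-only moves cost 1."""
    n, m = len(log_trace), len(model_trace)
    # cost[i][j]: cheapest alignment of log_trace[:i] with model_trace[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(cost[i - 1][j], cost[i][j - 1]) + 1  # log-only or model-only move
            if log_trace[i - 1] == model_trace[j - 1]:
                best = min(best, cost[i - 1][j - 1])  # synchronous move
            cost[i][j] = best
    # Backtrack to recover the sequence of moves.
    moves, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and log_trace[i - 1] == model_trace[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            moves.append(("sync", log_trace[i - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            moves.append(("log", log_trace[i - 1]))  # event to delete from the log
            i -= 1
        else:
            moves.append(("model", model_trace[j - 1]))  # event to insert into the log
            j -= 1
    moves.reverse()
    return cost[n][m], moves

print(align(list("acbd"), list("abd")))
```

Log-only moves mark events to remove from the observed case and model-only moves mark events to insert, which is how an alignment doubles as both an anomaly detector and a correction.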