Revisit Fuzzy Neural Network: Demystifying Batch Normalization and ReLU with Generalized Hamming Network
We revisit fuzzy neural networks with the cornerstone notion of generalized Hamming distance, which provides a novel, theoretically justified framework for re-interpreting many useful neural network techniques in terms of fuzzy logic. In particular, we conjecture and empirically illustrate that the celebrated batch normalization (BN) technique actually adapts the "normalized" bias so that it approximates the rightful bias induced by the generalized Hamming distance. Once this due bias is enforced analytically, neither the optimization of bias terms nor sophisticated batch normalization is needed. Also in light of generalized Hamming distance, the popular rectified linear units (ReLU) can be treated as setting a minimal Hamming-distance threshold between network inputs and weights. This thresholding scheme can, on the one hand, be improved by introducing double-thresholding on both the positive and negative extremes of neuron outputs. On the other hand, ReLUs turn out to be non-essential and can be removed from networks trained for simple tasks such as MNIST classification. The proposed generalized Hamming network (GHN) thus not only lends itself to rigorous analysis and interpretation within fuzzy logic theory, but also demonstrates fast learning, well-controlled behaviour and state-of-the-art performance on a variety of learning tasks.
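To make the bias claim concrete, here is a minimal sketch (our illustration, not the authors' implementation) assuming the elementwise definition g(a, b) = a + b − 2ab for fuzzy values in [0, 1]; the function names are ours. Summing g over a weight vector and an input fixes the neuron's bias analytically, which is the sense in which a learned or batch-normalized bias becomes redundant.

```python
import numpy as np

def generalized_hamming(a, b):
    """Elementwise generalized Hamming distance g(a, b) = a + b - 2ab.

    For crisp bits a, b in {0, 1} this reduces to ordinary Hamming
    distance (XOR); for fuzzy values in [0, 1] it interpolates smoothly.
    """
    return a + b - 2.0 * a * b

def ghd_neuron(w, x):
    """A linear neuron whose bias is fixed analytically by the GHD.

    Since sum_i g(w_i, x_i) = sum(w) + sum(x) - 2 * (w @ x), the affine
    map w @ x - 0.5 * (sum(w) + sum(x)) equals -0.5 * GHD(w, x): a small
    GHD (good match) yields a large activation, with no bias left to
    learn or to re-centre via batch normalization.
    """
    bias = -0.5 * (w.sum() + x.sum())
    return w @ x + bias  # == -0.5 * generalized_hamming(w, x).sum()

# Sanity check of the identity on random fuzzy vectors.
w = np.random.rand(8)
x = np.random.rand(8)
assert np.isclose(ghd_neuron(w, x), -0.5 * generalized_hamming(w, x).sum())
```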
Invertibility of Convolutional Generative Networks from Partial Measurements
In this work, we present new theoretical results on convolutional generative neural networks, in particular on their invertibility (i.e., the recovery of the input latent code given the network output). The study of the network inversion problem is motivated by image inpainting and by the mode-collapse problem in training GANs. Network inversion is highly non-convex, and thus is typically computationally intractable and comes without optimality guarantees. However, we rigorously prove that, under some mild technical assumptions, the input of a two-layer convolutional generative network can be recovered from the network output efficiently using simple gradient descent. This new theoretical finding implies that the mapping from the low-dimensional latent space to the high-dimensional image space is bijective (i.e., one-to-one and onto). In addition, the same conclusion holds even when the network output is only partially observed (i.e., has missing pixels). Our theorems hold for two-layer convolutional generative networks with ReLU as the activation function, but we demonstrate empirically that the same conclusion extends to multi-layer networks and to networks with other activation functions, including leaky ReLU, sigmoid and tanh.
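As a hedged sketch of the recovery procedure described above (assuming a PyTorch generator G; the helper name invert_generator, the zero initialization, and all hyperparameters are ours, not taken from the paper), plain gradient descent on the masked squared error is all the machinery involved. In the partial-measurement setting, the mask simply zeroes out the missing pixels.

```python
import torch

def invert_generator(G, y_obs, mask, latent_dim, steps=2000, lr=0.05):
    """Recover a latent code z from partial measurements mask * G(z*).

    G      -- generative network mapping (1, latent_dim) -> image tensor
    y_obs  -- the (possibly masked) observed network output
    mask   -- binary tensor: 1 where a pixel is observed, 0 where missing

    Plain gradient descent on the masked squared error, mirroring the
    setting of the theorems; this is an empirical sketch only.
    """
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((mask * (G(z) - y_obs)) ** 2).sum()
        loss.backward()
        opt.step()
    return z.detach()
```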
A Proof of Proposition 2.5
Proposition 2.5 is a direct consequence of Lemma A.1 (smooth functions conserved through a given flow). The proof shows the direct inclusion first and then the converse inclusion; it recalls Examples 2.10 and 2.11 for the linear case, invokes Assumption 2.9, and, for Theorem 2.14, shows that (9) holds for standard ML losses.