APPENDIX Overview
A trivial example of an equivalence relation is equality (=). More useful examples in the context of ICA are equivalence up to permutation, rescaling, or scalar transformation. Defining an appropriate equivalence class for the problem at hand therefore allows us to specify exactly the type of indeterminacies that cannot be resolved, and up to which the true generative process can be recovered.
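The permutation-and-rescaling indeterminacy described above can be checked numerically. This is a minimal illustrative sketch (the matrices P and D below are our own example, not from the text): a mixing matrix A and the transformed matrix A P D produce identical observations when the sources are transformed correspondingly, so the two belong to the same equivalence class.

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.normal(size=(3, 3))       # ground-truth mixing matrix
s = rng.normal(size=(3, 100))     # ground-truth sources

P = np.eye(3)[[2, 0, 1]]          # permutation matrix
D = np.diag([2.0, -0.5, 3.0])     # invertible diagonal rescaling

A_alt = A @ P @ D                 # an observationally equivalent mixing
s_alt = np.linalg.inv(P @ D) @ s  # correspondingly transformed sources

x = A @ s
x_alt = A_alt @ s_alt
assert np.allclose(x, x_alt)      # identical observed data
```

Because the observed data cannot distinguish (A, s) from (A P D, (P D)^{-1} s), recovery is only possible up to this equivalence class.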
The Numerical Stability of Hyperbolic Representation Learning
Mishne, Gal, Wan, Zhengchao, Wang, Yusu, Yang, Sheng
Given the exponential growth of the volume of the ball w.r.t. its radius, the hyperbolic space is capable of embedding trees with arbitrarily small distortion and hence has received wide attention for representing hierarchical datasets. However, this exponential growth property comes at the price of numerical instability, such that training hyperbolic learning models sometimes leads to catastrophic NaN problems when unrepresentable values are encountered in floating-point arithmetic. In this work, we carefully analyze the limitations of two popular models for the hyperbolic space, namely, the Poincar\'e ball and the Lorentz model. We first show that, under the 64-bit arithmetic system, the Poincar\'e ball has a relatively larger capacity than the Lorentz model for correctly representing points. Then, we theoretically validate the superiority of the Lorentz model over the Poincar\'e ball from the perspective of optimization. Given the numerical limitations of both models, we identify one Euclidean parametrization of the hyperbolic space which can alleviate these limitations. We further extend this Euclidean parametrization to hyperbolic hyperplanes and exhibit its ability to improve the performance of hyperbolic SVM.
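The floating-point failure modes the abstract refers to can be seen directly. In this hedged sketch (not the paper's exact analysis), a point at hyperbolic distance d from the origin has Poincaré-ball radius tanh(d/2) and Lorentz time coordinate cosh(d); under float64, the first silently saturates onto the boundary while the second overflows.

```python
import numpy as np

def poincare_radius(d):
    # saturates: 1 - tanh(d/2) ~ 2*exp(-d) drops below float64 resolution near 1
    return np.tanh(d / 2.0)

def lorentz_time(d):
    # overflows: cosh(d) ~ exp(d)/2 exceeds the largest float64 around d ~ 710
    return np.cosh(d)

# A point at distance 40 becomes indistinguishable from the ideal boundary.
assert poincare_radius(40.0) == 1.0

# A point at distance 720 is no longer representable at all.
with np.errstate(over="ignore"):
    assert lorentz_time(720.0) == np.inf
```

Either failure destroys distance information: the saturated Poincaré point maps back to infinite distance under arctanh, and the overflowed Lorentz coordinate propagates inf/NaN through training.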
The Skellam Mechanism for Differentially Private Federated Learning
Agarwal, Naman, Kairouz, Peter, Liu, Ziyu
We introduce the multi-dimensional Skellam mechanism, a discrete differential privacy mechanism based on the difference of two independent Poisson random variables. To quantify its privacy guarantees, we analyze the privacy loss distribution via a numerical evaluation and provide a sharp bound on the R\'enyi divergence between two shifted Skellam distributions. While the mechanism is useful in both centralized and distributed privacy applications, we investigate how it can be applied in the context of federated learning with secure aggregation under communication constraints. Our theoretical findings and extensive experimental evaluations demonstrate that the Skellam mechanism provides the same privacy-accuracy trade-offs as the continuous Gaussian mechanism, even when the precision is low. More importantly, Skellam is closed under summation, and sampling from it only requires sampling from a Poisson distribution -- an efficient routine that ships with all machine learning and data analysis software packages. These features, along with its discrete nature and competitive privacy-accuracy trade-offs, make it an attractive alternative to the newly introduced discrete Gaussian mechanism.
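The sampling routine the abstract mentions is exactly the difference of two Poisson draws. A minimal sketch (function and parameter names are our own): a symmetric Skellam(mu, mu) variate has mean 0 and variance 2*mu, and is integer-valued, as discrete DP requires.

```python
import numpy as np

def sample_skellam(mu, size, rng):
    # difference of two independent Poisson(mu) draws: mean 0, variance 2*mu
    return rng.poisson(mu, size) - rng.poisson(mu, size)

rng = np.random.default_rng(0)
mu = 5.0
z = sample_skellam(mu, size=200_000, rng=rng)

# z is integer-valued noise with sample mean near 0 and variance near 2*mu
print(z.dtype.kind, z.mean(), z.var())
```

Closure under summation follows from the same construction: a sum of independent Skellam variates is again the difference of two Poissons, so aggregated noise across clients stays in the family.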
Discovering Parametric Activation Functions
Bingham, Garrett, Miikkulainen, Risto
Recent studies have shown that the choice of activation function can significantly affect the performance of deep learning networks. However, the benefits of novel activation functions have been inconsistent and task dependent, and therefore the rectified linear unit (ReLU) is still the most commonly used. This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance. Evolutionary search is used to discover the general form of the function, and gradient descent to optimize its parameters for different parts of the network and over the learning process. Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective. It discovers both general activation functions and specialized functions for different architectures, consistently improving accuracy over ReLU and other recently proposed activation functions by significant margins. The approach can therefore be used as an automated optimization step in applying deep learning to new tasks. The rectified linear unit (ReLU(x) = max{x, 0}) is the most commonly used activation function in modern deep learning architectures (Nair & Hinton, 2010). When introduced, it offered substantial improvements over the previously popular tanh and sigmoid activation functions. Because ReLU is unbounded as x → ∞, it is less susceptible to vanishing gradients than tanh and sigmoid are. It is also simple to calculate, which leads to faster training times. Activation function design continues to be an active area of research, and a number of novel activation functions have been introduced since ReLU, each with different properties (Nwankpa et al., 2018).
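The idea of tuning an activation's parameters by gradient descent can be sketched with the simplest parametric family, not the paper's discovered functions: a PReLU-style activation f(x; a) = max(x, 0) + a * min(x, 0), whose negative-side slope a is fit like any other weight.

```python
import numpy as np

def prelu(x, a):
    # parametric ReLU: identity for x > 0, slope a for x < 0
    return np.maximum(x, 0.0) + a * np.minimum(x, 0.0)

def dprelu_da(x):
    # df/da is min(x, 0): zero on the positive side
    return np.minimum(x, 0.0)

# Toy fit: recover a target negative-side slope of 0.25 by gradient descent.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = prelu(x, 0.25)                     # synthetic target

a, lr = 1.0, 0.1
for _ in range(200):
    err = prelu(x, a) - y              # residual of a squared-error loss
    grad = 2.0 * np.mean(err * dprelu_da(x))
    a -= lr * grad

print(round(a, 3))                     # converges to the target slope 0.25
```

Because the loss is quadratic in a, the iteration contracts geometrically to the target; in a full network the same parameter update simply rides along with backpropagation.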
Exponentiated Gradient Meets Gradient Descent
Ghai, Udaya, Hazan, Elad, Singer, Yoram
(Stochastic) gradient descent and the multiplicative update method are probably the most popular algorithms in machine learning. We introduce and study a new regularization which provides a unification of the additive and multiplicative updates. This regularization is derived from a hyperbolic analogue of the entropy function, which we call hypentropy. It is motivated by a natural extension of the multiplicative update to negative numbers. The hypentropy has a natural spectral counterpart which we use to derive a family of matrix-based updates that bridge gradient methods and the multiplicative method for matrices. While the latter is only applicable to positive semi-definite matrices, the spectral hypentropy method can naturally be used with general rectangular matrices. We analyze the new family of updates by deriving tight regret bounds. We study empirically the applicability of the new update for settings such as multiclass learning, in which the parameters constitute a general rectangular matrix.
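The interpolation between additive and multiplicative updates can be sketched concretely. This is a hedged reconstruction from the abstract, assuming the hypentropy takes the standard hyperbolic-entropy form phi_b(x) = x*arcsinh(x/b) - sqrt(x**2 + b**2) with a temperature b > 0 (the paper's exact scaling may differ); its gradient, the mirror map, is arcsinh(x/b), so mirror descent steps in the dual and maps back with sinh.

```python
import numpy as np

def hypentropy(x, b):
    # hyperbolic analogue of the entropy; well-defined for negative x too
    return x * np.arcsinh(x / b) - np.sqrt(x**2 + b**2)

def hu_step(x, g, eta, b):
    # mirror descent: map to the dual space, step against the gradient, map back
    return b * np.sinh(np.arcsinh(x / b) - eta * g)

# Sanity check: d/dx phi_b(x) = arcsinh(x / b), by central finite differences.
x, b, h = 1.3, 0.5, 1e-6
fd = (hypentropy(x + h, b) - hypentropy(x - h, b)) / (2 * h)
assert abs(fd - np.arcsinh(x / b)) < 1e-6

# Small b: multiplicative behavior, x <- x * exp(-eta * g), like exponentiated gradient.
assert np.isclose(hu_step(1.0, 0.2, 0.5, b=1e-8), np.exp(-0.1), rtol=1e-6)
# Large b (step eta / b): additive behavior, x <- x - eta * g, like gradient descent.
assert np.isclose(hu_step(1.0, 0.2, 0.01 / 1e6, b=1e6), 1.0 - 0.002)
```

Unlike the classical exponentiated-gradient update, the sinh-based map is defined on all of the reals, which is what allows the multiplicative behavior to extend to negative coordinates.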