On the Ideal Number of Groups for Isometric Gradient Propagation

Kim, Bum Jun, Choi, Hyeyeon, Jang, Hyeonah, Kim, Sang Woo

arXiv.org Artificial Intelligence 

Recently, various normalization layers have been proposed to stabilize the training of deep neural networks. Among them, group normalization is a generalization of layer normalization and instance normalization, allowing a degree of freedom in the number of groups it uses. However, determining the optimal number of groups requires trial-and-error-based hyperparameter tuning, and such experiments are time-consuming. In this study, we discuss a reasonable method for setting the number of groups.

These normalization layers behave similarly in that they apply mean and standard deviation (std) normalization followed by an affine transform. The difference lies in the units used for computing the mean and std. For example, for n features, layer normalization computes a single mean and std, whereas instance normalization computes n means and stds. Meanwhile, group normalization partitions the n features into G groups and computes G means and stds. From this perspective, layer normalization is a special case of group normalization with G = 1, and instance normalization is a special case of group normalization with G = n. Thus, group normalization is more comprehensive, with the number of groups as a degree of freedom.
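The relationship among the three layers can be illustrated with a minimal NumPy sketch. This is an assumed simplified implementation (the affine transform and exact epsilon placement are omitted; `group_norm` is a hypothetical helper, not code from the paper): splitting C channels into G groups and normalizing each group recovers layer normalization at G = 1 and instance normalization at G = C.

```python
import numpy as np

def group_norm(x, G, eps=1e-5):
    """Group normalization over a (N, C, H, W) tensor (sketch, no affine).

    Splits the C channels into G groups and normalizes each group by its
    own mean and std, computed over the group's channels and spatial dims.
    """
    N, C, H, W = x.shape
    assert C % G == 0, "number of channels must be divisible by G"
    xg = x.reshape(N, G, C // G, H, W)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    std = xg.std(axis=(2, 3, 4), keepdims=True)
    return ((xg - mean) / (std + eps)).reshape(N, C, H, W)

x = np.random.randn(2, 8, 4, 4)
layer_norm_like = group_norm(x, G=1)     # one mean/std per sample (G = 1)
instance_norm_like = group_norm(x, G=8)  # one mean/std per channel (G = n)
gn = group_norm(x, G=4)                  # intermediate: 2 channels per group
```

With G = 1 the statistics are shared across all channels of a sample, matching layer normalization; with G equal to the channel count, each channel gets its own statistics, matching instance normalization.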
