On the Ideal Number of Groups for Isometric Gradient Propagation
Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Sang Woo Kim
arXiv.org Artificial Intelligence
Recently, various normalization layers have been proposed to stabilize the training of deep neural networks. Among them, group normalization is a generalization of layer normalization and instance normalization by allowing a degree of freedom in the number of groups it uses. However, to determine the optimal number of groups, trial-and-error-based hyperparameter tuning is required, and such experiments are time-consuming. In this study, we discuss a reasonable method for setting the number of groups.

These normalization layers behave similarly in that they apply mean and standard deviation (std) normalization and an affine transform. The difference lies in the units used for computing the mean and std. For example, for n features, layer normalization computes a single mean and std for normalization, whereas instance normalization computes n means and stds. Meanwhile, group normalization partitions the n features into G groups to compute G means and stds. From this perspective, layer normalization is a special case of group normalization for G = 1, and instance normalization is a special case of group normalization for G = n. Thus, group normalization is more comprehensive and has a degree of freedom from the setting of the number of groups.
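The relationship described above can be sketched in code. The following is a minimal NumPy illustration (not the paper's implementation): it omits the learned affine transform, and the function name `group_norm` and the `(batch, channels, spatial)` layout are assumptions for the example. Setting the number of groups to 1 reproduces layer normalization, and setting it to the channel count reproduces instance normalization.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    # Illustrative sketch: x has shape (batch, channels, spatial).
    # The channels are partitioned into num_groups groups, and one
    # mean/std pair is computed per (sample, group).
    b, c, s = x.shape
    assert c % num_groups == 0, "num_groups must divide the channel count"
    g = x.reshape(b, num_groups, (c // num_groups) * s)
    mean = g.mean(axis=2, keepdims=True)
    std = g.std(axis=2, keepdims=True)
    return ((g - mean) / (std + eps)).reshape(b, c, s)

x = np.random.randn(2, 8, 16)
ln = group_norm(x, 1)      # G = 1: one mean/std per sample  -> layer norm
inorm = group_norm(x, 8)   # G = n: one mean/std per channel -> instance norm
gn = group_norm(x, 4)      # 1 < G < n: intermediate grouping
```

Intermediate values of G interpolate between the two extremes, which is the degree of freedom whose ideal setting the paper studies.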
Feb-6-2023