On the Variance of the Fisher Information for Deep Learning

Alexander Soen, Ke Sun

arXiv.org Machine Learning 

The Fisher information is one of the most fundamental concepts in statistical machine learning. Intuitively, it measures the amount of information carried by a single random observation when the underlying model varies along certain directions in the parameter space: if such a variation does not change the underlying model, then an observation carries zero (Fisher) information about the varied parameter, and estimating that parameter is impossible. Conversely, if the variation significantly changes the model, the Fisher information is large, an observation is informative, and parameter estimation can be more efficient than for parameters with small Fisher information. In machine learning, this basic concept is useful for defining intrinsic structures of the parameter space, measuring model complexity, and performing gradient-based optimization.
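The intuition above can be made concrete with a small numerical sketch (illustrative only, not taken from the paper). Below, the Fisher information of a Bernoulli model is estimated by Monte Carlo as the expected squared score, and a redundantly parameterized model is used to show that a direction which leaves the model unchanged carries exactly zero Fisher information; all function and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def fisher_information_mc(theta, n_samples=200_000):
    """Monte Carlo estimate of the Fisher information of Bernoulli(theta).

    I(theta) = E[(d/dtheta log p(x; theta))^2], which for the Bernoulli
    model equals 1 / (theta * (1 - theta)).
    """
    x = rng.binomial(1, theta, size=n_samples).astype(float)
    # Score: derivative of the log-likelihood with respect to theta.
    score = x / theta - (1.0 - x) / (1.0 - theta)
    return float(np.mean(score**2))


theta = 0.3
estimate = fisher_information_mc(theta)
exact = 1.0 / (theta * (1.0 - theta))  # ≈ 4.76

# A non-identifiable direction: Bernoulli(sigmoid(a + b)) depends on the
# parameters (a, b) only through their sum, so moving along (1, -1) does
# not change the model at all.
a, b = 0.2, -0.7
p = 1.0 / (1.0 + np.exp(-(a + b)))
x = rng.binomial(1, p, size=10_000).astype(float)
score_a = x - p  # d/da log p(x; a, b)
score_b = x - p  # d/db log p(x; a, b)
# Directional score along u = (1, -1)/sqrt(2) cancels exactly.
directional_score = (score_a - score_b) / np.sqrt(2.0)
info_redundant = float(np.mean(directional_score**2))

print(f"MC estimate: {estimate:.3f}, exact: {exact:.3f}")
print(f"Fisher information along non-identifiable direction: {info_redundant}")
```

The Monte Carlo estimate concentrates around the closed-form value, while the directional information along the redundant direction is exactly zero, matching the statement that observations are non-informative about parameter variations that leave the model unchanged.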