Reviews: Limitations of the empirical Fisher approximation for natural gradient descent
–Neural Information Processing Systems
Originality: The paper lacks a sound and novel contribution. Theoretically, there is only one minor result, as stated above. Technically, there is no systematic experimental study on real deep networks. The main contribution is a discussion of two different formulations of the Fisher matrix. The main trick that makes these two formulations differ (despite the authors' sophisticated approach going through the GGN) is that the so-called empirical Fisher relies on y_n (the target for the neural network output). If one considers y_n to be randomly distributed with fixed variance around the neural network output, the two formulations are equivalent; otherwise, there is a scale parameter in eq. (3) that shrinks, and this shrinking, together with damping, makes the two formulations differ.
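To make the point about y_n concrete, here is a minimal numerical sketch. It is not taken from the paper and does not reproduce its eq. (3). It assumes a linear model f(x) = theta^T x with a unit-variance Gaussian likelihood, and the helper names `fisher` and `empirical_fisher` are hypothetical. Under these assumptions the Fisher is the average of x_n x_n^T, while the empirical Fisher weights each term by the squared residual (y_n - f(x_n))^2.

```python
import numpy as np


def fisher(X):
    # Fisher for the assumed model: y is sampled from N(f(x), 1),
    # so the expected squared residual is 1 and only x x^T remains.
    return X.T @ X / len(X)


def empirical_fisher(X, y, theta):
    # Empirical Fisher: uses the observed targets y_n, so each
    # outer product x_n x_n^T is scaled by the squared residual.
    r = y - X @ theta
    return (X * (r ** 2)[:, None]).T @ X / len(X)


rng = np.random.default_rng(0)
N, D = 20000, 3
X = rng.normal(size=(N, D))
theta = np.array([1.0, -2.0, 0.5])

# Targets whose noise matches the assumed unit variance.
y_unit = X @ theta + rng.normal(size=N)
# Targets with much smaller noise (standard deviation 0.1).
y_small = X @ theta + 0.1 * rng.normal(size=N)

F = fisher(X)
ratio_unit = np.trace(empirical_fisher(X, y_unit, theta)) / np.trace(F)
ratio_small = np.trace(empirical_fisher(X, y_small, theta)) / np.trace(F)

print("trace ratio, unit-variance targets:", ratio_unit)
print("trace ratio, low-noise targets:", ratio_small)
```

In this toy setting the trace ratio is close to 1 when the targets have the assumed unit variance, and it scales with the residual variance when they do not. That scale factor is the kind of shrinking described above, and it interacts with any fixed damping term added to the matrix.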
Jan-23-2025, 09:11:07 GMT