FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information