Limitations of the Empirical Fisher Approximation
Frederik Kunstner, Lukas Balles, Philipp Hennig
Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We dispute this argument by showing that the empirical Fisher, unlike the Fisher, does not generally capture second-order information. We further argue that the conditions under which the empirical Fisher approaches the Fisher (and the Hessian) are unlikely to be met in practice, and that, even on simple optimization problems, the pathologies of the empirical Fisher can have undesirable effects.
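The distinction drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a linear regression model with unit-variance Gaussian noise, p(y | x, theta) = N(y; x·theta, 1), and sums over data points rather than averaging. The function name `fisher_and_empirical_fisher` and the synthetic data are hypothetical, chosen only for illustration. Under these assumptions the Fisher equals the Hessian of the negative log-likelihood, while the empirical Fisher weights each outer product by a squared residual and therefore changes with the parameters.

```python
import numpy as np


def fisher_and_empirical_fisher(X, y, theta):
    """Compare curvature matrices for linear regression with N(x @ theta, 1) noise.

    Assumption for this sketch: all matrices are sums over the n data points.
    """
    residuals = X @ theta - y                    # shape (n,)
    # Per-example gradient of -log p(y_n | x_n, theta) is residual_n * x_n.
    per_example_grads = residuals[:, None] * X   # shape (n, d)

    # Fisher: expected outer product of gradients with y drawn from the model.
    # For y ~ N(x @ theta, 1) the expected squared residual is 1,
    # so the Fisher reduces to sum_n x_n x_n^T and does not depend on theta.
    fisher = X.T @ X

    # Empirical Fisher: outer products of gradients at the observed labels,
    # i.e. sum_n residual_n^2 * x_n x_n^T.
    empirical_fisher = per_example_grads.T @ per_example_grads

    # Hessian of the negative log-likelihood, for reference.
    hessian = X.T @ X
    return fisher, empirical_fisher, hessian


# Hypothetical synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(size=200)

F0, EF0, H0 = fisher_and_empirical_fisher(X, y, np.zeros(3))
F1, EF1, H1 = fisher_and_empirical_fisher(X, y, theta_true)

print("Fisher equals Hessian:", np.allclose(F0, H0))
print("Empirical Fisher equals Fisher at theta = 0:", np.allclose(EF0, F0))
print("Empirical Fisher changes with theta:", not np.allclose(EF0, EF1))
```

In this setting the Fisher stays fixed as theta moves, whereas the empirical Fisher scales with the squared residuals, so far from the optimum it can differ from the Hessian by a large factor.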
May-29-2019
- Country:
  - Asia > Middle East
    - Israel > Haifa District > Haifa (0.04)
  - Europe
    - France > Hauts-de-France
    - Germany > Baden-Württemberg
      - Tübingen Region > Tübingen (0.04)
    - Spain > Andalusia
      - Granada Province > Granada (0.04)
    - Sweden > Stockholm
      - Stockholm (0.04)
    - Switzerland > Vaud
      - Lausanne (0.04)
  - North America
    - Canada
      - Alberta > Census Division No. 15
        - Improvement District No. 9 > Banff (0.04)
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
      - Quebec > Montreal (0.04)
    - United States
      - California > San Diego County
        - San Diego (0.04)
      - Georgia > Fulton County
        - Atlanta (0.04)
      - Indiana > Hamilton County
        - Fishers (0.04)
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
- Genre:
  - Research Report (0.82)
- Industry:
  - Health & Medicine (0.48)
- Technology: