Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks