Stable and low-precision training for large-scale vision-language models

Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer

Neural Information Processing Systems 

We focus mainly on int8 because GPU support for float8 is rare, though we also analyze float8 training through simulation.
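To make the int8 setting concrete, the following is a minimal sketch of a quantized dot product in pure Python. It assumes symmetric absmax scaling with one scale per vector, which is a common generic scheme and not necessarily the method used in this paper. The names `quantize_int8` and `int8_dot` are hypothetical and chosen only for illustration.

```python
def quantize_int8(values):
    """Absmax-quantize a list of floats to integer codes in [-127, 127].

    Assumption: one symmetric scale for the whole vector.
    """
    absmax = max(abs(v) for v in values)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(a, b):
    """Dot product computed on int8 codes, then rescaled back to float."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    # Products are accumulated as integers, mimicking an int8 kernel
    # that accumulates in a wider integer type.
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * scale_a * scale_b


a = [0.5, -1.25, 2.0, 0.1]
b = [1.0, 0.75, -0.5, 3.0]
exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(a, b)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The gap between `exact` and `approx` is the quantization error introduced by rounding each input to one of 255 levels. Real int8 training would run the integer products on hardware matrix units instead of in a Python loop.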