Why is it so difficult to retrain neural networks and get the same results?
Last week I had a question from a colleague about reproducibility in TensorFlow, specifically in the 1.14 era. He wanted to be able to run the same training code multiple times and get exactly the same results, which on the surface doesn't seem like an unreasonable expectation. Machine learning training is fundamentally a series of arithmetic operations applied repeatedly, so what makes getting the same results every time so hard? I had the same question when we first started TensorFlow, and I was lucky enough to learn some of the answers from the numerical programming experts on the team, so I want to share a bit of what I discovered. There are good guides to achieving reproducibility out there, but they don't usually include explanations for why all the steps involved are necessary, or why training becomes so slow when you do apply them.
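To make the kind of steps those guides prescribe concrete, here is a minimal sketch of the usual seed-pinning and single-threading recipe for the TF 1.x era. The specific environment flags (like TF_CUDNN_DETERMINISTIC) and their behavior in 1.14 are my assumptions rather than anything from this post, and even all of this together doesn't guarantee bit-exact results on a GPU.

```python
# A sketch of the typical reproducibility checklist for TensorFlow 1.x.
# Flag names and version support are assumptions; check your release notes.
import os
import random

import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.14

SEED = 42

# Pin every source of randomness the guides usually list.
os.environ['PYTHONHASHSEED'] = str(SEED)  # only fully effective if set before Python starts
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'  # request deterministic cuDNN kernels (assumed available in 1.14)
random.seed(SEED)          # Python's own RNG (e.g. data shuffling code)
np.random.seed(SEED)       # NumPy RNG (e.g. weight init helpers, augmentation)
tf.set_random_seed(SEED)   # graph-level seed for TensorFlow ops

# Restricting TensorFlow to a single thread removes one source of
# nondeterminism, at a large cost in training speed.
config = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1,
)

with tf.Session(config=config) as sess:
    # ... build and run the training graph here ...
    pass
```

The single-threading step hints at why these recipes make training so slow: floating-point addition isn't associative, so letting multiple threads race to accumulate partial sums changes the order of operations, and with it the low-order bits of the result, from run to run.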
November 27, 2022