Is 'fake data' the real deal when training algorithms?