PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

Mark Ibrahim

Neural Information Processing Systems 

Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground-truth labels (and captions), and (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation. Despite such promise, the use of synthetic image data is still limited, and often played down, mainly because of its lack of realism.