Generating compositional scenes via Text-to-image RGBA Instance Generation