Generating Compositional Scenes via Text-to-image RGBA Instance Generation