Learning Visual Composition through Improved Semantic Guidance