Measuring and Narrowing the Compositionality Gap in Language Models