Does Continual Learning Meet Compositionality? New Benchmarks and An Evaluation Framework