Unsupervised Learning of Temporal Abstractions with Slot-based Transformers