Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations