Robust Wake-Up Word Detection by Two-stage Multi-resolution Ensembles