Escaping mediocrity: how two-layer networks learn hard single-index models with SGD