Multi-Set Inoculation: Assessing Model Robustness Across Multiple Challenge Sets