Assessing Out-of-Domain Language Model Performance from Few Examples