What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering