A Formalism and Approach for Improving Robustness of Large Language Models Using Risk-Adjusted Confidence Scores