How Well Do Large Language Models Truly Ground?