A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations