Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs