Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI