Benchmarking LLM-based Relevance Judgment Methods