A Statistical Framework for Ranking LLM-Based Chatbots