A finite time analysis of distributed Q-learning