Is Your Video Language Model a Reliable Judge?