Measuring the Inconsistency of Large Language Models in Preferential Ranking