On the Reasoning Capacity of AI Models and How to Quantify It