Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation