A method for the systematic generation of graph XAI benchmarks via Weisfeiler-Leman coloring