A graphical heuristic for reduction and partitioning of large datasets for scalable supervised training