Detecting Leakage in Machine Learning Pipelines Using NaNs and Complex Numbers


Data leakage in machine learning pipelines can wreak havoc on your model. In this post, I'm going to share an amazingly simple way to detect data leakage using NaNs and complex numbers while treating your ML pipeline as a black box. I'll talk very briefly about what data leakage is. I'll also talk about leak-detect, a Python package I'm releasing to do all this in one line of code. Data leakage in an ML model occurs when data used to create predictor variables at training time is unavailable at inference time.
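
To make the idea concrete, here is a minimal sketch of the NaN trick on a toy time-series feature. This is not the leak-detect API: the names `leaky_feature`, `causal_feature`, and `leaks_from_future` are hypothetical, and the sketch assumes the pipeline is a function from a 1-D array to an array of the same length, built from arithmetic that propagates NaN.

```python
import numpy as np


def leaky_feature(x):
    # Centered 3-point mean: output[t] uses x[t-1], x[t] and x[t+1],
    # so it reads one step into the future.
    padded = np.pad(x, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def causal_feature(x):
    # Trailing 2-point mean: output[t] uses only x[t-1] and x[t].
    padded = np.pad(x, (1, 0), mode="edge")
    return (padded[:-1] + padded[1:]) / 2.0


def leaks_from_future(pipeline, x, t):
    """Return True if any output at index <= t depends on inputs after t.

    The pipeline is treated as a black box: everything after index t is
    replaced with NaN, and any output at or before t that turns NaN must
    have touched the poisoned (future) values.
    """
    clean = np.asarray(x, dtype=float)
    poisoned = clean.copy()
    poisoned[t + 1:] = np.nan

    baseline = pipeline(clean)[: t + 1]
    probed = pipeline(poisoned)[: t + 1]

    # Only count NaNs that were not already present in the clean run.
    return bool(np.any(np.isnan(probed) & ~np.isnan(baseline)))


x = np.arange(10.0)
print(leaks_from_future(leaky_feature, x, t=5))
print(leaks_from_future(causal_feature, x, t=5))
```

The check flags the centered mean and passes the trailing mean. It relies on NaN surviving every operation in the pipeline, so steps that impute or drop missing values would hide a leak from this probe.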

