Clean AutoML for "dirty" data: how and why to automate preprocessing of tables in machine learning
In this post, we discuss a well-known and extensively described topic in data science: preprocessing tabular data. You may ask, "Why do we need it? There is nothing new to say!" Indeed, what could be more trivial than processing tabular data for machine learning models? Still, we will try to collect as much information as possible into one ultimate guide and present it from the perspective of automated machine learning (AutoML). Disclaimer: the approaches described below are not the only possible ones. They are the ones we used while developing our open-source AutoML framework, FEDOT, a project with its own specifics in both architecture and development approach.
May-2-2022, 17:26:12 GMT