There is a common misconception that data products cannot be put through automated testing. Although some parts of the pipeline cannot go through traditional testing methodologies because of their experimental and stochastic nature, most of the pipeline can. In addition, the more unpredictable algorithms can be put through specialised validation processes. Let's take a look at traditional testing methodologies and how we can apply them to our data/ML pipelines. This pyramid is a representation of the types of tests that you would write for an application.
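As a rough illustration of the distinction drawn above, the sketch below tests a deterministic pipeline step with an exact assertion and validates a stochastic step against a quality threshold across several random seeds. Everything in it is hypothetical and chosen only for illustration: the `normalise` step, the toy threshold "model", the synthetic two-class data, and the 0.9 accuracy floor are assumptions, not part of any particular pipeline.

```python
import random


def normalise(values):
    """Deterministic step: scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_normalise():
    # Traditional unit test: a known input has one exact expected output.
    assert normalise([2, 4, 6]) == [0.0, 0.5, 1.0]
    assert normalise([3, 3]) == [0.0, 0.0]


def make_data(rng, n_per_class):
    """Synthetic data (assumption): class 0 ~ N(0, 1), class 1 ~ N(4, 1)."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            data.append((rng.gauss(4.0 * label, 1.0), label))
    return data


def fit_threshold(train):
    """Toy 'model': the midpoint between the two class means."""
    xs0 = [x for x, label in train if label == 0]
    xs1 = [x for x, label in train if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def accuracy(threshold, data):
    """Fraction of points whose predicted class matches the label."""
    hits = sum((x > threshold) == bool(label) for x, label in data)
    return hits / len(data)


def validate_model(seeds=range(5)):
    """Train and score the stochastic step once per seed."""
    scores = []
    for seed in seeds:
        rng = random.Random(seed)
        train = make_data(rng, 100)
        held_out = make_data(rng, 100)
        scores.append(accuracy(fit_threshold(train), held_out))
    return scores


def test_model_quality():
    # Validation check: no exact output is expected, only a quality floor
    # that every seeded run must clear (0.9 is an illustrative choice).
    assert all(score >= 0.9 for score in validate_model())


if __name__ == "__main__":
    test_normalise()
    test_model_quality()
```

The first test would sit at the base of a test suite like any other unit test. The second trades exactness for a tolerance, which is one simple way to validate a component whose output legitimately varies from run to run.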
Microservices architecture describes the practice of breaking up an application into a series of smaller components, each oriented around a specific problem and its solution. Long story short - YES. Software testing is important for a number of reasons, but most importantly: no one likes an application that has bugs and stops working for no reason. And there is no need to talk about the hazards of poor security, which allows hackers to steal credentials and even money. As long as you are developing an application that has some complexity and will be used by real users, tests should not be optional; they should be mandatory. There are various types of software testing.
Testing is arguably the most important aspect of software development. Whether manual or automated, testing ensures the software works as expected. Broken software can cause production outages, unsatisfied customers, refunds, decreased trust, or even complete financial collapse. Testing minimizes these negative consequences and, when done well, enables teams to reach increasingly higher quality thresholds. DevOps transforms testing by promoting it to a critical concern across all phases of the software development life cycle (SDLC) and by shifting the responsibility onto all engineers.
In the past few years there has been a large increase in tools trying to solve the challenge of bringing machine learning models to production. One thing that these tools seem to have in common is the incorporation of notebooks into production pipelines. This article aims to explain why this drive towards the use of notebooks in production is an anti-pattern, giving some suggestions along the way. Let's start by defining what notebooks are, for those readers who haven't been exposed to them or who know them by a different name. Notebooks are web interfaces that allow a user to create documents containing code, visualisations and text.
Artificial Intelligence and Machine Learning, fondly known as AI and ML respectively, are the hottest buzzwords in the software industry today. The testing community, service organisations, and testing product and tool companies have also jumped on this bandwagon. While some interesting work is happening in the software testing space, there does seem to be a lot of hype as well. Unfortunately, it is not easy to separate the genuinely interesting work, research, and solutions from the surrounding fluff. See my blog post "ODSC - Data Science, AI, ML - Hype, or Reality?" as a reference.