Test-Driven Machine Learning


First, before I start, I want to say something about what that is, or what I understand from this. So, here is one interpretation. It is about using data, obviously. So, it has relationships to analytics and data science, and it is, obviously, part of AI in some way. This is my little taxonomy, how I see things linking together. You have computer science, and that has subfields like AI, software engineering, and machine learning is typically considered to be subfield of AI, but a lot of principles of software engineering apply in this area. This is what I want to talk about today. It's heavily used in data science. So, the difference between AI and data science is somewhat fluid if you like, but data science tries to understand what's in data and tries to understand questions about data. But then it tries to use this to make decisions, and then we are back at AI, artificial intelligence, where it's mostly about automating decision making. We have a couple of definitions. AI means using intelligence, making machines intelligent, and that means you can somehow function appropriate in an environment with foresight. Machine learning is a field that looks for algorithms that can automatically improve their performance without explicit programming, but by observing relevant data. And yes, I've thrown in data science as well for good measure, the scientific process of turning data into insight for making better decisions. If you have opened any newspaper, you must have seen the discussion around the ethical dimensions of artificial intelligence, machine learning or data science. Testing touches on that as well because there are quite a few problems in that space, and I'm just listing two here. So, you use data, obviously, to do machine learning. Where does this data come from, and are you allowed to use it? Do you violate any privacy laws, or are you building models that you use to make decisions about people? If you do that, then the general data protection regulation in the EU says you have to be able to explain to an individual if you're making a decision based on an algorithm or a machine, if this decision is of any kind of significant impact. That means, in machine learning, a lot of models are already out of the door because you can't do that. You can't explain why a certain decision comes out of a machine learning model if you use particular models.