industrialise
From notebook hell to container heaven
Here's a bitter pill to swallow in modern Machine Learning: companies are investing heavily in ML, yet most real-life ML applications, while looking shiny (no pun intended) in the frontend, are nothing more than a bunch of notebooks duct-taped together. Such applications are hard to maintain and hard to extend with new features, which slows development teams down and stifles innovation. There has been much debate online about the pros and cons of notebooks.
Industrialising Data Science
The application of pattern recognition technology to large datasets has revolutionised the digital economy. But digital represents only 5% of GDP in OECD countries: the remaining 95% is still largely untouched by data science (DS). The larger "old economy" companies are just beginning their data journey, and data science is yet to be institutionalised: outside the tech leviathans, DS is still a cottage industry, with artisan data scientists crafting bespoke prototypes to their own standards. If DS is to fulfil its promise, it needs to industrialise. This blog explains what I mean by this, and sets out a number of issues that must be addressed for that to happen. Most DS blogs are technical, covering algorithms, distributed computation, visualisation and so on. The rest are case studies of projects where these techniques are applied to a domain.