If a business isn't using AI, it's either claiming to use it or claiming it's about to start any day now. Whatever problem your company is facing, there seems to be a solution powered by decision intelligence, machine learning or some other form of AI. Yet beneath the marketing hype lies a truth: many businesses can indeed benefit from this technology, provided they take the time to learn what it can (and can't) do for them and understand the potential pitfalls. In essence, AI enables its users to do useful things with a large pool of data – for instance, fish out insights without tying up the time of data scientists. Data is therefore fundamental to AI.
Today, Informatica is announcing the launch of Cloud Data Governance and Catalog (CDGC), which unites many prior features of the Informatica platform with new capabilities, all packaged into a cloud-native solution. My ZDNet Big on Data colleague Tony Baer has a great write-up of the CDGC offering overall. Among the new features in CDGC is an AI Model Governance (AIMG) capability, which seeks to map DevOps principles onto the machine learning (ML)/AI lifecycle. We cover AIMG in this post. AIMG is intended to provide traceability of an AI asset throughout its lifecycle, from inception through production, iteration and training, deployment and "KPI delivery" – which is Informatica's term for its model-monitoring capabilities.
ML software development is complex: building an ML model is one thing; improving and maintaining it is another. If you want your machine learning models to be robust, compliant, and reproducible, you must invest time and money in quality model management. Model governance, model provenance, and model lineage tools help you do just that by tracking model activity, recording all changes to the data and the model, and outlining best practices for data management and disposal. In this post, let us discuss what these tools are and how to choose the best ones. While these three practices serve different purposes, they have a lot in common, so a tool that is good for, say, model governance is usually great for the other two as well. I will guide you through some of the tools for model governance most popular among developers and explain which one you should choose based on your particular use case.
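To make the lineage idea concrete, here is a minimal sketch of the kind of record such tools keep: each model version is logged together with its parameters and a fingerprint of the training data, so any result can be traced back to what produced it. The `ModelRegistry` class and its methods are illustrative assumptions for this post, not the API of any real governance product, which would persist far richer metadata.

```python
import datetime
import hashlib
import json


class ModelRegistry:
    """Illustrative in-memory lineage store (not a real product's API)."""

    def __init__(self):
        self.records = []

    def register(self, model_name, params, training_data):
        """Record a new model version with a hash of its training data."""
        data_hash = hashlib.sha256(
            json.dumps(training_data, sort_keys=True).encode()
        ).hexdigest()
        version = len([r for r in self.records if r["model"] == model_name]) + 1
        record = {
            "model": model_name,
            "version": version,
            "params": params,
            "data_sha256": data_hash,
            "registered_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }
        self.records.append(record)
        return record

    def lineage(self, model_name):
        """Return every recorded version of a model, oldest first."""
        return [r for r in self.records if r["model"] == model_name]


# Hypothetical usage: two retrainings of the same model on changing data.
registry = ModelRegistry()
registry.register("churn-model", {"lr": 0.1}, [[1, 0], [0, 1]])
registry.register("churn-model", {"lr": 0.05}, [[1, 0], [0, 1], [1, 1]])
history = registry.lineage("churn-model")
```

Because the data fingerprint changes whenever the training set changes, an auditor can tell at a glance whether two model versions were trained on the same data.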
The pandemic has wreaked havoc on the carefully developed AI models many organizations had in place. With so many variables shifting at the same time, many companies found that their models became unreliable or useless. Good documentation of a model's lifecycle is important, but it still doesn't provide enough information to act on once a model becomes unreliable. What's needed is improved AI model governance, which can bring greater accountability and traceability to AI/ML models by having practitioners address questions such as: Has any unauthorized person gained access to it? How exactly does AI model governance help tackle these issues?
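One concrete governance check behind "models became unreliable" is drift detection: comparing the data a model sees in production against the data it was trained on. The sketch below uses a deliberately simple mean-shift statistic with an arbitrary threshold; both are illustrative assumptions, as real monitoring systems use more robust measures (population stability index, KS tests, and so on).

```python
def mean_shift(baseline, current):
    """Absolute shift of the mean, scaled by the baseline's range."""
    base_mean = sum(baseline) / len(baseline)
    cur_mean = sum(current) / len(current)
    spread = (max(baseline) - min(baseline)) or 1.0  # avoid dividing by zero
    return abs(cur_mean - base_mean) / spread


def drift_alert(baseline, current, threshold=0.25):
    """Flag a feature whose live distribution has moved past the threshold."""
    return mean_shift(baseline, current) > threshold


# Hypothetical feature values: customer spend before and after a shock.
train_spend = [100, 120, 110, 130, 115]  # values seen at training time
live_spend = [40, 55, 35, 60, 50]        # values seen in production
alert = drift_alert(train_spend, live_spend)
```

When a check like this fires, the lifecycle documentation tells you *what* the model is; the governance process tells you *who* must review it and whether it stays in production.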
Data science (DS) and machine learning (ML) form the backbone of today's data-driven business decision-making. From a human viewpoint, ML often consists of multiple phases, from gathering requirements and datasets to deploying a model in support of human decision-making; we refer to these stages together as the DS/ML lifecycle. Various personas on the DS/ML team must coordinate across this lifecycle: stakeholders set requirements, data scientists define a plan, and data engineers and ML engineers support them with data cleaning and model building. Later, stakeholders verify the model, domain experts use model inferences in decision-making, and so on. Throughout the lifecycle, refinements may be performed at various stages as needed. The work is so complex and time-consuming that there are not enough DS/ML professionals to fill the job demand, and as much as 80% of their time is spent on low-level activities such as tweaking data or trying out various algorithmic options and tuning models. These two challenges, the dearth of data scientists and the time sunk into low-level activities, have stimulated AI researchers and system builders to explore automated solutions for DS/ML work, commonly called AutoML. Several AutoML algorithms and systems have been built to automate various stages of the DS/ML lifecycle. For example, automating the ETL (extract/transform/load) work of the data readiness, pre-processing and cleaning stage has attracted research attention.
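The core loop that AutoML automates, trying candidate models and keeping the one that validates best, can be sketched in a few lines. The two toy candidates (a constant mean predictor and a one-feature least-squares fit) and the tiny dataset are illustrative assumptions; real AutoML systems search vastly larger spaces of pipelines and hyperparameters.

```python
def mean_model(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def linear_model(xs, ys):
    """Candidate: ordinary least squares for y = a*x + b on one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs) or 1.0
    a = cov / var
    b = my - a * mx
    return lambda x: a * x + b


def mse(model, xs, ys):
    """Mean squared error of a fitted model on held-out data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def auto_select(candidates, train, valid):
    """Fit every candidate on train data and return the name of the
    one with the lowest validation error - the kernel of AutoML search."""
    scored = [(mse(fit(*train), *valid), name) for name, fit in candidates]
    return min(scored)[1]


# Toy data roughly following y = 2x; validation points are held out.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.1])
valid = ([5, 6], [10.2, 11.8])
best = auto_select([("mean", mean_model), ("linear", linear_model)], train, valid)
```

Everything beyond this loop, such as feature engineering, pipeline search and tuning budgets, is where the research effort in AutoML systems goes.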