Dataiku Data Science Studio (DSS) is a platform that tries to span the needs of data scientists, data engineers, business analysts, and AI consumers. In addition, Dataiku DSS tries to span the machine learning process from end to end, i.e. from data preparation through MLOps and application support. The Dataiku DSS user interface is a combination of graphical elements, notebooks, and code, as we'll see later on in the review. As a user, you often have a choice of how you'd like to proceed, and you're usually not locked into your initial choice, given that graphical choices can generate editable notebooks and scripts. During my initial discussion with Dataiku, their senior product marketing manager asked me point blank whether I preferred a GUI or writing code for data science.
FlowSense is the natural language interface (NLI) that assists with dataflow diagram editing in VisFlow. See the following VAST fast forward video for a short summary of the NLI. The input provided to FlowSense must be a natural language sentence that specifies a diagram editing operation. The following list shows some example inputs in the context of the sample car/gdp dataset. You must load the sample dataset before trying the other natural language inputs.
Data is often called a company's biggest asset, and rightly so. However, the value of data lies in its quality. Low-quality data not only has low business value but can also be harmful and adversely affect business outcomes. The value of good-quality data is hard to overstate in business strategy and operations, in meeting regulatory compliance requirements, and in technology transformation initiatives such as data conversions and application consolidations.
It is rare that you get data in exactly the form you need. Often you'll need to create some new variables, rename existing ones, reorder the observations, or just drop records in order to make the data a little easier to work with. This is called data wrangling (or preparation), and it is a key part of data science. Most of the time, the data you have can't be used straight away for your analysis: it will usually require some manipulation and adaptation, especially if you need to combine other sources of data into the analysis. In essence, raw data is messy (usually unusable at the start), and you'll need to roll up your sleeves to get it into shape.
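As a rough illustration of the four operations just mentioned (dropping records, renaming a variable, creating a new one, and reordering the observations), here is a minimal sketch in plain Python. The records, the field names (`cust_id`, `qty`, `unit_price`, `total`) and the helper `wrangle` are all made up for this example; in practice you would more likely reach for a dedicated data-frame library.

```python
from operator import itemgetter

# Hypothetical raw records; field names and values are invented for illustration.
raw = [
    {"cust_id": 3, "qty": 2, "unit_price": 10.0},
    {"cust_id": 1, "qty": 5, "unit_price": 4.0},
    {"cust_id": 2, "qty": None, "unit_price": 7.5},  # incomplete record
]


def wrangle(records):
    """Apply four basic wrangling steps to a list of dict records."""
    # 1. Drop records whose quantity is missing.
    kept = [r for r in records if r["qty"] is not None]

    cleaned = []
    for r in kept:
        cleaned.append({
            # 2. Rename an existing variable: cust_id -> customer_id.
            "customer_id": r["cust_id"],
            "qty": r["qty"],
            "unit_price": r["unit_price"],
            # 3. Create a new variable from existing ones.
            "total": r["qty"] * r["unit_price"],
        })

    # 4. Reorder the observations by customer_id.
    return sorted(cleaned, key=itemgetter("customer_id"))


tidy = wrangle(raw)
for row in tidy:
    print(row)
```

The steps are kept separate so that each one maps onto a single operation from the paragraph above. Real data usually calls for more careful handling of missing values than simply dropping the record.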
Now that you have learnt how to manipulate data in the tutorials Basics & From Lab to Flow, you're ready to build a model to predict customer value. In this tutorial, you will create your first machine learning model by analyzing the historical customer records and order logs from Haiku T-Shirts. The goal is to predict whether a new customer will become a high-value customer, based on the information gathered during their first purchase. This tutorial assumes that you have completed Tutorial: From Lab to Flow before beginning this one! From the Dataiku DSS home page, click on the Tutorials button in the left pane, and select Tutorial: Machine Learning.