Anscombe Quartet and use of Exploratory Data Analysis - WeirdGeek
Whether you already work as a data scientist or are looking to build a career in data science, your pipeline typically includes extracting and loading a dataset, cleansing and munging the data, computing summary statistics, doing some exploratory data analysis (EDA), and only then building a model with machine learning. The Anscombe quartet is a classic demonstration that relying on summary statistics alone can be misleading, and that doing so can badly affect a machine learning model. In this post we use the Anscombe quartet dataset, which is stored as an Excel file that we can read with pd.read_excel(). It is a group of four subsets that appear nearly identical under typical summary statistics, but when you plot each group with the Matplotlib package, you see a very different story. Each subset consists of eleven (x, y) pairs. We have labelled the four pairs as (X, Y), (X.1,
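To make the claim about summary statistics concrete, here is a minimal sketch that computes the usual statistics for each of the four subsets. It makes a few assumptions: the values are hard-coded from Anscombe's published 1973 table rather than read from the post's Excel file (whose name and full column layout are not shown here), the subset labels "I" to "IV" and the helper `summarize` are hypothetical names chosen for illustration, and only the Python standard library is used.

```python
from statistics import mean, variance

# Anscombe's quartet, hard-coded from the published 1973 table.
# Assumption: the post's Excel file holds these same values.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return common summary statistics for one set of (x, y) pairs.

    Hypothetical helper for illustration; not part of the original post.
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    vx, vy = variance(xs), variance(ys)  # sample variances (n - 1)
    # Sample covariance between x and y.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    slope = cov / vx                     # least-squares slope of y on x
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": vx,
        "var_y": vy,
        "corr": cov / (vx ** 0.5 * vy ** 0.5),  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


stats = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}

for name, s in stats.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```

Rounded to two decimals, the four rows printed by the loop agree on the means, the variance of x, the correlation, and the fitted line, which is exactly why the numbers alone cannot tell the subsets apart. Plotting each subset, for example as four scatter panels in Matplotlib, is what reveals how different they are.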
Nov-17-2018, 17:20:42 GMT