In our last post we talked about automated product attribute classification using advanced text based machine learning techniques using the given product features like title, description etc. & predicting product attribute values from the defined set of values. As discussed as the catalogue size and no. of suppliers keep growing the problem of maintaining the catalogue accurately grows exponentially and there are thousands of attribute values and millions of products per day to classify. In this post, we are going to highlight some of the keys steps we utilized to deploy machine learning algorithms to classify thousands of attributes and deploying them on dataX, CrowdANALYTIX's proprietary big data curation and veracity optimization platform. As shown in the figure below - client product catalog is extracted, curated and a list of products (new products which need classification or old product refreshes) is sent to dataX . The dataX ecosystem is designed to onboard millions of products each day to make high precision predictions.
"Data wrangling" was an interesting phrase to hear in the machine learning (ML) presentations at Microsoft Ignite. Interesting because data wrangling is from business intelligence (BI), not from artificial intelligence (AI). Microsoft understands ML incorporates concepts from both disciplines. Further discussions point to another key point: Microsoft understands that business-to-business (B2B) is just as fertile for ML as business-to-consumer (B2C). ML applications with the most press are voice, augmented reality and autonomous vehicles.
Classification software: building models to separate 2 or more discrete classes using Multiple methods Decision Tree Rules Neural Bayesian SVM Genetic, Rough Sets, Fuzzy Logic and other approaches Analysis of results, ROC Social Network Analysis, Link Analysis, and Visualization software Text Analysis, Text Mining, and Information Retrieval (IR) Web Analytics and Social Media Analytics software. BI (Business Intelligence), Database and OLAP software Data Transformation, Data Cleaning, Data Cleansing Libraries, Components and Developer Kits for creating embedded data mining applications Web Content Mining, web scraping, screen scraping.
"I took a few of your courses and you are an amazing teacher. Your courses have brought me up to speed on how to create databases and how to interact and handle Data Engineers and Data Scientists. I will be forever grateful." "By taking this course my perception has changed and now data science for me is more about data wrangling. Welcome to The Complete Course for Machine Learning Engineers.
In a data science project, almost always the most time consuming and messy part is the data gathering and cleaning. Everyone likes to build a cool deep neural network (or XGboost) model or two and show off one's skills with cool 3D interactive plots. But the models need raw data to start with and they don't come easy and clean. But why gather data or build model anyway? The fundamental motivation is to answer a business or scientific or social question.