In our last post, we discussed automated product attribute classification: applying text-based machine learning to product features such as title and description to predict attribute values from a defined set. As discussed, as catalog size and the number of suppliers keep growing, the problem of maintaining the catalog accurately grows exponentially, with thousands of attribute values and millions of products to classify each day. In this post, we highlight some of the key steps we used to deploy machine learning algorithms that classify thousands of attributes on dataX, CrowdANALYTIX's proprietary big data curation and veracity optimization platform. As shown in the figure below, the client product catalog is extracted and curated, and a list of products (new products that need classification, or refreshes of existing products) is sent to dataX. The dataX ecosystem is designed to onboard millions of products each day and make high-precision predictions.
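To make the core idea concrete, here is a minimal, self-contained sketch of predicting an attribute value from product text. The product examples, the "color" attribute, and the simple token-overlap scoring are all invented for illustration; a production system like the one described would use far richer features and models.

```python
# Toy sketch: classify a product's attribute value (here, color) from its
# title/description text. All examples and the scoring scheme are illustrative.
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Build per-attribute-value token counts from (text, value) pairs."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

def predict(counts, text):
    """Score each candidate attribute value by overlapping token frequency."""
    tokens = tokenize(text)
    scores = {label: sum(c[t] for t in tokens) for label, c in counts.items()}
    return max(scores, key=scores.get)

examples = [
    ("red cotton t-shirt crew neck", "red"),
    ("blue denim jeans slim fit", "blue"),
    ("red leather handbag with zipper", "red"),
    ("blue ceramic coffee mug", "blue"),
]
model = train(examples)
print(predict(model, "bright red wool scarf"))  # → red
```

The key constraint the post describes (a defined set of values per attribute) shows up here as the fixed label set: the classifier only ever chooses among values seen in training.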
"Data wrangling" was an interesting phrase to hear in the machine learning (ML) presentations at Microsoft Ignite. Interesting because data wrangling is from business intelligence (BI), not from artificial intelligence (AI). Microsoft understands ML incorporates concepts from both disciplines. Further discussions point to another key point: Microsoft understands that business-to-business (B2B) is just as fertile for ML as business-to-consumer (B2C). ML applications with the most press are voice, augmented reality and autonomous vehicles.
- Classification software: building models to separate two or more discrete classes using multiple methods (decision trees, rules, neural networks, Bayesian methods, SVMs, genetic algorithms, rough sets, fuzzy logic, and other approaches), with analysis of results (ROC)
- Social network analysis, link analysis, and visualization software
- Text analysis, text mining, and information retrieval (IR)
- Web analytics and social media analytics software
- BI (business intelligence), database, and OLAP software
- Data transformation, data cleaning, and data cleansing
- Libraries, components, and developer kits for creating embedded data mining applications
- Web content mining, web scraping, and screen scraping
Timely and accurate agricultural impact assessments for droughts are critical for designing appropriate interventions and policy. These assessments are often ad hoc, late, or spatially imprecise, with reporting at the zonal or regional level. This is problematic because we find substantial variability in losses at the village level, which is lost when reporting at the zonal level. In this paper, we propose a new data fusion method, combining remotely sensed data with agricultural survey data, that might address these limitations. We apply the method to Ethiopia, which is regularly hit by droughts and is a substantial recipient of ad hoc imported food aid.
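The data-fusion idea can be sketched as a village-level join: use surveyed crop losses where a survey exists, and fall back on a prediction from a satellite-derived drought index elsewhere. The village names, index values, loss figures, and the linear relation below are all invented; the paper's actual method is more sophisticated.

```python
# Hedged sketch of village-level data fusion: survey data where available,
# remote-sensing-based prediction otherwise. All values are illustrative.
remote = {  # village -> satellite-derived vegetation index anomaly (assumed)
    "village_a": -0.8,
    "village_b": -0.1,
    "village_c": -0.5,
}
survey = {  # village -> surveyed crop loss fraction (only some villages surveyed)
    "village_a": 0.45,
    "village_b": 0.05,
}

def fused_loss(village, slope=-0.55, intercept=0.0):
    """Prefer the survey value; otherwise predict loss from the remote index
    with a simple assumed linear relation."""
    if village in survey:
        return survey[village]
    return max(0.0, slope * remote[village] + intercept)

for v in remote:
    print(v, round(fused_loss(v), 3))
```

This also illustrates why zonal averages hide variability: the fused village-level estimates above differ far more across villages than a single zonal figure could express.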
CrowdFlower, a crowdsourced data-cleaning and tagging platform, has closed a fresh $10 million in funding in a round led by Microsoft Ventures, Canvas Ventures, and Trinity Ventures. Founded out of San Francisco in 2009, CrowdFlower had previously raised $28 million, including a $12.5 million round back in 2014, but the company says it will use its cash influx to expedite the adoption of a new machine-learning product called CrowdFlower AI. CrowdFlower will use a combination of humans and artificial intelligence to "make data useful" by helping teams organize their data more effectively at scale. For example, a company may have a large database of numbers and demographics, but some fields might be empty, incorrect, or incomplete, something a human is best placed to fix. So when a company uploads its data into CrowdFlower and stipulates the rules, humans from around the world log in and do the labeling manually.
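The workflow described (upload data, stipulate rules, route problem records to human labelers) can be sketched as a simple triage step. The record fields and the "required fields" rule below are invented for illustration, not CrowdFlower's actual API or rule format.

```python
# Hypothetical sketch of human-in-the-loop triage: records failing the
# stipulated rules go to a human-labeling queue; the rest pass through.
records = [
    {"name": "Acme Corp", "industry": "manufacturing", "employees": 120},
    {"name": "Globex", "industry": "", "employees": None},
]

def needs_human(record, required=("industry", "employees")):
    """Rule: every required field must be present and non-empty."""
    return any(not record.get(field) for field in required)

human_queue = [r for r in records if needs_human(r)]
clean = [r for r in records if not needs_human(r)]

print(len(human_queue), "record(s) routed to human annotators")  # → 1
```

The value of the human-plus-AI combination is in where this rule boundary sits: cheap automated checks decide routing, and the expensive human effort is spent only on the records machines cannot fix.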