How to squeeze the most from your training data

Jul-29-2017, 01:21:53 GMT–#artificialintelligence

In many cases, acquiring well-labelled training data is a huge hurdle when developing accurate prediction systems with supervised learning. At Love the Sales, we needed to classify the textual metadata of 2 million products (mostly fashion and homewares) into 1,000 different categories, represented as a hierarchy. To achieve this, we architected a hierarchical tree of chained 2-class linear (Positive vs Negative) Support Vector Machines (LibSVM), each responsible for the binary classification of documents into a single class of the hierarchy.

A key lesson is that the way these SVMs are structured can have a significant impact on how much training data is required. Consider a naive approach that splits products by gender (Men's, Women's) at the top level and places the product sub-categories beneath each. This approach requires two new SVMs to be trained for every additional sub-category: for example, adding a new 'Swimwear' class would require an additional SVM under both Men's and Women's, not to mention the potential complexity of adding a 'Unisex' class at the top level. Overall, deep hierarchical structures can be too rigid to work with.
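The naive, gender-first layout can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Love the Sales' implementation: a Pegasos-style subgradient solver for a linear SVM stands in for LibSVM, products are binary bag-of-words vectors, each sub-category SVM is assumed to be trained only on its own gender's products, and the toy catalogue and helper names (`train_linear_svm`, `build_naive_tree`, `classify`, `naive_svm_count`) are hypothetical. Each node holds one 2-class linear SVM trained Positive vs Negative, and classification chains them by descending only into children whose SVM fires.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy catalogue: (product text, gender, sub-category).
DATA = [
    ("mens swim shorts", "Men's", "Swimwear"),
    ("mens board shorts swim", "Men's", "Swimwear"),
    ("mens leather shoes", "Men's", "Shoes"),
    ("mens running shoes", "Men's", "Shoes"),
    ("womens swimsuit", "Women's", "Swimwear"),
    ("womens bikini swim", "Women's", "Swimwear"),
    ("womens heeled shoes", "Women's", "Shoes"),
    ("womens ballet shoes flats", "Women's", "Shoes"),
]


def featurize(text: str) -> dict[str, float]:
    # Binary bag-of-words features from whitespace-separated tokens.
    return {tok: 1.0 for tok in text.lower().split()}


def dot(w: dict[str, float], x: dict[str, float]) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(xs, ys, lam: float = 0.01, epochs: int = 50) -> dict[str, float]:
    """Linear SVM (hinge loss + L2 penalty, no bias) fitted with Pegasos-style steps.

    Assumption: a stand-in for LibSVM. Examples are visited in a fixed order,
    so training is deterministic. ys holds +1 (positive) or -1 (negative).
    """
    w: dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # margin check uses the current w
            for k in w:  # shrink all weights: gradient of the L2 penalty
                w[k] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient step
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


@dataclass
class Node:
    name: str
    weights: dict[str, float]  # one 2-class linear SVM per node
    children: list[Node] = field(default_factory=list)

    def accepts(self, x: dict[str, float]) -> bool:
        return dot(self.weights, x) > 0.0


def build_naive_tree(data) -> list[Node]:
    """Gender at the top level; each sub-category SVM is duplicated per gender."""
    roots = []
    for gender in sorted({g for _, g, _ in data}):
        # Gender SVM: this gender's products (positive) vs. all others (negative).
        xs = [featurize(text) for text, _, _ in data]
        ys = [1 if g == gender else -1 for _, g, _ in data]
        node = Node(gender, train_linear_svm(xs, ys))
        # Sub-category SVMs are trained only on this gender's products (assumption).
        subset = [(text, c) for text, g, c in data if g == gender]
        for cat in sorted({c for _, c in subset}):
            cxs = [featurize(text) for text, _ in subset]
            cys = [1 if c == cat else -1 for _, c in subset]
            node.children.append(Node(cat, train_linear_svm(cxs, cys)))
        roots.append(node)
    return roots


def classify(nodes: list[Node], x: dict[str, float], path: tuple = ()) -> list[tuple]:
    # Chain the SVMs: descend into every child whose binary SVM fires.
    paths = []
    for node in nodes:
        if node.accepts(x):
            here = path + (node.name,)
            paths.extend(classify(node.children, x, here) or [here])
    return paths


def count_svms(nodes: list[Node]) -> int:
    return sum(1 + count_svms(n.children) for n in nodes)


def naive_svm_count(genders: int, subcategories: int) -> int:
    # One SVM per gender plus one per (gender, sub-category) pair.
    return genders + genders * subcategories


tree = build_naive_tree(DATA)
print(classify(tree, featurize("mens swim shorts")))  # [("Men's", 'Swimwear')]
print(count_svms(tree), naive_svm_count(2, 11) - naive_svm_count(2, 10))  # 6 2
```

Because every sub-category SVM sits beneath each gender, the naive layout grows by one SVM per gender for each new sub-category, and each of those SVMs needs its own positive and negative examples. With a hypothetical ten sub-categories, `naive_svm_count(2, 10)` is 22; adding a third top-level 'Unisex' branch raises `naive_svm_count(3, 10)` to 33.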

artificial intelligence, machine learning, training data