Finding serial numbers with a crawler & simple perceptron [x-post from languagetechnology]. • /r/MachineLearning

@machinelearnbot 

So I am trying to crawl through a large number of websites and pull out serial numbers. This is proving challenging, since the serial numbers are not of any set length, have arbitrary spacing, character sets, and punctuation inside them (dashes, etc.), and are sometimes contained in downloadable static files such as Excel sheets. The solution I'm currently exploring is training a fairly simple single-layer perceptron to decide whether something 'looks' like a serial number or not. After removing all words that can be ruled out by more conventional means, I run the perceptron on everything remaining. The problem I'm running into is how to vectorize the input.
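One possible way to approach the vectorization question, sketched here as an assumption and not as the method actually used in the post: since the tokens vary in length, map each one to a fixed-length vector of character-class statistics (digit ratio, letter ratio, punctuation ratio, and so on) and feed that to the perceptron. The function names `vectorize` and `train_perceptron`, the seven features, and the length cap of 40 are all hypothetical choices made for illustration.

```python
import string


def vectorize(token):
    """Map a variable-length token to a fixed-length list of 7 features.

    The feature set is an illustrative assumption, not taken from the post.
    """
    n = len(token)
    if n == 0:
        return [0.0] * 7
    digits = sum(c.isdigit() for c in token)
    letters = sum(c.isalpha() for c in token)
    uppers = sum(c.isupper() for c in token)
    puncts = sum(c in string.punctuation for c in token)
    spaces = sum(c.isspace() for c in token)
    # Count adjacent letter<->digit switches; mixed tokens like "A1B2" score high.
    switches = sum(
        1
        for a, b in zip(token, token[1:])
        if (a.isdigit() and b.isalpha()) or (a.isalpha() and b.isdigit())
    )
    return [
        min(n, 40) / 40.0,         # length, capped at 40 (arbitrary cap)
        digits / n,                # fraction of digits
        letters / n,               # fraction of letters
        uppers / n,                # fraction of uppercase letters
        puncts / n,                # fraction of ASCII punctuation (dashes, etc.)
        spaces / n,                # fraction of whitespace
        switches / max(n - 1, 1),  # letter/digit switch rate
    ]


def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Classic single-layer perceptron update rule on 0/1 labels."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b


def predict(w, b, token):
    """Return 1 if the token 'looks' like a serial number, else 0."""
    x = vectorize(token)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Usage would look something like `w, b = train_perceptron([vectorize(t) for t in tokens], labels)` followed by `predict(w, b, candidate)`, where `tokens` and `labels` are your own hand-labeled examples. A fixed feature vector like this sidesteps the variable-length problem entirely, at the cost of throwing away character order; character n-gram counts would be another option if order turns out to matter.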
