Finding serial numbers with a crawler & simple perceptron [x-post from languagetechnology]. • /r/MachineLearning
So I am trying to crawl a large number of websites and pull out serial numbers. This is proving challenging, since the serial numbers are not of any set length, have arbitrary spacing, character sets, and punctuation inside them (dashes, etc.), and are sometimes contained in downloadable static files such as Excel sheets. The solution I'm currently exploring is training a fairly simple single-layer perceptron to decide whether something 'looks' like a serial number or not. After removing all words that can be ruled out by more conventional means, I run the perceptron on everything remaining. The problem I'm running into is how to vectorize the input.
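To make the vectorization question concrete, here is one possible approach, sketched under assumptions rather than something I have validated on real crawl data: map each variable-length candidate string to a fixed-length vector of character-class statistics (scaled length, fraction of digits, letters, punctuation, and spaces, and how often the character class switches), then feed that to a plain single-layer perceptron. The feature set, the `MAX_LEN` cap, the helper names (`char_class`, `vectorize`, `train`, `predict`), and the toy labelled examples are all made up for illustration.

```python
# Hypothetical sketch: fixed-length character-class features + a single-layer perceptron.
# All names, features, and example strings here are illustrative assumptions.

MAX_LEN = 32      # assumed cap used to scale the length feature into [0, 1]
N_FEATURES = 6


def char_class(ch):
    """Bucket a character into one of four coarse classes."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    if ch.isspace():
        return "space"
    return "punct"


def vectorize(token):
    """Turn a string of any length into N_FEATURES numbers in [0, 1]."""
    n = len(token)
    if n == 0:
        return [0.0] * N_FEATURES
    classes = [char_class(c) for c in token]
    # Count positions where the character class changes (e.g. letter -> digit).
    switches = sum(1 for a, b in zip(classes, classes[1:]) if a != b)
    return [
        min(n, MAX_LEN) / MAX_LEN,
        classes.count("digit") / n,
        classes.count("alpha") / n,
        classes.count("punct") / n,
        classes.count("space") / n,
        switches / max(n - 1, 1),
    ]


def predict(weights, bias, features):
    """Return 1 (looks like a serial number) or 0."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


def train(examples, epochs=1000, lr=1.0):
    """Classic perceptron rule: update only on misclassified examples."""
    weights = [0.0] * N_FEATURES
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for token, label in examples:
            x = vectorize(token)
            error = label - predict(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


# Made-up labelled candidates: 1 = serial number, 0 = ordinary text.
TOY_EXAMPLES = [
    ("SN-4821-XK9", 1),
    ("A1B2C3D4", 1),
    ("9F3 22 KQ71", 1),
    ("0041-77-2219", 1),
    ("warranty", 0),
    ("Download", 0),
    ("contact us", 0),
    ("the", 0),
]

if __name__ == "__main__":
    w, b = train(TOY_EXAMPLES)
    for candidate in ["XJ-2201-9A", "returns policy"]:
        print(candidate, predict(w, b, vectorize(candidate)))
```

The point of summary statistics like these is that the vector length no longer depends on the string length, which sidesteps the "no set length" problem. The cost is that position information is lost; character n-gram counts or a padded per-character encoding would be alternatives if ordering turns out to matter.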
May-12-2016, 23:45:46 GMT