Machine Learning in a nutshell, Issue no. 5: Spam classification - Using an Artificial Neural Network (ANN) - openForce Information Technology
Last time we discussed where Artificial Neural Networks (ANN) come from and the basic concepts behind Multilayer Perceptrons (MLP). This time we use such an MLP for the spam classification problem of issue no. 3. There, working directly on the raw text was good enough for our baseline model, since we just filtered on specific words, numbers and currency symbols. However, most machine learning algorithms, MLPs included, operate only on numbers, so we need to transform the words into numerical features. This task falls into natural language processing (NLP), a huge research area in its own right.

A very common approach is bag-of-words. Each document is represented as a sparse vector with one element per word in the vocabulary of the problem domain. The value of an element is either the number of times that word occurs in the document, or just a binary value indicating whether the word occurred at all (once or several times). The binary variant is often loosely called one-hot encoding. Strictly speaking, one-hot encoding represents a single word as a vector with exactly one 1, and a document's binary vector is the element-wise OR of the one-hot vectors of its words, which is why it is also called multi-hot encoding.
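To make the two bag-of-words variants concrete, here is a minimal sketch in plain Python. It is not the article's implementation: the tokenizer (lowercase, split on non-alphanumeric characters) and the helper names `tokenize`, `build_vocabulary`, `vectorize` and `to_dense` are hypothetical and chosen only for illustration. The sparse vector is stored as a dictionary from column index to value, so words that do not occur cost nothing.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    # Assign each distinct word a fixed column index (sorted for reproducibility).
    words = sorted({w for doc in documents for w in tokenize(doc)})
    return {w: i for i, w in enumerate(words)}


def vectorize(document, vocabulary, binary=False):
    # Sparse bag-of-words: {column index: value}; missing columns are implicitly 0.
    # Words not in the vocabulary are silently ignored.
    counts = Counter(w for w in tokenize(document) if w in vocabulary)
    return {vocabulary[w]: (1 if binary else c) for w, c in counts.items()}


def to_dense(sparse, size):
    # Expand the sparse dict into a full-length list, e.g. as input for an MLP.
    dense = [0] * size
    for index, value in sparse.items():
        dense[index] = value
    return dense


if __name__ == "__main__":
    docs = ["Win money now!", "Meeting now, not money."]
    vocab = build_vocabulary(docs)
    print(vocab)  # {'meeting': 0, 'money': 1, 'not': 2, 'now': 3, 'win': 4}

    email = "Money, money, money now"
    print(to_dense(vectorize(email, vocab), len(vocab)))               # [0, 3, 0, 1, 0]
    print(to_dense(vectorize(email, vocab, binary=True), len(vocab)))  # [0, 1, 0, 1, 0]
```

The count vector keeps how often "money" appears, while the binary (multi-hot) vector only records that it appeared. In practice a library vectorizer such as scikit-learn's `CountVectorizer` (with `binary=True` for the binary variant) would typically replace these hand-written helpers.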
Mar-8-2017, 17:55:16 GMT