Drilling into Spark's ALS Recommendation algorithm
The ALS algorithm introduced by Hu et al., is a very popular technique used in Recommender System problems, especially when we have implicit datasets (for example clicks, likes etc). It can handle large volumes of data reasonably well and we can find many good implementations in various Machine Learning frameworks. Spark includes the algorithm in the MLlib component which has recently been refactored to improve the readability and the architecture of the code. Spark's implementation requires the Item and User id to be numbers within integer range (either Integer type or Long within integer range), which is reasonable as this can help speed up the operations and reduce memory consumption. One thing I noticed though while reading the code is that those id columns are being casted into Doubles and then into Integers at the beginning of the fit/predict methods.
Mar-6-2017, 20:20:49 GMT
- Technology: