The power of machine learning in Spark
One of the major differentiators between Apache Spark and the prior generation of Apache Hadoop-based and MapReduce-based technologies is Spark's built-in machine learning library, MLlib. The motivation for including these capabilities is to make practical machine learning scalable and understandable for data engineers and data scientists. MLlib also leverages Spark's distributed, in-memory execution model, which can yield significant performance benefits over earlier tools such as R and Apache Mahout. However powerful MLlib's capabilities are in the abstract, one still needs to identify a practical application, implement a technical solution, and productionize the analysis for its downstream consumers. As I discussed in the post "Spark: The operating system for big data analytics," Spark makes implementing and productionizing advanced data analysis significantly less challenging than those earlier technologies.
Jan-30-2017, 20:20:20 GMT