tensorflowonspark
Spark Summit spotlights Machine Learning, but that's not all - ZDNet
Paraphrasing Garrison Keillor, it's been a quiet week in the Apache Spark community, at least compared to last year, when the definitive Spark 2.0 was unveiled. Last week, Spark Summit pulled into Boston, and so did one of those nor'easters that make Boston so alluring in February. So for now, the Spark project is handling the blocking-and-tackling chores of cleaning up and optimizing APIs. For instance, a recent update of Spark 2.0 added pipeline processing that lets complex machine learning jobs run more efficiently. And while we're on the topic of machine learning, it was virtually impossible to avoid presentations covering it.
Yahoo open-sources TensorFlowOnSpark, new distributed deep learning framework - PCQuest
Yahoo has announced TensorFlowOnSpark, its latest open-source framework for distributed deep learning on big data clusters. In Yahoo's words: "Deep learning (DL) has evolved significantly in recent years. At Yahoo, we've found that in order to gain insight from massive amounts of data, we need to deploy distributed deep learning. Existing DL frameworks often require us to set up separate clusters for deep learning, forcing us to create multiple programs for a machine learning pipeline (see Figure 1 below). Having separate clusters requires us to transfer large datasets between them, introducing unwanted system complexity and end-to-end learning latency."
Yahoo supercharges TensorFlow with Apache Spark
Yahoo, a model Apache Spark citizen and the developer of CaffeOnSpark (which made it easier for developers building deep learning models in Caffe to scale with parallel processing), is open-sourcing a new project called TensorFlowOnSpark. Pairing Spark with TensorFlow should make the deep learning framework more attractive to developers whose models need to run on large computing clusters. For those who zoned out during the big-data boom, Apache Spark is an open-source framework designed to make parallel computing more efficient. Following in the footsteps of tools like Hadoop, Spark made it possible for companies like Netflix to process huge amounts of user data and offer recommendations at scale. Machine learning frameworks like Google's TensorFlow and Caffe help people create deep learning models without the rigorous skill set of a machine learning specialist.