Using Apache Spark with TensorFlow on Google Cloud Platform | Google Cloud Big Data and Machine Learning Blog


Apache Spark and TensorFlow are both open-source projects that have made a significant impact on enterprise software in recent years. TensorFlow provides a foundational framework for running distributed numerical computations, such as deep learning algorithms, while Spark is a general-purpose, Hadoop-like framework for large-scale data processing that is also a popular choice for more traditional machine learning algorithms through MLlib. Google Cloud Platform offers a managed service for each: Cloud Dataproc for Apache Spark and Cloud ML Engine for TensorFlow. Both services deliver the power of their respective open-source frameworks in a managed environment, letting you focus on the data science while we worry about the operations. Intuitively, there is some overlap: Spark provides a framework for big data computations, and the datasets that power TensorFlow algorithms tend to be large.