caffeonspark
Distributed TensorFlow Has Arrived
KDnuggets has taken seriously its role to keep up with the newest releases of major deep learning projects, and in the recent past we have seen such landmark releases from major technology giants as well as universities and research labs. While Microsoft, Yahoo!, AMPLab, and others have all contributed outstanding projects in their own right, the landscape was most impacted in November 2015, with the release of what is now the most popular open source machine learning library on GitHub by a wide margin, Google's TensorFlow. I wrote in the early days after its release of my initial dissatisfaction with the project, based primarily on the lack of distributed training capabilities (especially given that such capabilities were directly alluded to in the accompanying whitepaper's title). There were also a few other - lesser - "issues" I had with it, but the central point of contention was that it was single node only. This original post was polarizing, with many people upset at my "dismissal" of the tech powerhouse's latest offering (a closer read would reveal that I did not, in any way, dismiss it).
Yahoo open-sources TensorFlowOnSpark, new distributed deep learning framework - PCQuest
Yahoo has announced TensorFlowOnSpark, its latest open source framework for distributed deep learning on big data clusters. Deep learning (DL) has evolved significantly in recent years. At Yahoo, we've found that in order to gain insight from massive amounts of data, we need to deploy distributed deep learning. Existing DL frameworks often require us to set up separate clusters for deep learning, forcing us to create multiple programs for a machine learning pipeline (see Figure 1 below). Having separate clusters requires us to transfer large datasets between them, introducing unwanted system complexity and end-to-end learning latency.
Yahoo supercharges TensorFlow with Apache Spark
Yahoo, model Apache Spark citizen and developer of CaffeOnSpark, which made it easier for developers building deep learning models in Caffe to scale with parallel processing, is open sourcing a new project called TensorFlowOnSpark. The pairing of Spark and TensorFlow should make the deep learning framework more attractive to developers who are creating models that need to run on large computing clusters. For those that zoned out during the big-data boom, Apache Spark is an open source framework designed to increase the efficiency of parallel computing. Following in the steps of tools like Hadoop, Spark made it possible for companies like Netflix to process huge amounts of user data to offer up recommendations at scale. Machine learning frameworks like Google's TensorFlow and Caffe help people create deep learning models without the rigorous skill-set of a machine learning specialist.
More Open AI and Machine Learning Toolsets Arrive
Dec. 02, 2016: Recently, in an article for TechCrunch, Spark Capital's John Melas-Kyriazi weighed in on how startups can leverage artificial intelligence and machine learning to advance their businesses or even give birth to brand new ones. As a corollary avenue on that topic, it's worth noting that some very powerful artificial intelligence and machine learning engines have recently been open sourced. Quite a few of them have been tested and hardened at Google, Facebook, Microsoft and other companies, and some of them may represent business opportunities. Just recently, two new open source entries on this front have emerged, and they are worth investigating. Health Catalyst has created healthcare.ai as a repository of healthcare-focused open source machine learning software, with an eye toward encouraging the healthcare industry to tap into the power of AI and machine learning.
Microsoft Open Sources Project Malmo, Another AI Milestone
Without a doubt, cloud computing and Big Data analytics are top of mind for many people when it comes to hot technology categories where open source is making a big difference. However, there is an absolute renaissance going on right now in the field of artificial intelligence and the closely related field of machine learning. Sundar Pichai, Google's CEO, recently said on a conference call, "I do think in the long run we will evolve in computing from a mobile-first to an A.I.-first world." Facebook, Google and many other companies have been open sourcing key AI tools as well. Now, Microsoft has released an artificial intelligence system dubbed Project Malmo to the open source community.
Distributed Deep Learning with Caffe Using a MapR Cluster
We have experimented with CaffeOnSpark on a 5-node MapR 5.1 cluster running Spark 1.5.2 and will share our experience, difficulties, and solutions in this blog post. Deep learning is getting a lot of attention recently, with AlphaGo beating a top world player at a game that was thought so complicated as to be out of reach of computers just five years ago. Deep learning is not just beating humans at Go, but also at pretty much every Atari computer game. But the fact is, deep learning is also useful for tasks with clear enterprise applications in the fields of image classification and speech recognition, AI chat bots and machine translation, just to name a few. Caffe is a C++/CUDA deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC).
On the Artificial Intelligence Front, Open Source Tools are Proliferating
If you ask many people to name the technology categories that are creating sweeping change right now, cloud computing and Big Data analytics would probably be top of mind for a lot of them. However, there is an absolute renaissance going on right now in the field of artificial intelligence and the closely related field of machine learning. Some of the biggest tech companies are helping to drive the trend, and Google added to the momentum on this front this week. Specifically, Sundar Pichai, Google's CEO, said on a conference call, "I do think in the long run we will evolve in computing from a mobile-first to an A.I.-first world." In this post, you'll find a collection of the most notable A.I. tools that have recently been open sourced.
Yahoo! CaffeOnSpark: Distributed Deep Learning on Big Data Clusters
Deep learning (DL) is a critical capability required by Yahoo product teams (e.g., Flickr, Image Search) to gain intelligence from massive amounts of online data. Many existing DL frameworks require a separate cluster for deep learning, and multiple programs have to be created for a typical machine learning pipeline (see Figure 1). The separate clusters require large datasets to be transferred among them, and introduce unwanted system complexity and latency for end-to-end learning. As discussed in our earlier Tumblr post, we believe that deep learning should be conducted in the same cluster along with existing data processing pipelines to support feature engineering and traditional (non-deep) machine learning.
Yahoo just made deep learning easier with CaffeOnSpark
Yahoo! Inc. is getting into the artificial intelligence (AI) game with the release of new internally-built software under an open-source license. Called CaffeOnSpark, the software is able to perform 'deep learning' on the vast ocean of data kept in Yahoo's Hadoop file system. Now, the company has made it available on GitHub for everyone to use. Deep learning is a machine learning method that's particularly useful in helping computers sort through and recognize user-generated data, and one of its most exciting use cases is where images are concerned. As such, Yahoo built CaffeOnSpark to help identify the billions of images posted onto its Flickr photo sharing website.