Pretrained models are a wonderful resource for anyone looking to learn an algorithm or try out an existing framework. Due to time restrictions or computational constraints, it isn't always possible to build a model from scratch, which is why pretrained models exist. You can use a pretrained model as a benchmark, either to improve on the existing model or to test your own model against it. The possibilities are vast. In this article, we will look at various pretrained models in Keras that have applications in computer vision.
A few months ago I wrote a tutorial on how to classify images using Convolutional Neural Networks (specifically, VGG16) pre-trained on the ImageNet dataset, with Python and the Keras deep learning library. The pre-trained networks inside Keras can recognize 1,000 different object categories (objects similar to those we encounter in our day-to-day lives) with high accuracy. Back then, the pre-trained ImageNet models were separate from the core Keras library, requiring us to clone a free-standing GitHub repo and then manually copy the code into our projects. That solution worked well enough; however, since my original blog post was published, the pre-trained networks (VGG16, VGG19, ResNet50, Inception V3, and Xception) have been fully integrated into the Keras core, so there is no need to clone a separate repo anymore. These implementations can be found inside the applications sub-module. Because of this, I've decided to create a new, updated tutorial that demonstrates how to utilize these state-of-the-art networks in your own classification projects.
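To make the workflow concrete, here is a minimal sketch of loading one of those integrated networks from the applications sub-module and classifying an image. It assumes TensorFlow/Keras is installed; passing weights="imagenet" triggers a one-time download of the pre-trained weights, and the image path passed to classify is a placeholder for your own file.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

# weights="imagenet" loads weights pre-trained on ImageNet, so the network
# can predict over the 1,000 ImageNet object categories out of the box.
model = VGG16(weights="imagenet")

def classify(img_path, top=3):
    """Return the top ImageNet (label, probability) pairs for one image."""
    # VGG16 expects 224x224 RGB input, preprocessed the same way it was trained.
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    preds = model.predict(x)
    return [(label, float(prob)) for _, label, prob in decode_predictions(preds, top=top)[0]]
```

A call like classify("my_dog.jpg") (a hypothetical path) then returns the network's top guesses with their probabilities.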
In this article, I will present several techniques you can use to take your first steps toward developing an algorithm for a classic image classification problem: detecting a dog's breed from an image. By the end of this article, we'll have developed code that accepts any user-supplied image as input and returns an estimate of the dog's breed. If a human is detected instead, the algorithm will return the dog breed that the person most resembles. This project was completed as part of Udacity's Machine Learning Nanodegree (GitHub repo). Convolutional neural networks (also referred to as CNNs or ConvNets) are a class of deep neural networks that have seen widespread adoption in a number of computer vision and visual imagery applications.
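The top-level control flow just described can be sketched as below. The three helpers (face_detector, dog_detector, predict_breed) are hypothetical stand-ins for, say, an OpenCV face detector, an ImageNet-based dog detector, and a CNN trained on dog-breed labels; they are not part of the original text.

```python
def breed_algorithm(img_path, face_detector, dog_detector, predict_breed):
    """Accept a user-supplied image path and return a breed estimate.

    face_detector(path) -> bool, dog_detector(path) -> bool, and
    predict_breed(path) -> str are assumed, pluggable components.
    """
    if dog_detector(img_path):
        # A dog was found: report its estimated breed directly.
        return f"This looks like a {predict_breed(img_path)}."
    if face_detector(img_path):
        # A human was found: report the breed the person most resembles.
        return f"This human most resembles a {predict_breed(img_path)}."
    return "Neither a dog nor a human was detected."
```

Because the detectors are passed in as arguments, each component can be developed and swapped independently of the overall pipeline.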
In this tutorial you will:
- Understand convolutions (and why they are so much easier to grasp than they seem).
- Study Convolutional Neural Networks (what they are used for, why we use them, etc.).
- Review the building blocks of Convolutional Neural Networks.
- Discover common network architecture patterns you can use to design architectures of your own with minimal frustration and headaches.
- Utilize out-of-the-box, pre-trained CNNs for classification, ready to be applied to your own images and image datasets (VGG16, VGG19, ResNet50, etc.).
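One convenience worth noting about those out-of-the-box networks: the architectures in keras.applications share a common interface, so swapping one for another is a one-line change. The sketch below builds several of them with weights=None (architecture only, no weight download) just to compare sizes; in practice you would pass weights="imagenet". Note the input sizes differ: VGG16, VGG19, and ResNet50 default to 224x224, while Inception V3 and Xception expect 299x299.

```python
from tensorflow.keras.applications import VGG16, VGG19, ResNet50

# weights=None builds each architecture without downloading pre-trained
# weights, which keeps this comparison quick and offline-friendly.
models = {
    "vgg16": VGG16(weights=None),
    "vgg19": VGG19(weights=None),
    "resnet50": ResNet50(weights=None),
}

for name, m in models.items():
    # Every classifier in keras.applications ends in a 1,000-way ImageNet head.
    print(f"{name}: {m.count_params():,} parameters, output {m.output_shape}")
```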
Using software to parse the world's visual content is as big a revolution in computing as mobile was ten years ago, and it will provide a major edge for developers and businesses building amazing products. While these kinds of algorithms have been around in various forms since the 1960s, recent advances in machine learning, as well as leaps forward in data storage, computing capability, and cheap, high-quality input devices, have driven major improvements in how well our software can explore this kind of content. Computer Vision is the broad parent field for any computation involving visual content: images, videos, icons, and anything else with pixels involved. A classic application of computer vision is handwriting recognition for digitizing handwritten content (we'll explore more use cases below). Any other application that involves understanding pixels through software can safely be labeled computer vision.