Object detection is a very popular task in Computer Vision, where, given an image, you predict (usually rectangular) boxes around the objects present in the image and also recognize the types of those objects. There may be multiple objects in an image, and there are various state-of-the-art techniques and architectures to tackle this problem, such as Faster R-CNN and YOLOv3. This article covers the case when there is only one object of interest present in an image. The focus here is on how to read an image and its bounding box, resize, and perform augmentations correctly, rather than on the model itself. The goal is to build a good grasp of the fundamental ideas behind object detection, which you can then extend to better understand the more complex techniques.
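One of the first places this matters is resizing: when the image is resized, the bounding box must be scaled by the same factors, or the box will no longer line up with the object. The sketch below illustrates the idea with a hypothetical helper (`resize_with_box` is not from any particular library) using a simple nearest-neighbour resize so it stays dependency-free.

```python
import numpy as np

def resize_with_box(image, box, new_size):
    """Resize `image` to `new_size` = (width, height) and scale the
    bounding box `box` = (x_min, y_min, x_max, y_max) to match.
    Nearest-neighbour resize keeps this example dependency-free."""
    old_h, old_w = image.shape[:2]
    new_w, new_h = new_size
    # Nearest-neighbour index maps for rows and columns.
    rows = (np.arange(new_h) * old_h / new_h).astype(int)
    cols = (np.arange(new_w) * old_w / new_w).astype(int)
    resized = image[rows][:, cols]
    # Scale the box coordinates by the same factors as the image.
    sx, sy = new_w / old_w, new_h / old_h
    x0, y0, x1, y1 = box
    scaled_box = (x0 * sx, y0 * sy, x1 * sx, y1 * sy)
    return resized, scaled_box

image = np.zeros((200, 300, 3), dtype=np.uint8)  # (height, width, channels)
resized, box = resize_with_box(image, (30, 40, 150, 180), (150, 100))
print(resized.shape, box)  # (100, 150, 3) (15.0, 20.0, 75.0, 90.0)
```

The same principle applies to augmentations such as flips and crops: every geometric transform applied to the pixels must also be applied to the box coordinates.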
In machine learning (ML), the situation in which a model fails to generalize from the training data to unseen data is called overfitting. As you might know, it is one of the trickiest obstacles in applied machine learning. The first step in tackling this problem is recognizing that your model is overfitting in the first place, which is where proper cross-validation comes in. After identifying the problem, you can prevent it by applying regularization or by training with more data. Still, sometimes you simply have no additional data to add to your initial dataset. Acquiring and labeling additional data points may also be the wrong path: in many cases it will deliver better results, but it is often time-consuming and expensive.
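The telltale sign of overfitting is a model that scores well on the data it was trained on but poorly on held-out data. A minimal NumPy sketch (the functions, degrees, and data here are illustrative, not from any particular tutorial): a degree-9 polynomial has enough capacity to memorise 10 training points, while a degree-3 polynomial is forced to generalise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying function.
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Degree 9: can pass through all 10 training points (memorisation).
overfit = np.polyfit(x_train, y_train, 9)
# Degree 3: too little capacity to memorise, so it must generalise.
regular = np.polyfit(x_train, y_train, 3)

print("overfit :", mse(overfit, x_train, y_train), mse(overfit, x_test, y_test))
print("regular :", mse(regular, x_train, y_train), mse(regular, x_test, y_test))
```

The overfit model's training error is near zero while its test error is much larger; the gap between the two is exactly what a held-out validation set (or cross-validation) makes visible.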
The internal backend of the library was changed so that augmentation now happens batchwise instead of input-type-wise. Child augmenters still have the option of using input-type-wise augmentation. This change allows information to be re-used between different input types within the same batch, which in turn improves performance and extends the space of possible augmentations. Note: it is now recommended to use the batch-wise augmentation calls. These calls provide all inputs of a batch at the same time, and several of the new augmenters explicitly require that. This patch also reworked the quantization routines to support quantization to N bits instead of N colors, in a way that is similar to posterization in PIL.
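To see why providing all input types of a batch at once matters, here is a minimal NumPy sketch (not the library's actual internals; `hflip_batch` is a hypothetical helper) where one sampled transform, a horizontal flip, is applied jointly to the images and their bounding boxes so the two stay consistent:

```python
import numpy as np

def hflip_batch(images, boxes):
    """Horizontally flip a batch of images together with their
    bounding boxes, so both input types see the same transform.
    images: (N, H, W, C) array; boxes: (N, 4) array of
    (x_min, y_min, x_max, y_max) in pixel coordinates."""
    _, _, width, _ = images.shape
    flipped_images = images[:, :, ::-1, :]
    x_min, y_min, x_max, y_max = boxes.T
    # Mirroring swaps the roles of the left and right edges.
    flipped_boxes = np.stack(
        [width - x_max, y_min, width - x_min, y_max], axis=1)
    return flipped_images, flipped_boxes

images = np.zeros((2, 100, 200, 3), dtype=np.uint8)
boxes = np.array([[10, 20, 50, 60], [0, 0, 200, 100]])
imgs2, boxes2 = hflip_batch(images, boxes)
print(boxes2)  # boxes become [[150, 20, 190, 60], [0, 0, 200, 100]]
```

If images and boxes were augmented in separate, independent calls, a randomly sampled flip could be applied to one input type but not the other; a batch-wise call rules that out by construction.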
A couple of weeks ago we discussed how to perform multi-label classification using Keras and deep learning. Today we are going to discuss a more advanced technique called multi-output classification. You can even combine multi-label classification with multi-output classification so that each fully-connected head can predict multiple outputs! And how are you supposed to keep track of all these terms? If this is starting to make your head spin, no worries -- I've designed today's tutorial to guide you through multiple output classification with Keras. It's actually easier than it sounds. That said, this is a more advanced deep learning technique we're covering today, so if you have not already read my first post on Multi-label classification with Keras, make sure you do that now. From there, you'll be prepared to train your network with multiple loss functions and obtain multiple outputs from the network.
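Before diving into Keras specifics, the core idea of multi-output classification fits in a few lines of NumPy: a shared trunk feeds two independent fully-connected heads, each with its own label set, and the total loss is the sum of the per-head losses. The head names and sizes below (a 3-way "category" head and a 5-way "colour" head) are illustrative assumptions, not from the tutorial itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Shared trunk followed by two independent fully-connected heads.
x = rng.normal(size=(4, 8))            # batch of 4 feature vectors
w_trunk = rng.normal(size=(8, 16))
w_category = rng.normal(size=(16, 3))  # head 1: 3 category classes
w_colour = rng.normal(size=(16, 5))    # head 2: 5 colour classes

hidden = np.maximum(x @ w_trunk, 0)    # ReLU trunk
p_category = softmax(hidden @ w_category)
p_colour = softmax(hidden @ w_colour)

# One label set per head; the total loss is the sum of the two
# cross-entropies, mirroring training with multiple loss functions.
y_category = np.array([0, 2, 1, 0])
y_colour = np.array([4, 1, 0, 3])
loss = (-np.log(p_category[np.arange(4), y_category]).mean()
        - np.log(p_colour[np.arange(4), y_colour]).mean())
print(p_category.shape, p_colour.shape, loss)
```

In Keras terms, each head would be a separate output of a functional-API model, and the summed loss corresponds to compiling the model with one loss function per output.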
In this tutorial, you will learn how to build an R-CNN object detector using Keras, TensorFlow, and Deep Learning. Today's tutorial is the final part in our 4-part series on deep learning and object detection: What if we wanted to train an object detection network on our own custom dataset? How can we train that network using Selective Search? And how will using Selective Search change our object detection inference script? In fact, these are the same questions that Girshick et al. had to consider in their seminal deep learning object detection paper Rich feature hierarchies for accurate object detection and semantic segmentation. Each of these questions will be answered in today's tutorial -- and by the time you're done reading it, you'll have a fully functioning R-CNN, similar (yet simplified) to the one Girshick et al. implemented!
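One piece of the R-CNN inference script worth previewing is non-maximum suppression (NMS): because Selective Search emits many overlapping proposals, several of them will fire on the same object, and greedy NMS keeps only the highest-scoring box per object. A minimal sketch (the boxes and scores below are made-up stand-ins for classified proposals, not output from an actual Selective Search run):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop remaining boxes that overlap it above iou_thresh, repeat."""
    order = np.argsort(scores)[::-1].tolist()
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Three overlapping proposals around one object, plus one far away.
boxes = np.array([[10, 10, 60, 60], [12, 12, 62, 62],
                  [11, 9, 61, 59], [200, 200, 250, 250]])
scores = np.array([0.9, 0.75, 0.8, 0.6])
print(nms(boxes, scores))  # -> [0, 3]
```

In the full pipeline, each proposal region is cropped, resized, and classified by the network first; NMS then runs on the surviving high-confidence boxes.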