"Image understanding (IU) is the research area concerned with the design and experimentation of computer systems that integrate explicit models of a visual problem domain with one or more methods for extracting features from images and one or more methods for matching features with models using a control structure. Given a goal, or a reason for looking at a particular scene, these systems produce descriptions of both the images and the world scenes that the images represent."
– Image Understanding, by J.K. Tsotsos. In Encyclopedia of Artificial Intelligence. Stuart C. Shapiro, editor. 1987. New York: John Wiley & Sons.
Understanding clothes and broad fashion products from such an image would have huge commercial and cultural impacts on modern societies. Deploying such a technology would empower not only fashion buyers to find what they want, but also small and large sellers to make quicker sales with less hassle. This technology requires excellence in several computer vision tasks: what the product in the image is (image classification), where it is (object detection, semantic image segmentation, instance segmentation), visual similarity, how to describe the product and its image (image captioning), etc. Recent work on convolutional neural networks (CNNs) has significantly improved the state-of-the-art performance on those tasks. In image classification, the ResNeXt-101 method has achieved 85.4% top-1 accuracy on ImageNet-1K; in object detection, the best method has achieved 52.5% mAP on the COCO 2017 benchmark for generic object detection; in semantic image segmentation, the top-performing method has reached 89% mIoU on the PASCAL VOC leaderboard for generic object segmentation.
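Two of the metrics quoted above, top-1 accuracy and mean intersection-over-union (mIoU), can be illustrated with a minimal sketch. The function names and the flat-list data layout are assumptions made for this example, and each benchmark defines its own exact evaluation protocol (for instance, how scores are accumulated over a whole dataset), so this is not the official ImageNet or PASCAL VOC procedure.

```python
from typing import List, Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        if predicted == label:
            correct += 1
    return correct / len(labels)


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes, for per-pixel class ids flattened into lists.

    Classes absent from both prediction and target are skipped
    (an assumption of this sketch).
    """
    ious: List[float] = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `top1_accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 0])` scores both samples as correct, and `mean_iou` averages the per-class overlap between a predicted and a ground-truth label map.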
Want to build an ML model but don't have enough training data? In this post I'll show you how I built an ML pipeline that gathers labeled, crowdsourced training data, uploads it to an AutoML dataset, and then trains a model. I'll be showing an image classification model using AutoML Vision in this example, but the same pipeline could easily be adapted to AutoML Natural Language. Want to jump to the code? The full example is available on GitHub.
At GraphAware, one of our graph-based solutions is the Knowledge Platform, an Intelligent Insight Engine built atop Neo4j. To give our customers the ability to unlock hidden insights from new forms of data, we decided to start an R&D phase for video analysis. For this blog post we will analyse the Neo4j YouTube channel video transcripts, extract some insights and show what type of business value such analysis can bring. YouTube offers the ability to download the transcription of the videos, when available. Fetching this data can be done in multiple ways, such as connecting to the Google APIs with your preferred client.
Last week I published a blog post about how easy it is to train image classification models with Keras. What I did not show in that post was how to use the model for making predictions. This, I will do here. But predictions alone are boring, so I'm adding explanations for the predictions using the lime package. I have already written a few blog posts (here, here and here) about LIME and have given talks (here and here) about it, too.
Google has introduced some new experimental features for developers working with the Lenovo Mirage Solo, the standalone Daydream headset released earlier this year. First up is see-through mode, a setting that lets the user see the real space around them through the VR headset. Google says this mode plus the Mirage Solo's tracking technology will allow developers to build AR prototypes. It demonstrated an application of this feature through an experimental app that lets Mirage Solo wearers position virtual furniture in their real-world surroundings. Secondly, the company says it's adding APIs that support position controller tracking with six degrees of freedom, which will enable more real-world, natural movement in VR.
Every company today is a tech company, a maxim that was proven out today when one of the world's oldest and biggest art auction houses acquired an AI startup. Sotheby's has bought Thread Genius, which has built a set of algorithms that can both instantly identify objects and then recommend images of similar objects to the viewer. Sotheby's said it is not disclosing the value of the deal but said it was non-material to the company. Thread Genius was a relatively young company, founded in 2015 and making its debut last year as part of TechStars New York's Winter 2017 cohort. Co-founders Andrew Shum and Ahmad Qamar, who were also Thread Genius's only two employees, were both engineering alums from Spotify.
There are a number of popular evaluation metrics for classification other than accuracy, such as recall, precision, AUC, F-scores, etc. Instead of listing them all here, I think it is best to point you towards some interesting resources that can kick-start your search for answers. Although you might not be using scikit, the metrics remain relevant. It also lists the differences between the binary and multi-class classification settings.
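To make a few of these metrics concrete, here is a minimal pure-Python sketch of precision, recall and F1 for one class, plus a macro-averaged F1 as one common way of extending the binary definition to the multi-class setting. The function names are made up for this illustration; libraries such as scikit-learn provide their own, more complete implementations.

```python
from typing import Dict, Hashable, Sequence


def precision_recall_f1(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable = 1,
) -> Dict[str, float]:
    """Precision, recall and F1 with `positive` treated as the class of interest."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Report 0.0 when a ratio is undefined (a convention chosen for this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 over every class present in y_true."""
    classes = set(y_true)
    per_class = [precision_recall_f1(y_true, y_pred, c)["f1"] for c in classes]
    return sum(per_class) / len(per_class)
```

In the binary case a single call with `positive=1` is usually enough; in the multi-class case each class takes a turn as the positive class and the results are averaged.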
Multi-label image classification is a fundamental but challenging task towards general visual understanding. Existing methods have found that region-level cues (e.g., features from RoIs) can facilitate multi-label classification. Nevertheless, such methods usually require laborious object-level annotations (i.e., object labels and bounding boxes) for effective learning of the object-level visual features. In this paper, we propose a novel and efficient deep framework to boost multi-label classification by distilling knowledge from a weakly-supervised detection task without bounding box annotations. Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs. The WSD model is the teacher model and the classification model is the student model. After this cross-task knowledge distillation, the performance of the classification model is significantly improved and the efficiency is maintained since the WSD model can be safely discarded in the test phase. Extensive experiments on two large-scale datasets (MS-COCO and NUS-WIDE) show that our framework is superior to the state-of-the-art methods in both performance and efficiency.
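The teacher-student idea can be sketched with a generic soft-target loss for multi-label outputs. This is not the paper's actual objective: it covers only class-level predictions (not the object-level RoI features), and the function names, the binary cross-entropy form and the `alpha` weighting are all assumptions made for illustration.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce(probs: Sequence[float], targets: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; targets may be hard (0/1) or soft (in [0, 1])."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    labels: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight between hard labels and teacher targets
) -> float:
    """Blend the loss on image-level labels with the loss on the teacher's soft predictions."""
    student_probs = [sigmoid(z) for z in student_logits]
    teacher_probs = [sigmoid(z) for z in teacher_logits]
    hard = bce(student_probs, labels)
    soft = bce(student_probs, teacher_probs)
    return (1.0 - alpha) * hard + alpha * soft
```

Because the teacher appears only inside the loss, it is needed during training but not at prediction time, which matches the abstract's point that the WSD model can be discarded in the test phase.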
A conceptually simple way to recognize images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data covers the required configuration space. Here we show that this coverage can be substantially increased using simple strategies of coarse graining (replacing groups of images by their centroids) and sampling (using distinct sets of centroids in combination). We use the MNIST data set to show that coarse graining can be used to convert a subset of training images into about an order of magnitude fewer image centroids, with no loss of accuracy of classification of test-set images by direct (nearest-neighbor) classification. Distinct batches of centroids can be used in combination as a means of sampling configuration space, and can classify test-set data more accurately than can the unaltered training set. The approach works most naturally with multiple processors in parallel.
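A toy sketch of the two strategies, coarse graining and combining batches of centroids, is given below. The grouping rule (fixed-size runs of same-label samples) and the combination rule (majority vote across batches) are assumptions chosen for simplicity; the abstract does not specify how groups are formed or how batches are combined.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Labeled = Tuple[Vector, int]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty group of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def coarse_grain(samples: Sequence[Labeled], group_size: int) -> List[Tuple[List[float], int]]:
    """Replace each run of up to `group_size` same-label samples by its centroid."""
    by_label: Dict[int, List[Vector]] = defaultdict(list)
    for vector, label in samples:
        by_label[label].append(vector)
    centroids = []
    for label, vectors in by_label.items():
        for start in range(0, len(vectors), group_size):
            centroids.append((centroid(vectors[start:start + group_size]), label))
    return centroids


def nearest_label(query: Vector, references: Sequence[Labeled]) -> int:
    """Direct nearest-neighbor classification using Euclidean distance."""
    return min(references, key=lambda ref: math.dist(query, ref[0]))[1]


def vote(query: Vector, batches: Sequence[Sequence[Labeled]]) -> int:
    """Classify with each batch of centroids separately and take the majority label."""
    ballots = Counter(nearest_label(query, batch) for batch in batches)
    return ballots.most_common(1)[0][0]
```

Each batch can be classified against independently, so the per-batch calls inside `vote` are the natural place to use multiple processors in parallel.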
You don't always need to build fancy algorithms to tamper with image recognition systems; adding objects in random places will do the trick. In most cases, adversarial models are used to change a few pixels here and there to distort images so objects are incorrectly recognized. A few examples have included stickers that turn images of bananas into toasters, or wearing silly glasses to fool facial recognition systems into believing you're someone else. Let's not forget the classic case of when a turtle was mistaken for a rifle to really drill home how easy it is to outwit AI. Now, however, researchers from York University and the University of Toronto, Canada, have shown that it's possible to mislead neural networks by copying and pasting pictures of objects into images, too.