"Image understanding (IU) is the research area concerned with the design and experimentation of computer systems that integrate explicit models of a visual problem domain with one or more methods for extracting features from images and one or more methods for matching features with models using a control structure. Given a goal, or a reason for looking at a particular scene, these systems produce descriptions of both the images and the world scenes that the images represent."
– Image Understanding, by J.K. Tsotsos. In Encyclopedia of Artificial Intelligence. Stuart C. Shapiro, editor. 1987. New York: John Wiley & Sons.
Since their introduction more than a decade ago, smartphones have been equipped with cameras, allowing users to capture images and video without carrying a separate device. Thanks to computational photography technologies, which use algorithms to adjust photographic parameters and optimize them for specific situations, users with little or no photographic training can often achieve excellent results. The boundaries of what constitutes computational photography are not clearly defined, though there is some agreement that the term refers to capturing image data with hardware such as lenses and image sensors, then applying software algorithms that automatically adjust the image parameters to yield an image. Examples of computational photography can be found in most recent smartphones and some standalone cameras, including high dynamic range imaging (HDR), auto-focus (AF), image stabilization, shot bracketing, and the ability to apply various filters, among many other features. These features allow amateur photographers to produce pictures that can, at times, rival photographs taken by professionals using significantly more expensive equipment.
Reducing the latency variance in machine learning inference is a key requirement in many applications. Variance is harder to control in a cloud deployment in the presence of stragglers. In spite of this challenge, inference is increasingly being done in the cloud, due to the advent of affordable machine learning as a service (MLaaS) platforms. Existing approaches to reduce variance rely on replication, which is expensive and partially negates the affordability of MLaaS. In this work, we argue that MLaaS platforms also provide unique opportunities to cut the cost of redundancy. In MLaaS platforms, multiple inference requests are concurrently received by a load balancer, which can then create a more cost-efficient redundancy coding across a larger collection of images. We propose a novel convolutional neural network model, Collage-CNN, to provide a low-cost redundancy framework. A Collage-CNN model takes a collage formed by combining multiple images and performs multi-image classification in one shot, albeit at slightly lower accuracy. We then augment a collection of traditional single-image classifiers with a single Collage-CNN classifier, which acts as a low-cost redundant backup: it provides backup classification results if a single-image classification straggles. Deploying the Collage-CNN models in the cloud, we demonstrate that the 99th-percentile tail latency of inference can be reduced by 1.47x compared to replication-based approaches while providing high accuracy. Variation in inference latency can also be reduced by 9x with a slight increase in average inference latency.
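The collage step described above can be sketched with plain NumPy; the 2x2 grid and image shapes here are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def make_collage(images, grid=(2, 2)):
    """Tile equally-sized HxWxC images into a grid collage.

    With a 2x2 grid, four single-image requests become one
    Collage-CNN input for backup classification.
    """
    rows, cols = grid
    assert len(images) == rows * cols, "need exactly rows*cols images"
    # Concatenate each row of images horizontally, then stack rows vertically.
    row_strips = [
        np.concatenate(images[r * cols:(r + 1) * cols], axis=1)
        for r in range(rows)
    ]
    return np.concatenate(row_strips, axis=0)

# Four dummy 32x32 RGB "images", each filled with its own index value.
imgs = [np.full((32, 32, 3), i, dtype=np.uint8) for i in range(4)]
collage = make_collage(imgs)
print(collage.shape)  # (64, 64, 3)
```

The collage would then be resized to the backup model's input resolution before inference; each quadrant's prediction stands in for the corresponding straggling single-image request.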
There is a large amount of open-source data available on the Internet for machine learning, but for your own project you may need to build your own data set. Today, let's discuss how we can prepare our own data set for image classification. The first and foremost task is to collect the data (images). One can use a camera to capture images or download them from Google Images (copyrighted images need permission). There are many browser plugins for downloading images in bulk from Google Images.
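Once images are collected into one folder per class, a common next step is splitting them into training and validation sets. A stdlib-only sketch (the directory layout and the 80/20 split are assumptions for illustration):

```python
import random
import shutil
from pathlib import Path

def split_dataset(src_dir, dst_dir, val_fraction=0.2, seed=0):
    """Copy images from src_dir/<class>/ into dst_dir/{train,val}/<class>/.

    Expects one subdirectory per class under src_dir; shuffles each
    class independently so both splits contain every class.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    src, dst = Path(src_dir), Path(dst_dir)
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(class_dir.glob("*"))
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        for split, subset in (("val", files[:n_val]), ("train", files[n_val:])):
            out = dst / split / class_dir.name
            out.mkdir(parents=True, exist_ok=True)
            for f in subset:
                shutil.copy(f, out / f.name)
```

The resulting `train/<class>/` and `val/<class>/` layout is the folder convention most image-classification loaders (e.g. Keras and torchvision) can read directly.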
The concepts of neural architecture search and transfer learning are used under the hood to find the best network architecture and the optimal hyperparameter configuration that minimizes the loss function of the model. This article uses Google Cloud AutoML Vision to develop an end-to-end medical image classification model for pneumonia detection using chest X-ray images. The dataset is hosted on Kaggle and can be accessed at Chest X-Ray Images (Pneumonia). Go to the cloud console: https://cloud.google.com/ Then set up the project APIs, permissions, and a Cloud Storage bucket to store the image files for modeling and other assets.
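AutoML Vision imports training data from a CSV of `gs://` image URIs and labels. A minimal sketch of generating such rows for the two chest X-ray classes might look like this; the bucket name, folder layout, and exact CSV columns are assumptions to check against the current AutoML Vision documentation:

```python
from pathlib import Path

def automl_index_rows(local_root, bucket, classes=("NORMAL", "PNEUMONIA")):
    """Build AutoML Vision import-CSV rows ("gs://uri,label").

    Assumes the local dataset has one folder per class (as the Kaggle
    chest X-ray dataset does) and that the same layout is uploaded
    under gs://<bucket>/.
    """
    rows = []
    root = Path(local_root)
    for label in classes:
        for img in sorted((root / label).glob("*.jpeg")):
            rows.append(f"gs://{bucket}/{label}/{img.name},{label}")
    return rows
```

Writing these rows to a `.csv` file in the bucket and pointing the AutoML Vision dataset import at it would let the service associate each uploaded image with its label.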
Human activity understanding is crucial for building automatic intelligent systems. With the help of deep learning, activity understanding has made huge progress recently. But challenges such as imbalanced data distribution, action ambiguity, and complex visual patterns still remain. To address these and promote activity understanding, we build a large-scale Human Activity Knowledge Engine (HAKE) based on human body part states. Upon existing activity datasets, we annotate the part states of all the active persons in all images, thus establishing the relationship between instance activity and body part states. Furthermore, we propose a HAKE-based part state recognition model with a knowledge extractor named Activity2Vec and a corresponding part-state-based reasoning network. With HAKE, our method can alleviate the learning difficulty brought by the long-tail data distribution and bring in interpretability. Our HAKE now has more than 7M part state annotations and is still under construction. We first validate our approach on a part of HAKE in this preliminary paper, where we show a 7.2 mAP performance improvement on Human-Object Interaction recognition and a 12.38 mAP improvement on the one-shot subsets.
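The part-state idea can be illustrated with a toy sketch; the part names and state labels below are hypothetical placeholders, not HAKE's actual annotation schema:

```python
# Hypothetical instance-level annotation: an activity grounded in
# per-body-part states (e.g. "read a book" decomposed into parts).
person = {
    "head": "look_at",
    "right_hand": "hold",
    "left_hand": "no_action",
    "hip": "sit_on",
}

def active_parts(part_states):
    """Return the body parts whose state carries information
    (i.e. is not the null state 'no_action')."""
    return sorted(p for p, s in part_states.items() if s != "no_action")

print(active_parts(person))  # ['head', 'hip', 'right_hand']
```

Reasoning over such part-level states, rather than over whole-image activity labels alone, is what lets rare activities share evidence with common ones and gives the model's decisions an interpretable decomposition.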
Kaia Health caught our attention last year with an app that tracks your motion using your phone's camera in a bid to help you achieve perfect squat form, though we found it didn't quite hit the mark. Still, Kaia is elevating the concept with an updated version called Kaia Personal Trainer. It says the app will track your exercises and reps, create workout plans tailored to you and offer audio feedback in real time. It doesn't need any equipment other than an iPhone or iPad running iOS 12 (an Android version will arrive in the next few months), though you might still opt to use a fitness tracker. Once you get into position around seven feet away from your device, the app's AI uses a 16-point system to compare the way you move to optimal movement, looking at factors including the positions and angles of your limbs and joints.
Polarr, a six-year-old San Jose computer vision startup cofounded by Stanford graduate and Google veterans Borui Wang and Derek Yan, today announced that it has secured $11.5 million in series A funding led by Threshold Ventures, with participation from Cota Capital and Pear Ventures. Wang said the fresh capital -- which brings its total raised to $13.5 million, according to Crunchbase -- will be used to accelerate research and development; expand platform and service support; and grow its technology partnerships in drone, home appliance, ecommerce, and image storage verticals. "As deep learning compute shifts from the cloud to edge devices, there is a growing opportunity to provide sophisticated and creative edge AI technologies to mobile devices," said Wang, who serves as CEO. "This new round of financing is a tangible endorsement of our approach to enable and inspire everyone to make beautiful creations." Threshold Ventures' Chris Kelley and Pear Ventures' Mar Hershenson will join Polarr's board of directors as part of the round.
TLDR; This series is based on the work of detecting complex policies in the following real-life code story. Code for the series can be found here. In the previous tutorials we outlined our policy classification challenge and showed how we can approach it using the Custom Vision Cognitive Service. This tutorial introduces deep transfer learning as a means to leverage multiple data sources to overcome the data scarcity problem. Before we try to build a classifier for our complex policy, let's first look at the MNIST dataset to better understand key image classification concepts such as One-Hot Encoding, Linear Modeling, the Multi-Layer Perceptron, Masking, and Convolutions; then we will put these concepts together and apply them to our own dataset.
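As a warm-up for the MNIST concepts listed above, here is a minimal one-hot encoding sketch in NumPy; the 10-class setting matches the MNIST digits 0-9:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """One-hot encode integer class labels.

    Each label k becomes a length-num_classes vector with a 1.0 at
    index k and 0.0 elsewhere -- the target format most classification
    losses expect.
    """
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

print(one_hot([3, 0, 9]))
```

Going back from one-hot vectors to class indices is just `np.argmax(encoded, axis=1)`, which is also how a model's softmax output is turned into a predicted label.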
In this Feb. 1, 2019 photo, surveillance cameras are seen near the spot where "Empire" actor Jussie Smollett allegedly staged the attack in Chicago. CHICAGO (AP) -- Police tapped into Chicago's vast network of surveillance cameras -- and even some homeowners' doorbell cameras -- to track down two brothers who later claimed they were paid by "Empire" actor Jussie Smollett to stage an attack on him, the latest example of the city's high-tech approach to public safety. Officers said they reviewed video from more than four dozen cameras to trace the brothers' movements before and after the reported attack, determining where they lived and who they were before arresting them a little more than two weeks later. Smollett reported being beaten up by two men who shouted racist and anti-gay slurs and threw bleach on him.