Image Understanding

Canny AI: Imagine world leaders singing


Deep Learning is really starting to establish itself as a major new tool in visual effects. Currently the tools are still in their infancy, but they are changing the way visual effects can be approached. Instead of a pipeline consisting of modelling, texturing, lighting and rendering, these new approaches hallucinate, or plausibly create, imagery based on training data sets. Machine Learning, the superset of Deep Learning and similar approaches, has had great success in image classification, image recognition and image synthesis. At fxguide we covered Synthesia in the UK, a company born out of research first published as Face2Face.

Google Cloud AutoML Vision for Medical Image Classification


The concepts of neural architecture search and transfer learning are used under the hood to find the best network architecture and the hyperparameter configuration that minimizes the loss function of the model. This article uses Google Cloud AutoML Vision to develop an end-to-end medical image classification model for pneumonia detection using chest X-ray images. The dataset is hosted on Kaggle and can be accessed at Chest X-Ray Images (Pneumonia). To begin, go to the cloud console and set up the project APIs, permissions and a Cloud Storage bucket to store the image files for modeling and other assets.
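The idea of searching for a hyperparameter configuration that minimizes a loss can be illustrated with a toy random search. This is only a minimal sketch under stated assumptions: it is not how AutoML Vision works internally, the `validation_loss` function is a hypothetical stand-in for training and evaluating a real model, and the parameter names and ranges are made up for illustration.

```python
import random


def validation_loss(learning_rate, dropout):
    # Hypothetical stand-in for "train a model and measure its loss".
    # A real search would train and evaluate a network here.
    return (learning_rate - 0.01) ** 2 * 1e4 + (dropout - 0.3) ** 2


def random_search(trials, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        config = {
            # Learning rate is sampled on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "dropout": rng.uniform(0.0, 0.6),
        }
        loss = validation_loss(**config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


best_loss, best_config = random_search(trials=200)
print(best_loss, best_config)
```

More trials can only lower (or keep) the best loss found for a given seed, which is the basic trade-off any automated search makes between compute budget and model quality.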

Retail Automation Image Recognition for Retail


"'s tags have given us the exact level of granularity we need on our ecommerce platform. Our merchandisers and buyers have a lot of manual work while labelling products. They're also fixing data inconsistencies we receive from our external vendors, an expensive and time-consuming process. With an automated catalog management tool like VueTag, we efficiently tag products, provide better product discovery for our shoppers, and speed up our go-to-market strategy"

HAKE: Human Activity Knowledge Engine Artificial Intelligence

Human activity understanding is crucial for building automatic intelligent systems. With the help of deep learning, activity understanding has made huge progress recently. But some challenges, such as imbalanced data distribution, action ambiguity and complex visual patterns, still remain. To address these and promote activity understanding, we build a large-scale Human Activity Knowledge Engine (HAKE) based on human body part states. Building upon existing activity datasets, we annotate the part states of all the active persons in all images, thus establishing the relationship between instance activity and body part states. Furthermore, we propose a HAKE-based part state recognition model with a knowledge extractor named Activity2Vec and a corresponding part-state-based reasoning network. With HAKE, our method can alleviate the learning difficulty brought by the long-tail data distribution and bring in interpretability. HAKE now has more than 7M part state annotations and is still under construction. We first validate our approach on a part of HAKE in this preliminary paper, where we show a 7.2 mAP performance improvement on Human-Object Interaction recognition and a 12.38 mAP improvement on the one-shot subsets.

A geometry-inspired decision-based attack Machine Learning

Deep neural networks have recently achieved tremendous success in image classification. Recent studies have shown, however, that they are easily misled into incorrect classification decisions by adversarial examples. Adversaries can even craft attacks by querying the model in black-box settings, where no information about the model is released except its final decision. Such decision-based attacks usually require many queries, while real-world image recognition systems might restrict the number of queries. In this paper, we propose qFool, a novel decision-based attack algorithm that can generate adversarial examples using a small number of queries. The qFool method can drastically reduce the number of queries compared to previous decision-based attacks while reaching the same quality of adversarial examples. We also enhance our method by constraining adversarial perturbations to a low-frequency subspace, which can make qFool even more computationally efficient. Altogether, we manage to fool commercial image recognition systems with a small number of queries, which demonstrates the effectiveness of our new algorithm in practice.
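To make the decision-based setting concrete, here is a minimal sketch of one building block such attacks commonly rely on: a binary search toward the decision boundary that uses only the model's final label. This is not the qFool algorithm itself, whose details are not given in the abstract. The `predict_label` function is a hypothetical toy linear classifier standing in for a real black-box image model, and the input vectors are made up.

```python
def predict_label(x):
    # Hypothetical black-box model: only the final decision is exposed.
    # A toy linear rule stands in for a real image classifier.
    weights = [0.8, -0.5, 0.3]
    score = sum(w * v for w, v in zip(weights, x)) - 0.2
    return 1 if score > 0 else 0


def boundary_search(original, start_point, queries=20):
    """Binary search along the segment from `original` to `start_point`,
    spending one label query per step, to find a misclassified point
    that lies close to the original input."""
    true_label = predict_label(original)
    assert predict_label(start_point) != true_label, "start must be misclassified"
    low, high = 0.0, 1.0  # low: still classified correctly, high: misclassified
    for _ in range(queries):
        mid = (low + high) / 2
        candidate = [o + mid * (s - o) for o, s in zip(original, start_point)]
        if predict_label(candidate) != true_label:
            high = mid
        else:
            low = mid
    return [o + high * (s - o) for o, s in zip(original, start_point)]


original = [1.0, 0.0, 0.0]
start_point = [0.0, 1.0, 0.0]
adversarial = boundary_search(original, start_point, queries=20)
print(adversarial, predict_label(adversarial))
```

Each step halves the uncertainty about where the boundary lies, so the query count grows only logarithmically with the desired precision. Query-efficient attacks differ mainly in how they choose the search direction, which this sketch does not attempt.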

Kaia's motion-tracking workout app remembers which rep you're on


Kaia Health caught our attention last year with an app that tracks your motion using your phone's camera in a bid to help you achieve perfect squat form, though we found it didn't quite hit the mark. Still, Kaia is elevating the concept with an updated version called Kaia Personal Trainer. It says the app will track your exercises and reps, create workout plans tailored to you and offer audio feedback in real time. It doesn't need any equipment other than an iPhone or iPad running iOS 12 (an Android version will arrive in the next few months), though you might still opt to use a fitness tracker. Once you get into position around seven feet away from your device, the app's AI uses a 16-point system to compare the way you move to optimal movement, looking at factors including the positions and angles of your limbs and joints.
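Comparing limb and joint angles to a target can be sketched with basic geometry. The example below is an illustration only: Kaia's 16-point system is not described in detail here, so the keypoint coordinates, the choice of hip, knee and ankle, and the 90-degree target are all assumptions.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def angle_error(measured, target):
    """Absolute deviation in degrees between a measured and a target angle."""
    return abs(measured - target)


# Hypothetical 2D keypoints (x, y) for one leg in a camera frame.
hip, knee, ankle = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0)
knee_angle = joint_angle(hip, knee, ankle)
print(knee_angle, angle_error(knee_angle, target=90.0))
```

A real app would compute such angles for many joints on every frame and turn the deviations into feedback. The pose estimation step that produces the keypoints is outside the scope of this sketch.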

Polarr raises $11.5 million for offline, on-device computational photography


Polarr, a six-year-old San Jose computer vision startup cofounded by Stanford graduate and Google veterans Borui Wang and Derek Yan, today announced that it has secured $11.5 million in series A funding led by Threshold Ventures, with participation from Cota Capital and Pear Ventures. Wang said the fresh capital -- which brings its total raised to $13.5 million, according to Crunchbase -- will be used to accelerate research and development; expand platform and service support; and grow its technology partnerships in drone, home appliance, ecommerce, and image storage verticals. "As deep learning compute shifts from the cloud to edge devices, there is a growing opportunity to provide sophisticated and creative edge AI technologies to mobile devices," said Wang, who serves as CEO. "This new round of financing is a tangible endorsement of our approach to enable and inspire everyone to make beautiful creations." Threshold Ventures' Chris Kelley and Pear Ventures' Mar Hershenson will join Polarr's board of directors as part of the round.

The best image-recognition AIs are fooled by slightly rotated images

New Scientist

Telling a yellow taxi and a pair of binoculars apart is so easy most people could do it standing on their head. Not so for an artificial intelligence: flip the cab upside down and it sees binoculars. This is just one of dozens of examples that show AI is a lot worse at identifying objects by sight than many people realise.

Learn Python AI for Image Recognition & Fraud Detection


Combine the power of Python and TensorFlow to build projects. In this course, you will learn how to code in Python, calculate linear regression with TensorFlow, and use AI for automation. Together with a professional, you will perform image recognition on the CIFAR-10 image data and analyze credit card fraud by building practical projects. We explain everything in a straightforward teaching style that is easy to understand. Join Mammoth Interactive in this course, where we blend theoretical knowledge with hands-on coding projects to teach you everything you need to know as a beginner to credit card fraud detection. What you'll learn: Learn how to code in Python, a popular coding language used for websites like YouTube and Instagram.

Artificial Intelligence in Enterprise Applications with TensorFlow and Joget DX


The AI focus in Joget DX is to simplify the integration of pre-trained AI models into end user applications. As rationalized in the previous article, the training of AI models is best left to machine learning experts, so once a trained model is available, the goal is to make it as accessible as possible to app designers. With the bundled TensorFlow AI plugin, you essentially (1) upload a pre-trained TensorFlow model exported in protobuf (.pb) format, (2) configure the inputs and outputs, and (3) configure optional post-processing. The following sections showcase how a sample app on Joget DX incorporates some well-known models for several common AI use cases: image classification, audio classification and text sentiment analysis.

Sample No-Code AI Apps on Joget DX: Image Classification. Inception v3 is a widely used image recognition model that has been shown to attain greater than 78.1% accuracy on the ImageNet dataset. ImageNet is a dataset for image classification containing more than 14 million labeled images. COCO is a large-scale dataset for object detection that contains 1.5 million object instances.
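As an illustration of what a post-processing step for an image classifier typically does, the sketch below turns raw output scores into ranked label probabilities. It is a generic example, not the Joget DX plugin's actual configuration or API. The labels and scores are made up, and a real Inception v3 model would emit one score per ImageNet class.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical model output for a four-class toy problem.
labels = ["taxi", "binoculars", "cat", "dog"]
logits = [2.0, 0.5, -1.0, 0.1]
for label, prob in top_k(logits, labels, k=2):
    print(f"{label}: {prob:.3f}")
```

Keeping this step separate from the model makes it easy to swap in a different label file or change how many predictions an app displays without touching the trained network.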