"Image understanding (IU) is the research area concerned with the design and experimentation of computer systems that integrate explicit models of a visual problem domain with one or more methods for extracting features from images and one or more methods for matching features with models using a control structure. Given a goal, or a reason for looking at a particular scene, these systems produce descriptions of both the images and the world scenes that the images represent."
– Image Understanding, by J.K. Tsotsos. In Encyclopedia of Artificial Intelligence. Stuart C. Shapiro, editor. 1987. New York: John Wiley & Sons.
The Soli sensor, developed by Google's Advanced Technology and Projects (ATAP) group, is a low-power radar chip designed to detect hand gestures at sub-millimeter precision. It operates in the 60-GHz ISM band using electromagnetic waves. The sensor recognizes the motions that make up Soli's Virtual Tools gestures: the virtual button, virtual dial and virtual slider. The chip's advantages are that it requires little energy, has no moving parts, works regardless of lighting conditions, and, with further development, could be used in a range of products: wearables, IoT devices, phones and cars. Ivan Poupyrev, Project Soli's founder, said of the project's goal, "The hand is the ultimate input device."
From the outset, Microsoft Pix has used artificial intelligence to help people more easily capture better pictures of family and friends with an iPhone. Now, you can use the smartphone camera app to take better pictures of documents, whiteboards and business cards, too. Because, let's face it, we all use our cameras to take photos of more than just people and places. "We have data that shows people are taking a lot of whiteboard photos at work, they are doing a lot of document scanning," said Josh Weisberg, a principal program manager in the Computational Photography Group within Microsoft's research organization in Redmond, Washington. Based on the app data and customer feedback, Weisberg's team released an update on Thursday to Microsoft Pix that includes enhanced deep-learning capabilities around image understanding to address several productivity scenarios.
The updates are included in the company's suite of Cognitive Services -- APIs that help developers deliver intelligent capabilities even if they don't have a great deal of AI expertise. The three updated services -- Microsoft's Custom Vision Service, Face API, and Bing Entity Search -- are designed to make AI easier for companies that can't keep a professional data scientist on staff. That's important, given the limited number of AI experts currently available, how much they cost to hire, and how complicated the task of rolling your own AI capabilities can be. The Custom Vision Service is now in paid public beta. It allows companies to train their own image classification tools for specific tasks (like identifying different types of birds or distinguishing between varieties of cucumber) without building their own AI models.
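In practice, calling a hosted classification service like this usually comes down to an HTTP request carrying an image and a subscription key. The sketch below only assembles such a request; the URL route, header names, and IDs are illustrative placeholders of my own, not the service's actual contract, and should be checked against the Custom Vision Service documentation.

```python
def build_classify_request(endpoint, project_id, iteration, key):
    """Assemble a (url, headers) pair for an image-classification call.

    The URL layout and header names here are illustrative placeholders;
    consult the Custom Vision Service docs for the real routes.
    """
    url = (f"https://{endpoint}/customvision/v3.0/Prediction/"
           f"{project_id}/classify/iterations/{iteration}/image")
    headers = {
        "Prediction-Key": key,                        # per-resource API key
        "Content-Type": "application/octet-stream",   # raw image bytes in the body
    }
    return url, headers

# Hypothetical endpoint, project, and key -- not real credentials.
url, headers = build_classify_request(
    "westus2.api.cognitive.microsoft.com",
    "my-project-id", "Iteration1", "YOUR_KEY")
print(url)
```

The appeal for companies without data scientists is that everything model-related stays on the service side; the client only ships image bytes and reads back per-class confidence scores.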
Computer vision will play a crucial role in visual search, self-driving cars, medicine and many other applications. Success will hinge on collecting and labeling large datasets that can be used to train and test new algorithms. One area that has seen great advances over the last five years is image classification, i.e. automatically determining which objects are present in an image. Existing image classification datasets typically have an equal number of images for each class. However, the real world is long-tailed: a small number of classes are observed frequently, while most classes are infrequent or rare.
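The long-tailed claim can be made concrete with a toy count: in such a dataset a handful of "head" classes supply almost all the images, while the tail consists of many classes seen only once. A minimal sketch with synthetic labels (no real dataset assumed):

```python
from collections import Counter

# Synthetic label list: a few head classes dominate, and 50 tail classes
# each appear exactly once -- the long-tailed shape described above.
labels = (["dog"] * 500 + ["cat"] * 300 + ["car"] * 150
          + [f"rare_species_{i}" for i in range(50)])

counts = Counter(labels)
total = sum(counts.values())

# Fraction of all images covered by just the 3 most frequent classes.
head_fraction = sum(n for _, n in counts.most_common(3)) / total

print(f"{len(counts)} classes, {total} images")
print(f"top 3 classes cover {head_fraction:.0%} of the images")
```

Here 3 of 53 classes account for 95% of the images; a classifier trained on such data sees the tail classes almost never, which is exactly the evaluation gap that balanced benchmark datasets hide.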
A while back, I was reading an article posted on Facebook about Clovis people found alive and well in Florida, with a picture featuring tribesmen (see below). The quality of the picture was poor, and the URL was very suspicious: baynews9.com.ddwg.clonezone.link, crafted so as to make it appear that it came from Baynews9.com. It turned out that the picture (and thus the whole story) was fake: the people pictured are real people living in Peru; see here for a YouTube video about them.
There are many different ways to do image recognition. Google recently released a new TensorFlow Object Detection API to give computer vision developers everywhere a boost. I added a second phase to this project in which I used the TensorFlow Object Detection API on a custom dataset to build my own toy aeroplane detector. So what was the experience like? First, let's understand the API.
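Whatever framework produces the boxes, judging a custom detector like a toy-aeroplane one usually comes down to intersection-over-union (IoU) between predicted and ground-truth boxes, with a detection counted as correct above some threshold (0.5 is a common choice). A minimal sketch, where the `(xmin, ymin, xmax, ymax)` box format is my own convention rather than anything the API mandates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp to 0 so disjoint boxes get zero intersection, not negative area.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Partially overlapping boxes score between 0 and 1.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Scoring each predicted box against the best-matching ground-truth box this way is the building block behind the precision/recall curves and mAP numbers typically reported for detectors.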