Computer vision is fundamental for a broad set of Internet of Things (IoT) applications. Household monitoring systems use cameras to give family members a view of what's going on at home. Robots and drones use vision processing to map their environment and avoid obstacles in flight. Augmented reality glasses use computer vision to overlay important information on the user's view, and cars stitch together images from multiple cameras mounted around the vehicle to give drivers a surround or "bird's eye" view that helps prevent collisions.
Computers are getting better each year at AI-style tasks, especially those involving vision--identifying a face, say, or telling whether a picture contains a certain object. In fact, their progress has been so significant that some researchers now believe the standardized tests used to evaluate these programs have become too easy to pass, and therefore need to be made more demanding. At issue are the public data sets commonly used by vision researchers to benchmark their progress, such as LabelMe at MIT or Labeled Faces in the Wild at the University of Massachusetts, Amherst. The former, for example, contains photographs that have been labeled via crowdsourcing, so that a photo of a street scene might have a "car," a "tree," and a "pedestrian" highlighted and tagged. Success rates have been climbing for computer vision programs that can find these objects, with most of the credit for that improvement going to machine learning techniques such as convolutional neural networks, often grouped under the label deep learning.
Today's computer vision technology is powered by deep learning algorithms that use a special kind of neural network, called a convolutional neural network (CNN), to make sense of images. These networks are trained on thousands of sample images, which helps the algorithm understand and break down everything contained in an image. A CNN scans an image pixel by pixel to identify patterns and "memorize" them. It also memorizes the ideal output it should produce for each input image (in the case of supervised learning), or classifies components of images by scanning characteristics such as contours and colors. The system then uses this memory as a reference when scanning new images.
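The scanning-for-patterns idea above can be illustrated with the core operation of a CNN layer: sliding a small filter across an image and computing a weighted sum at each position. This is a minimal NumPy sketch, not a trained network -- the hand-written vertical-edge filter stands in for the filters a CNN would learn from data, and the toy 4x6 "image" is invented for illustration.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a filter (kernel) across the image, computing a weighted
    sum of each patch of pixels -- the basic operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the kh x kw patch under the filter.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-written vertical-edge filter: it responds strongly wherever
# pixel intensity changes from left to right.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

# Toy grayscale "image": dark on the left, bright on the right.
image = np.array([[0, 0, 0, 9, 9, 9],
                  [0, 0, 0, 9, 9, 9],
                  [0, 0, 0, 9, 9, 9],
                  [0, 0, 0, 9, 9, 9]])

feature_map = convolve2d(image, edge_kernel)
print(feature_map)
# Strong (negative) responses appear only at the dark-to-bright boundary.
```

In a real CNN, many such filters are applied in parallel and their weights are learned during training rather than written by hand; stacking layers lets later filters detect patterns of patterns (edges, then textures, then object parts).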
A new report on global AI patents and publications has offered an interesting snapshot of the current boom--including the uneven way it is being commercialized. Run the numbers: The report (pdf) from the World Intellectual Property Organization shows that since the field of AI was established in the 1950s, 340,000 AI-related inventions have been patented and over 1.6 million scientific papers published. Unsurprisingly, the data shows that interest in AI has exploded in the past five years and that China and the US are dominant in the technology. IBM was the company that owned the most patents. Tunnel vision: The figures also show the disproportionate attention going to one application of AI.
Of all the obstacles slowing the advancement of artificial intelligence, computer vision may be the most compelling. The challenge is multifaceted: programming a machine with enough inductive reasoning to extrapolate information from observations and reach plausible, accurate conclusions. This, of course, is the end goal of artificial intelligence research: endowing a computer with the power and ability to think, at least within reason. When it comes to translating flexible human thought processes into more structured machines, a handful of problems slow the computer's mastery. As we move through the world and our daily routines, we see countless images that our brains parse and sort into different classifications.