AI researchers are interested in building intelligent machines that can interact with them as they interact with each other. Science fiction writers have given us these goals in the form of HAL in 2001: A Space Odyssey and Commander Data in Star Trek: The Next Generation. At present, however, our computers are deaf, dumb, and blind, almost unaware of the environment they are in and of the user who interacts with them. In this article, I present the current state of the art in machines that can see people, recognize them, determine their gaze, understand their facial expressions and hand gestures, and interpret their activities. I believe that building machines with such perceptual abilities will take us one step closer to building HAL and Commander Data.
Using software to parse the world's visual content is as big a revolution in computing as mobile was 10 years ago, and it will provide a major edge for developers and businesses building new products. While algorithms of this kind have existed in various forms since the 1960s, recent advances in machine learning, along with leaps forward in data storage, computing capability, and cheap high-quality input devices, have driven major improvements in how well software can interpret this kind of content. Computer vision is the broad parent term for any computation involving visual content: images, videos, icons, and anything else made of pixels. A classical application of computer vision is handwriting recognition for digitizing handwritten content (we'll explore more use cases below). Any other application that involves understanding pixels through software can safely be labeled computer vision.
Artificial intelligence has begun to move from the margins to the mainstream of the global economy and now attracts strong interest from businesses and the general public alike. Among the various disciplines of AI, computer vision is gaining considerable momentum. Let's see what it is all about. Progress in artificial intelligence and robotics continues to narrow the gap between human and machine capabilities, although there is still a substantial way to go before the ultimate goal of a human-like machine is reached.
Before a classification algorithm can do its magic, we need to train it by showing it thousands of cat and non-cat images. The general principle in machine learning is to treat feature vectors as points in a higher-dimensional space. The algorithm then tries to find planes or surfaces that separate this space so that all examples from a given class fall on the same side. One common way to build such a predictive model is a neural network: a software system, loosely inspired by the brain, that estimates functions depending on a large number of unknown inputs.
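The idea of separating classes with a plane can be sketched with a simple perceptron, one of the oldest linear classifiers. The toy 2-D points below stand in for feature vectors extracted from "cat" and "non-cat" images; the data, learning rate, and update rule are illustrative assumptions, not the method used by any particular vision system.

```python
def train_perceptron(points, labels, epochs=20, lr=0.1):
    """Learn weights w and bias b so that sign(w·x + b) matches labels (+1/-1)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            # A misclassified point lies on the wrong side of the plane,
            # so nudge the plane toward it.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

def predict(w, b, point):
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1

# "Cat" (+1) vs "non-cat" (-1) stand-ins: two linearly separable clusters.
points = [(2.0, 3.0), (3.0, 3.5), (2.5, 2.8),
          (-2.0, -1.5), (-3.0, -2.0), (-2.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(points, labels)
print(all(predict(w, b, p) == y for p, y in zip(points, labels)))  # True
```

Real image classifiers work the same way in principle, but the feature vectors have thousands of dimensions and the separating surface is curved rather than a flat plane, which is what neural networks provide.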
Although early experiments in computer vision started in the 1950s, and it was first put to commercial use by the 1970s to distinguish typed from handwritten text, today the applications for computer vision have grown exponentially. By 2022, the computer vision software and hardware market is expected to reach $48.6 billion. The technology is so much a part of everyday life that you likely encounter it regularly, even if you don't always recognize when and where it is deployed. Here is what computer vision is, how it works, and seven examples in practice today.

What is Computer Vision (CV)?