Using software to parse the world's visual content is as big a revolution in computing as mobile was 10 years ago, and it will give developers and businesses a major edge in building amazing products. While algorithms of this kind have existed in various forms since the 1960s, recent advances in machine learning, along with leaps forward in data storage, computing power, and cheap, high-quality input devices, have driven major improvements in how well software can interpret this kind of content. Computer vision is the broad umbrella term for any computation involving visual content: images, videos, icons, and anything else made of pixels. A classic application of computer vision is handwriting recognition for digitizing handwritten content (we'll explore more use cases below). Any other application that involves understanding pixels through software can safely be labeled computer vision.
In today's technology landscape, artificial intelligence has begun to move from the margins to the mainstream of the global economy, and it now attracts strong interest from businesses and the general public alike. Among the various disciplines of AI, computer vision is gaining considerable momentum. Let's see what it is all about. Progress in artificial intelligence and robotics keeps narrowing the gap between human and machine capabilities, although there is still a long way to go before reaching the ultimate goal of a human-like machine.
The simplest way to discuss AI is from the human perspective. Humans are the most intelligent creatures we know of, so comparing artificial intelligence with human intelligence gives a clear picture of what AI is. AI, a broad branch of computer science, is used to build intelligent machines that can recognize human speech, detect objects, solve problems, and learn as humans do. Humans, for example, can read and write text in any language.
Before a classification algorithm can do its magic, we need to train it, for example by showing it thousands of cat and non-cat images. The general principle behind many machine learning algorithms is to treat feature vectors as points in a high-dimensional space. The algorithm then tries to find planes or surfaces (decision boundaries) that divide this space so that all examples from a particular class lie on the same side of the plane or surface. A common way to build such a predictive model is with a neural network. A neural network is a system of hardware and software, loosely modeled on the brain, that estimates functions which depend on a large number of inputs and are generally unknown.
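To make the idea of a separating plane concrete, here is a minimal sketch of a perceptron, one of the simplest classifiers that learns a plane between two classes. It is an illustration only, not the method any particular vision system uses. The two-dimensional feature vectors, the labels, and the function names `predict` and `train_perceptron` are all made up for this example; real image features would have far more dimensions.

```python
# Toy perceptron: learns a separating plane w.x + b = 0 between two classes.
# The feature vectors below are hypothetical stand-ins for image features.

def predict(weights, bias, x):
    """Return 1 if x lies on the positive side of the plane, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, labels, epochs=100, lr=0.1):
    """Fit weights and bias with the classic perceptron update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                # Nudge the plane toward classifying this point correctly.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every training point is on the correct side
            break
    return weights, bias


# Hypothetical 2-D feature vectors: label 1 = "cat", label 0 = "non-cat".
SAMPLES = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0),
           (-1.0, -2.0), (-2.0, -1.5), (-1.5, -3.0)]
LABELS = [1, 1, 1, 0, 0, 0]

weights, bias = train_perceptron(SAMPLES, LABELS)
predictions = [predict(weights, bias, x) for x in SAMPLES]
print(predictions == LABELS)
```

Once trained, `predict(weights, bias, new_point)` reports which side of the learned plane a new feature vector falls on. A perceptron can only find a flat boundary, so it works here because the toy data is linearly separable; neural networks stack many such units to learn curved surfaces.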
We humans depend heavily on our five senses to interpret the world around us. Though each sense is important, we rely most heavily on vision for daily tasks such as reading, driving, or cooking. Most of the time it is the first sense we use when doing any task. Our eyes help us see the path we walk and the road we drive on, and they check for possible collisions. Vision is so important that it's only natural it is also one of the abilities humans want to recreate in machines.