Deep learning-based detectors usually produce a redundant set of object bounding boxes, including many duplicate detections of the same object. These boxes are then filtered using non-maximum suppression (NMS) in order to select exactly one bounding box per object of interest. This greedy scheme is simple and provides sufficient accuracy for isolated objects, but it often fails in crowded environments, where one must both preserve boxes for distinct objects and suppress duplicate detections. In this work, we develop an alternative iterative scheme in which a new subset of objects is detected at each iteration. Boxes detected at previous iterations are passed to the network at subsequent iterations to ensure that the same object is not detected twice.
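The greedy NMS scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's iterative method; the box convention (x1, y1, x2, y2), the function names, and the 0.5 IoU threshold are assumptions for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop overlapping ones, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # the second box duplicates the first and is suppressed
```

The failure mode the paragraph mentions is visible here: two genuinely different but heavily overlapping objects in a crowd would also exceed the IoU threshold, and one of them would be wrongly suppressed.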
Object detection is a very popular task in Computer Vision where, given an image, you predict (usually rectangular) boxes around the objects present in the image and also recognize their types. There can be multiple objects in an image, and there are various state-of-the-art techniques and architectures to tackle this problem, such as Faster R-CNN and YOLOv3. This article covers the case where only one object of interest is present in an image. The focus here is on how to read an image and its bounding box, and how to resize and perform augmentations correctly, rather than on the model itself. The goal is to get a good grasp of the fundamental ideas behind object detection, which you can then extend to better understand the more complex techniques.
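The key bookkeeping the article highlights is that whenever you resize or augment the image, the bounding box must be transformed by exactly the same geometry. A minimal sketch (function names and the (x1, y1, x2, y2) box convention are my own assumptions, not tied to any particular library):

```python
def resize_bbox(bbox, orig_size, new_size):
    """Scale a (x1, y1, x2, y2) box when the image is resized.

    orig_size and new_size are (width, height) tuples.
    """
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    x1, y1, x2, y2 = bbox
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

def hflip_bbox(bbox, img_width):
    """Mirror a box for a horizontal-flip augmentation."""
    x1, y1, x2, y2 = bbox
    # The left edge of the flipped box comes from the old right edge.
    return (img_width - x2, y1, img_width - x1, y2)

# Resize a 200x200 image to 100x100: the box shrinks by the same factors.
print(resize_bbox((50, 20, 150, 120), (200, 200), (100, 100)))  # (25.0, 10.0, 75.0, 60.0)
```

A common bug this sketch guards against is flipping the image but not the box, or swapping the x and y scale factors when width and height change by different amounts.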
With the rise of autonomous vehicles, smart video surveillance, facial detection, and various people-counting applications, fast and accurate object detection systems are in growing demand. These systems must not only recognize and classify every object in an image, but also localize each one by drawing the appropriate bounding box around it. This makes object detection a significantly harder task than its traditional computer vision predecessor, image classification.
The Internet of Things (IoT) has sparked the proliferation of connected devices. These devices, which house sensors that collect data on day-to-day activities or for monitoring purposes, are embedded with microcontroller and microprocessor chips. The chips are chosen based on the sensor data needed to complete an assigned task, so there is no one-size-fits-all processor architecture. For example, some devices perform only a limited amount of processing on readings such as temperature, humidity, pressure, or gravity; more complicated systems, however, need to handle (multiple) high-resolution sound or video streams.
With about 7,000 attendees, the six-day virtual conference on computer vision concluded with a plethora of paper presentations, workshops, and tutorials. From breakthroughs in computer vision to open-sourced datasets and projects, the conference was loaded with interesting topics and areas including autonomous driving, video sensing, action recognition, and much more. We have already covered the topics and tutorials from days 1 and 2; in this article, we list all the important topics and tutorials discussed from 16th June to 19th June. This year, the conference witnessed a record 1,470 research papers on computer vision accepted from 6,656 valid submissions.
What makes us humans so good at making sense of visual data? That's a question that has preoccupied artificial intelligence and computer vision scientists for decades. Efforts at reproducing the capabilities of human vision have so far yielded results that are commendable but still leave much to be desired. Our current artificial intelligence algorithms can detect objects in images with remarkable accuracy, but only after they've seen many (thousands or even millions of) examples, and only if the new images are not too different from what they've seen before. There is a range of efforts aimed at solving the shallowness and brittleness of deep learning, the main AI algorithm used in computer vision today. But sometimes, finding the right solution is predicated on asking the right questions and formulating the problem in the right way.
Wood species classification is an essential field of investigation that can help combat illegal logging, enabling timber certification and the application of correct timber taxation. Today, wood classification relies on highly qualified professionals who analyze texture patterns on timber sections. However, these professionals are scarce, costly, and prone to error, so automating this task with computational methods is promising. Deep learning has proven to be a state-of-the-art technique in computer vision tasks, but it has not been much exploited for timber classification because of the difficulty of building databases large enough to train such networks. In this study, we introduce the largest dataset of wood timber microscope images to date, with 281 species and three types of timber section: transverse, radial, and tangential.
Foxconn, a global leader in smart manufacturing, is joining Socionext, a major provider of advanced SoC solutions for video and imaging systems, and leading artificial intelligence (AI) chipmaker Hailo to launch a next-generation AI processing solution for video analytics at the edge. Foxconn has combined its high-density, fan-less, and highly efficient edge computing solution, "BOXiedge," with Socionext's high-efficiency parallel processor "SynQuacer" SC2A11 and the Hailo-8 deep learning processor. The combination provides market-leading energy efficiency for standalone AI inference nodes, benefiting applications including smart cities, smart medical, smart retail, and industrial IoT. In a global AI market forecast by research firm IDC to approach $98.4 billion in revenue in 2023, this joint solution helps address the need for cost-effective multiprocessing capabilities in video analytics, image classification, and object segmentation. The robust, high-efficiency product is capable of processing and analyzing over 20 streaming camera input feeds in real time, all at the edge.
This week IBM, Microsoft, and Amazon announced that they would suspend the sale of their facial recognition technology to law enforcement agencies. The moves from the tech giants also illustrate the inherent risks of AI, especially when it comes to bias and the potential for invasion of privacy. Note that there are already indications that Congress will take action to regulate the technology. In the meantime, many cities have already instituted bans, such as San Francisco. Thanks to advances in deep learning and faster systems for processing enormous amounts of data, facial recognition has certainly seen major strides over the past decade.
Facial recognition can be considered a controversial technology. On the one hand, it affects people's privacy. On the other hand, it assists in preventing or detecting violence, and now, in light of the global pandemic, it helps to deter the spread of coronavirus. Nonetheless, like any other technology, facial recognition isn't impeccable; it has vulnerabilities that make it possible to bypass a system.