Google's Google Cloud division today announced it has made generally available two search functions that rely on machine learning techniques to help retailers who use its cloud service. Called Vision API product search and Recommendations AI, the two services are part of what Google has unveiled as a suit of functions called Product Discovery Solutions for Retail. The vision search function will let a retailer's customers submit a picture and received ranked results of products that match the picture in either appearance or semantic similarity to the object. Recommendations, said Google, is "able to piece together the history of a customer's shopping journey and serve them with customized product recommendations." Both are generally available now to retailers.
How auto-manufacturers can apply ML & AI algorithms to enhance image analytics on their factory floor and to ensure higher product quality? Despite its great potential for quality control, vision inspection is far from reaching its full potential in manufacturing. Manual inspection, as well as traditional computer vision methods, are prone to error and are often unable to uncover the root cause of problems. In search of a solution to optimize its welding process, a leading powertrain manufacturer turned to OptimalPlus. Using advanced image algorithms, the OptimalPlus platform extracts key features from images, analyzes them, and informs MES decisions in near real-time.
BEGIN ARTICLE PREVIEW: Test, Measurement & Analytics WHITEPAPERS How a powertrain manufacturer optimized its welding process using advanced image algorithms from a platform that extracts key features from images, analyzes them, and informs MES decisions in near real-time. How auto-manufacturers can apply ML & AI algorithms to enhance image analytics on their factory floor and to ensure higher product quality? Discover the next generation visual inspection in our new case study. In this case study , you will learn about: Current limitations of image inspection in the manufacturing industry. The O+ end-to-end solution, which brings machine learning and deep learning to image analysis in the production line. How a powertrain manufacturer deployed OptimalPlus’ software on the edge and utilized its image analysis capabilities to optimize the welding process. Despite its great potentia
While photovoltaic (PV) systems are installed at an unprecedented rate, reliable information on an installation level remains scarce. As a result, automatically created PV registries are a timely contribution to optimize grid planning and operations. This paper demonstrates how aerial imagery and three-dimensional building data can be combined to create an address-level PV registry, specifying area, tilt, and orientation angles. We demonstrate the benefits of this approach for PV capacity estimation. In addition, this work presents, for the first time, a comparison between automated and officially-created PV registries. Our results indicate that our enriched automated registry proves to be useful to validate, update, and complement official registries.
I find them incredibly irritating. Those images you have to click on to prove that you are not a robot. If you are just one click away from a nice weekend away, you first have to figure out where you can see the traffic lights on 16 tiny fuzzy squares. Google makes grateful use of these puzzling attempts. For one thing, the company uses artificial intelligence to train its image recognition software.
Prior to 2017, most renditions of neural network models were coded in a batch scripting style. As AI researchers and experienced software engineers became increasingly involved in research and design, we started to see a shift in the coding of models that reflected software engineering principles for reuse and design patterns. A design pattern implies that there is a "best practice" for constructing and coding a model that can be reapplied across a wide range of cases, such as image classification, object detection and tracking, facial recognition, image segmentation, super resolution and style transfer. The introduction of design patterns also helped advance convolutional neural networks (as well as other network architectures) by aiding other researchers in understanding and reproducing a model's architecture. A procedural style for reuse was one of the earliest versions of using design patterns for neural network models.
Image recognition with prototypes is considered an interpretable alternative for black box deep learning models. Classification depends on the extent to which a test image "looks like" a prototype. However, perceptual similarity for humans can be different from the similarity learnt by the model. A user is unaware of the underlying classification strategy and does not know which image characteristics (e.g., color or shape) is the dominant characteristic for the decision. We address this ambiguity and argue that prototypes should be explained. Only visualizing prototypes can be insufficient for understanding what a prototype exactly represents, and why a prototype and an image are considered similar. We improve interpretability by automatically enhancing prototypes with extra information about visual characteristics considered important by the model. Specifically, our method quantifies the influence of color hue, shape, texture, contrast and saturation in a prototype. We apply our method to the existing Prototypical Part Network (ProtoPNet) and show that our explanations clarify the meaning of a prototype which might have been interpreted incorrectly otherwise. We also reveal that visually similar prototypes can have the same explanations, indicating redundancy. Because of the generality of our approach, it can improve the interpretability of any similarity-based method for prototypical image recognition.
Table recognition is a well-studied problem in document analysis, and many academic and commercial approaches have been developed to recognize tables in several document formats, including plain text, scanned page images, and born-digital, object-based formats such as PDF. There are several works that can convert tables in text-based PDF format into structured representations. However, there is limited work on image-based table content recognition. The proposed challenge aims at assessing the ability of state-of-the-art methods to recognize scientific tables in LaTeX format. Our shared task has two subtasks.
Dosovitskiy, Alexey, Beyer, Lucas, Kolesnikov, Alexander, Weissenborn, Dirk, Zhai, Xiaohua, Unterthiner, Thomas, Dehghani, Mostafa, Minderer, Matthias, Heigold, Georg, Gelly, Sylvain, Uszkoreit, Jakob, Houlsby, Neil
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train. Self-attention-based architectures, in particular Transformers (Vaswani et al., 2017), have become the model of choice in natural language processing (NLP). The dominant approach is to pre-train on a large text corpus and then fine-tune on a smaller task-specific dataset (Devlin et al., 2019). Thanks to Transformers' computational efficiency and scalability, it has become possible to train models of unprecedented size, with over 100B parameters. With the models and datasets growing, there is still no sign of saturating performance. In computer vision, however, convolutional architectures remain dominant (LeCun et al., 1989; Krizhevsky et al., 2012; He et al., 2016). Inspired by NLP successes, multiple works try combining CNN-like architectures with self-attention (Wang et al., 2018; Carion et al., 2020), some replacing the convolutions entirely (Ramachandran et al., 2019; Wang et al., 2020a). The latter models, while theoretically efficient, have not yet been scaled effectively on modern hardware accelerators due to the use of specialized attention patterns. Therefore, in large-scale image recognition, classic ResNetlike architectures are still state of the art (Mahajan et al., 2018; Xie et al., 2020; Kolesnikov et al., 2020). Inspired by the Transformer scaling successes in NLP, we experiment with applying a standard Transformer directly to images, with the fewest possible modifications. To do so, we split an image into patches and provide the sequence of linear embeddings of these patches as an input to a Transformer.
The system uses a GoPro camera mounted on top of a hard hat. When managers tour a site once or twice a week, the camera on their head captures video footage of the whole project and uploads it to image recognition software, which compares the status of many thousands of objects on site--such as electrical sockets and bathroom fittings--with a digital replica of the building. The AI also uses the video feed to track where the camera is in the building to within a few centimeters so that it can identify the exact location of the objects in each frame. The system can track the status of around 150,000 objects several times a week, says Danon. For each object the AI can tell which of three or four states it is in, from not yet begun to fully installed.