Pattern Recognition
Breakthrough in energy efficient artificial intelligence
Thanks to a mathematical breakthrough, AI applications like speech recognition, gesture recognition and ECG classification can become a hundred to a thousand times more energy efficient. This means it will be possible to put much more elaborate AI in chips, enabling applications to run on a smartphone or smartwatch where before this was done in the cloud. Running the AI on local devices makes the applications more robust and more privacy-friendly: more robust because a network connection with the cloud is no longer necessary, and more privacy-friendly because data can be stored and processed locally. The mathematical breakthrough has been achieved by researchers at Centrum Wiskunde & Informatica (CWI), the Dutch national research center for mathematics and computer science, together with the IMEC/Holst Research Center from Eindhoven, The Netherlands.
Multi-label classification of promotions in digital leaflets using textual and visual information
Arroyo, Roberto, Jiménez-Cabello, David, Martínez-Cebrián, Javier
Product descriptions in e-commerce platforms contain detailed and valuable information about retailers' assortments. In particular, coding promotions within digital leaflets is of great interest in e-commerce, as these leaflets capture the attention of consumers by showing regular promotions for different products. However, this information is embedded into images, making it difficult to extract and process for downstream tasks. In this paper, we present an end-to-end approach that classifies promotions within digital leaflets into their corresponding product categories using both visual and textual information. Our approach can be divided into three key components: 1) region detection, 2) text recognition and 3) text classification. In many cases, a single promotion refers to multiple product categories, so we introduce a multi-label objective in the classification head. We demonstrate the effectiveness of our approach for two separate tasks: 1) image-based detection of the descriptions for each individual promotion and 2) multi-label classification of the product categories using the text from the product descriptions. We train and evaluate our models using a private dataset composed of images from digital leaflets obtained by Nielsen. Results show that we consistently outperform the proposed baseline by a large margin in all the experiments.
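A multi-label objective of the kind mentioned above is commonly built from one independent sigmoid per category combined with binary cross-entropy. The sketch below illustrates that general idea only; the abstract does not specify the loss actually used, and the category names, logits and threshold are hypothetical.

```python
import math

def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy, treating each category independently."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)

def predict_labels(logits, categories, threshold=0.5):
    """Return every category whose probability reaches the threshold."""
    return [c for z, c in zip(logits, categories) if sigmoid(z) >= threshold]

# Hypothetical example: one promotion that belongs to two categories.
categories = ["dairy", "beverages", "snacks"]
logits = [2.0, 1.5, -3.0]
targets = [1, 1, 0]

loss_good = multilabel_bce(logits, targets)
loss_bad = multilabel_bce(logits, [0, 0, 1])
predicted = predict_labels(logits, categories)
```

Unlike a softmax head, nothing here forces the category probabilities to sum to one, so a single promotion can be assigned several categories at once.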
Quantum-enhanced barcode decoding and pattern recognition
Banchi, Leonardo, Zhuang, Quntao, Pirandola, Stefano
Quantum hypothesis testing is one of the most fundamental problems in quantum information theory, with crucial implications in areas like quantum sensing, where it has been used to prove quantum advantage in a series of binary photonic protocols, e.g., for target detection or memory cell readout. In this work, we generalize this theoretical model to the multi-partite setting of barcode decoding and pattern recognition. We start by defining a digital image as an array or grid of pixels, each pixel corresponding to an ensemble of quantum channels. Specializing each pixel to a black and white alphabet, we naturally define an optical model of a barcode. In this scenario, we show that the use of quantum entangled sources, combined with suitable measurements and data processing, greatly outperforms classical coherent-state strategies for the tasks of barcode data decoding and classification of black and white patterns. Moreover, introducing relevant bounds, we show that the problem of pattern recognition is significantly simpler than barcode decoding, as long as the minimum Hamming distance between images from different classes is large enough. Finally, we theoretically demonstrate the advantage of using quantum sensors for pattern recognition with the nearest neighbor classifier, a supervised learning algorithm, and numerically verify this prediction for handwritten digit classification.
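The role of the minimum Hamming distance can be illustrated with a purely classical toy example. The sketch below is not the paper's quantum protocol: it only shows a nearest neighbor classifier on black and white patterns, with hypothetical 3x3 images and class names. If a readout is a noisy copy of a training pattern with fewer than d/2 flipped pixels, where d is the minimum distance between patterns of different classes, the nearest neighbor still has the right class even though the pixels themselves were not all read correctly.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbor(query, training):
    """Return the label of the training pattern closest to the query."""
    return min(training, key=lambda item: hamming(query, item[0]))[1]

def min_interclass_distance(training):
    """Smallest Hamming distance between patterns of different classes."""
    return min(
        hamming(p, q)
        for p, label_p in training
        for q, label_q in training
        if label_p != label_q
    )

# Hypothetical 3x3 black (1) and white (0) images, flattened row by row.
training = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), "bar"),   # vertical line
    ((0, 0, 0, 1, 1, 1, 0, 0, 0), "dash"),  # horizontal line
]

# A readout of the vertical line with one pixel flipped by noise.
noisy = (1, 1, 0, 0, 1, 0, 0, 1, 0)

d = min_interclass_distance(training)
label = nearest_neighbor(noisy, training)
```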
Discovery data topology with the closure structure. Theoretical and practical aspects
Makhalova, Tatiana, Kuznetsov, Sergei O., Napoli, Amedeo
In this paper, we revisit pattern mining, and especially itemset mining, which allows one to analyze binary datasets in search of interesting and meaningful association rules and their respective itemsets in an unsupervised way. While a summarization of a dataset based on a set of patterns does not provide a general and satisfying view over a dataset, we introduce a concise representation, the closure structure, based on closed itemsets and their minimum generators, for capturing the intrinsic content of a dataset. The closure structure allows one to understand the topology of the dataset as a whole and the inherent complexity of the data. We propose a formalization of the closure structure in terms of Formal Concept Analysis, which is well adapted to study this data topology. We present and demonstrate theoretical results, as well as practical results using the GDPM algorithm. GDPM is rather unique in its functionality as it returns a characterization of the topology of a dataset in terms of complexity levels, highlighting the diversity and the distribution of the itemsets. Finally, a series of experiments shows how GDPM can be practically used and what can be expected from the output.
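Closed itemsets and their smallest generators can be made concrete on a toy dataset. The sketch below is a brute-force illustration and is not the GDPM algorithm: it enumerates itemsets by increasing size, so the first itemset that produces a given closed itemset is a generator of minimum cardinality. Grouping closed itemsets by that size is an assumed reading of the "complexity levels" in the abstract, and the transactions are hypothetical.

```python
from itertools import combinations

def closure(itemset, transactions):
    """Intersection of all transactions containing the itemset.

    Returns None when no transaction contains the itemset.
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)

def closure_levels(transactions):
    """Group closed itemsets by the size of a minimum-cardinality generator.

    Exhaustive enumeration: exponential in the number of items, toy use only.
    """
    items = sorted(frozenset().union(*transactions))
    levels, seen = {}, set()
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            generator = frozenset(combo)
            closed = closure(generator, transactions)
            if closed is None or closed in seen:
                continue
            seen.add(closed)
            levels.setdefault(k, []).append((generator, closed))
    return levels

# Hypothetical binary dataset: each transaction is a set of items.
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bd"),
]

levels = closure_levels(transactions)
level_sizes = {k: len(v) for k, v in levels.items()}
```

In this toy dataset the item c never occurs without a, so the single item c already generates the closed itemset {a, c}, which therefore sits at level 1.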
Power Plant 4.0: Embracing next-generation technologies for power plant digitization
Even before the outbreak of COVID-19, fossil-fuel power plants faced significant disruption from renewable energy sources, low gas prices, and ambitious decarbonization goals, all of which are changing customer preferences. Now, as the power-generation industry shifts to the next normal, adopting the latest digital and advanced-analytics technologies has become critical. Many power companies began their digital transformations with technological solutions such as data models, which help optimize set points, enable better dispatch decisions, and support maintenance strategies and operating-mode selection. Forward-thinking companies, however, have recently started using visualization tools to manage real-time generation performance and digital control software to relay predictive data to control rooms. Yet these innovations are grounded in tangibly improving outcomes for plant operations and are therefore only part of a digitally enabled, next-generation power plant (Exhibit 1).
A "Hello World" Into Image Recognition with MNIST
To begin, we'll load the Keras library and other necessary inputs. Next, we'll load the MNIST dataset and split it into X train, X test, Y train, and Y test data. Next, we can outline some important variables for image loading and training. The data needs to be converted to a 32-bit float and standardized. Now that the data is ready, we can define the model architecture. After the model architecture has been defined, it must be compiled. Compiling a model specifies the loss function, optimizer, and metrics.
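The tutorial's own code is not preserved in this excerpt. As a rough illustration of the preprocessing step only, the sketch below uses NumPy on a synthetic stand-in array; with Keras the real arrays would come from `keras.datasets.mnist.load_data()`. Reading "standardized" as scaling pixel values to the range [0, 1], the one-hot label encoding, and all variable names are assumptions rather than the tutorial's actual choices.

```python
import numpy as np

# Assumed variables for image loading: MNIST has 10 digit classes
# and 28x28 grayscale images.
num_classes = 10
img_rows, img_cols = 28, 28

def preprocess_images(x):
    """Add a channel axis, convert to 32-bit float, scale to [0, 1]."""
    x = x.reshape(x.shape[0], img_rows, img_cols, 1)
    x = x.astype("float32")
    return x / 255.0

def one_hot(y, n=num_classes):
    """Encode integer labels as one-hot rows."""
    return np.eye(n, dtype="float32")[y]

# Synthetic stand-in for a few MNIST images and labels (not real data).
rng = np.random.default_rng(0)
x_fake = rng.integers(0, 256, size=(4, img_rows, img_cols), dtype=np.uint8)
y_fake = np.array([3, 0, 9, 1])

x_ready = preprocess_images(x_fake)
y_ready = one_hot(y_fake)
```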
Community detection, pattern recognition, and hypergraph-based learning: approaches using metric geometry and persistent homology
Nguyen, Dong Quan Ngoc, Xing, Lin, Lin, Lizhen
Hypergraph data appear and are hidden in many places in the modern age. They are data structures that can be used to model many real data examples, since their structures contain information about higher order relations among data points. One of the main contributions of our paper is to introduce a new topological structure to hypergraph data which bears a resemblance to a usual metric space structure. Using this new topological space structure of hypergraph data, we propose several approaches to study the community detection problem, detecting persistent features arising from the homological structure of hypergraph data. Also based on the topological space structure of hypergraph data introduced in our paper, we introduce modified nearest neighbors methods, which are a generalization of the classical nearest neighbors methods from machine learning. Our modified nearest neighbors methods have the advantage of being very flexible and applicable even to discrete structures such as hypergraphs. We then apply our modified nearest neighbors methods to study the sign prediction problem in hypergraph data constructed using our method.
CHIRPS: Explaining random forest classification
Modern machine learning methods typically produce "black box" models that are opaque to interpretation. Yet demand for them has been increasing in Human-in-the-Loop processes, that is, processes that require a human agent to verify, approve or reason about the automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. Then a simple, conjunctive form rule is constructed where the antecedent terms are derived from the attributes that had the most influence on the classification.
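The pipeline described above can be sketched in a heavily simplified form. The code below is not the CHIRPS implementation: it replaces frequent pattern mining with a plain frequency count of individual split conditions, and the tree votes, attribute names and support threshold are all hypothetical.

```python
from collections import Counter

def explain_instance(tree_votes, min_support=0.6):
    """Build a conjunctive rule from the paths of majority-voting trees.

    tree_votes: list of (predicted_label, path), where path is a list of
    split conditions such as ("age", "<=", 40) met by the instance.
    """
    # Majority classification of the forest for this instance.
    majority = Counter(label for label, _ in tree_votes).most_common(1)[0][0]
    # Keep only the decision paths of trees that voted for the majority.
    paths = [set(path) for label, path in tree_votes if label == majority]
    # Count how many of those paths contain each split condition.
    counts = Counter(cond for path in paths for cond in path)
    frequent = sorted(
        cond for cond, n in counts.items() if n / len(paths) >= min_support
    )
    return majority, frequent

# Hypothetical decision paths followed by one instance through five trees.
tree_votes = [
    ("approve", [("income", ">", 50), ("age", "<=", 40)]),
    ("approve", [("income", ">", 50), ("debt", "<=", 10)]),
    ("reject", [("income", "<=", 50)]),
    ("approve", [("income", ">", 50), ("age", "<=", 40), ("debt", "<=", 10)]),
    ("approve", [("age", "<=", 40)]),
]

label, conditions = explain_instance(tree_votes)
rule = " AND ".join(f"{attr} {op} {value}" for attr, op, value in conditions)
```

Read as a rule, the result says: if every condition in `rule` holds, predict `label`.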
NUST Machine Learning Domain - Technology Times - IAM Network
Machine Learning is the talk of the town these days. Conventional processes which used to be digitally transformed by IT solutions are now even more fast-tracked thanks to the advancement in Artificial Intelligence. With extensive research experience and knowledge in the domain of machine learning and pattern recognition, Dr. Faisal Shafait is highly regarded as a teacher and researcher at NUST. In fact, it is his brilliant contributions to document image analysis and computational forensics that have seen him secure the IAPR award and make history. Dr. Faisal is the Director of TUKL R&D Center at NUST-SEECS which conducts research in AI/ML and linked domains.
Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition
Li, Bingcong, Tang, Xin, Qi, Xianbiao, Chen, Yihao, Xiao, Rong
Recently, inspired by the Transformer, self-attention-based scene text recognition approaches have achieved outstanding performance. However, we find that the model size grows rapidly as the lexicon grows. Specifically, the number of parameters in the softmax classification layer and the output embedding layer is proportional to the vocabulary size. This hinders the development of a lightweight text recognition model, especially for Chinese and multiple languages. Thus, we propose a lightweight scene text recognition model named Hamming OCR. In this model, a novel Hamming classifier, which adopts a locality sensitive hashing (LSH) algorithm to encode each character, is proposed to replace the softmax regression, and the generated LSH code is directly employed to replace the output embedding. We also present a simplified transformer decoder to reduce the number of parameters by removing the feed-forward network and using a cross-layer parameter sharing technique. Compared with traditional methods, the number of parameters in both the classification and embedding layers is independent of the size of the vocabulary, which significantly reduces the storage requirement without loss of accuracy. Experimental results on several datasets, including four public benchmarks and a Chinese text dataset synthesized by SynthText with more than 20,000 characters, show that Hamming OCR achieves competitive results.
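The idea of classifying characters by a fixed-length binary code rather than by one softmax output per character can be sketched as follows. This is not the paper's model: the codes here are drawn at random as a stand-in for the LSH encoding, which the abstract does not detail, and the vocabulary, code length and function names are hypothetical. The point is only that the number of output bits is fixed while the vocabulary can grow.

```python
import random

def make_codes(vocab, n_bits, seed=0):
    """Assign each character a distinct random binary code of n_bits bits.

    Random codes stand in for LSH codes here (an assumption).
    """
    rng = random.Random(seed)
    codes, used = {}, set()
    for ch in vocab:
        code = tuple(rng.getrandbits(1) for _ in range(n_bits))
        while code in used:  # redraw on the rare collision
            code = tuple(rng.getrandbits(1) for _ in range(n_bits))
        used.add(code)
        codes[ch] = code
    return codes

def decode(bits, codes):
    """Return the character whose code is nearest in Hamming distance."""
    return min(
        codes,
        key=lambda ch: sum(a != b for a, b in zip(bits, codes[ch])),
    )

n_bits = 32  # output width, chosen independently of the vocabulary size
vocab = [chr(c) for c in range(ord("a"), ord("z") + 1)]
codes = make_codes(vocab, n_bits)

# A predicted bit vector that matches a stored code decodes to its character.
recovered = [decode(codes[ch], codes) for ch in vocab]
```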