Pattern Recognition


PowerToys update adds OCR and two more free tools

PCWorld

If you use Windows, you want PowerToys. This collection of open-source goodies, guided and published by Microsoft itself, is one of the best free software packages out there, and we can't recommend it enough. That only becomes more true today, as the company publishes an updated version with three brand-new tools: the previously spotted Text Extractor (an Optical Character Recognition tool), a ruler for measuring pixels on your screen, and a tool for quickly inserting little-used accents into text. Text Extractor is probably the most universally applicable addition here. It's an open-source version of Joseph Finney's paid Text Grab app, now integrated into PowerToys and free for Windows users.


TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

arXiv.org Artificial Intelligence

Text recognition is a long-standing research problem for document digitization. Existing approaches are usually built on CNNs for image understanding and RNNs for character-level text generation. In addition, a separate language model is usually needed as a post-processing step to improve overall accuracy. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on printed, handwritten and scene text recognition tasks. The TrOCR models and code are publicly available at https://aka.ms/trocr.
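An image Transformer of the ViT kind typically consumes an image as a sequence of fixed-size patches, each flattened into one token. The NumPy sketch below shows only that patching step; the 16×16 patch size and the 384×384 test input are illustrative assumptions rather than details stated in the abstract, and the learned linear projection and position embeddings that normally follow are omitted.

```python
import numpy as np


def image_to_patch_sequence(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a (num_patches, patch*patch*C) token sequence.

    Patches are ordered row by row (left to right, top to bottom).
    H and W must be multiples of `patch`; resizing or padding is the caller's job.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    # (H/P, P, W/P, P, C) -> (H/P, W/P, P, P, C): group the pixels of each patch together
    grid = image.reshape(h // patch, patch, w // patch, patch, c).transpose(0, 2, 1, 3, 4)
    # Flatten every patch into one vector; each row is one "visual token"
    return grid.reshape(-1, patch * patch * c)


img = np.arange(384 * 384 * 3).reshape(384, 384, 3)
tokens = image_to_patch_sequence(img)
print(tokens.shape)  # (576, 768): a 24 x 24 grid of 16 x 16 x 3 patches
```

A 384×384 RGB input therefore becomes 576 tokens of length 768, which is the sequence an encoder attends over before a decoder generates wordpieces.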


Beyond Object Identification: A Giant-Leap into Pattern Discovery in Imagery Data

#artificialintelligence

A critical question that arises after identifying the objects (or class labels) in an imagery database is: "How are the various objects discovered in an imagery database correlated with one another?" This article tries to answer this question by providing a generic framework that helps readers discover hidden correlations between objects in an imagery database. A portion of this article is drawn from our work published in IEEE BIGDATA 2021 [1]. The framework for discovering correlations between the objects in an imagery database is shown in Figure 1. Demonstration: In this demo, we first pass the image data into a trained model (e.g., ResNet-50) and extract objects and their scores.
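The article's own framework is not reproduced here. As a hedged illustration of the general idea of correlating detected objects, the sketch below counts how often pairs of confidently detected labels co-occur across images (the "support" measure from frequent-itemset mining) and keeps pairs above a threshold. The per-image `{label: score}` format, the `min_score` cut-off, and the example labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def frequent_object_pairs(
    detections: List[Dict[str, float]],
    min_support: float,
    min_score: float = 0.5,
) -> Dict[Tuple[str, str], float]:
    """Return label pairs whose co-occurrence support is at least min_support.

    detections: one {label: confidence} dict per image (hypothetical format).
    Support = fraction of images in which both labels were detected.
    """
    if not detections:
        return {}
    counts: Counter = Counter()
    for labels in detections:
        # Keep only confident detections, then count each unordered pair once per image
        kept = sorted(label for label, score in labels.items() if score >= min_score)
        for pair in combinations(kept, 2):
            counts[pair] += 1
    n_images = len(detections)
    return {pair: c / n_images for pair, c in counts.items() if c / n_images >= min_support}


images = [
    {"car": 0.9, "person": 0.8, "traffic light": 0.7},
    {"car": 0.95, "person": 0.6, "dog": 0.2},  # the low-confidence "dog" is ignored
    {"dog": 0.85, "person": 0.9},
    {"car": 0.7, "traffic light": 0.9},
]
pairs = frequent_object_pairs(images, min_support=0.5)
print(pairs)  # {('car', 'person'): 0.5, ('car', 'traffic light'): 0.5}
```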


Overview of Machine Learning

#artificialintelligence

In layman's terms, machine learning lets computers learn automatically from data in order to acquire knowledge. As a discipline, machine learning usually refers to a class of problems and the methods for solving them: how to find regularities in observed data and use those learned regularities to make predictions about unknown or unobservable data. In early engineering practice, machine learning was often called pattern recognition, but pattern recognition leans toward specific application tasks, such as optical character recognition, speech recognition, and face recognition. What these tasks have in common is that humans complete them easily without knowing how we do it, which makes it difficult to design a computer program for them by hand. A feasible alternative is to design an algorithm that lets the computer learn the rules from labeled samples and then use them to perform various recognition tasks. As machine learning has been applied ever more widely, the term machine learning has gradually replaced pattern recognition as the general name for this type of problem and its solutions. Take handwritten digit recognition as an example: we want the computer to recognize handwritten digits automatically. This is a classic machine learning task that is simple for humans but very difficult for computers. It is hard for us to summarize the handwriting characteristics of each digit, or the rules that distinguish different digits, so designing a recognition algorithm by hand is an almost impossible task. Many real-life problems resemble handwritten digit recognition, such as object recognition and speech recognition. For these problems we do not know how to write a computer program that solves them directly, and even if some heuristic rules could do it, the process would be extremely complicated.
Therefore, people began to try another approach: let the computer see a large number of samples, learn from that experience, and then apply it to recognize new samples. To recognize handwritten digits, we first manually annotate a large number of handwritten digit images (that is, each image is labeled by hand with the digit it shows). These images serve as training data, from which a learning algorithm automatically produces a model that is then used to recognize new images. This approach of learning from data is called machine learning. First, we use an everyday example to introduce some basic concepts in machine learning: samples, features, labels, models, learning algorithms, and so on. Suppose we want to buy mangoes at the market but have no prior experience in choosing them. How can we acquire this knowledge through learning? First, we randomly pick some mangoes from the market and list the characteristics of each one.
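To make the vocabulary concrete: each mango is a sample, its measured characteristics are features, "sweet" or "not sweet" is its label, and a learning algorithm turns labeled samples into a model that predicts labels for new samples. The sketch below uses a nearest-centroid classifier, chosen here only because it is one of the simplest learning algorithms; the two features (a color score and a firmness score, both in 0 to 1) and the data are made up.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, ...]


def train_nearest_centroid(samples: Sequence[Features], labels: Sequence[str]) -> Dict[str, Features]:
    """Learning algorithm: the model is the mean feature vector (centroid) of each label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(model: Dict[str, Features], x: Features) -> str:
    """Assign the label whose centroid is closest (Euclidean distance) to x."""
    return min(model, key=lambda y: dist(model[y], x))


# Hypothetical samples: (color score, firmness score) for four mangoes we tasted
mangoes = [(0.9, 0.3), (0.8, 0.4), (0.3, 0.9), (0.2, 0.8)]
labels = ["sweet", "sweet", "not sweet", "not sweet"]

model = train_nearest_centroid(mangoes, labels)
print(predict(model, (0.7, 0.5)))   # sweet
print(predict(model, (0.35, 0.7)))  # not sweet
```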


SL Sensor: An Open-Source, ROS-Based, Real-Time Structured Light Sensor for High Accuracy Construction Robotic Applications

arXiv.org Artificial Intelligence

High-accuracy 3D surface information is required for many construction robotics tasks such as automated cement polishing or robotic plaster spraying. However, consumer-grade depth cameras currently on the market are not accurate enough for these tasks, which require millimeter (mm)-level accuracy. This paper presents SL Sensor, a structured light sensing solution capable of producing high-fidelity point clouds at 5 Hz by leveraging phase shifting profilometry (PSP) codification techniques. The SL Sensor was compared with two commercial depth cameras, the Azure Kinect and the RealSense L515. Experiments showed that the SL Sensor surpasses both devices in precision and accuracy for indoor surface reconstruction applications. Furthermore, to demonstrate SL Sensor's ability to serve as a structured light sensing research platform for robotic applications, a motion compensation strategy was developed that allows the SL Sensor to operate during linear motion, whereas traditional PSP methods only work when the sensor is static. Field experiments show that the SL Sensor is able to produce highly detailed reconstructions of spray-plastered surfaces. The robot operating system (ROS)-based software and a sample hardware build of the SL Sensor are made open-source with the objective of making structured light sensing more accessible to the construction robotics community. All documentation and code are available at https://github.com/ethz-asl/sl_sensor/ .
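Phase shifting profilometry projects N sinusoidal fringe patterns, each shifted by 2π/N, and recovers at every camera pixel a phase that encodes surface depth. The pure-Python sketch below shows the standard N-step wrapped-phase formula for one pixel. The abstract does not say how many steps SL Sensor uses, so N, the simulated amplitudes `a` and `b`, and the helper names are assumptions; phase unwrapping and the phase-to-depth calibration a real sensor needs are omitted.

```python
import math
from typing import List, Sequence


def fringe_intensities(phi: float, n_steps: int, a: float = 0.5, b: float = 0.4) -> List[float]:
    """Simulated intensities seen by one pixel: I_k = a + b*cos(phi + 2*pi*k/N)."""
    return [a + b * math.cos(phi + 2.0 * math.pi * k / n_steps) for k in range(n_steps)]


def wrapped_phase(intensities: Sequence[float]) -> float:
    """Recover phi, wrapped to [-pi, pi], from N >= 3 equally shifted fringe intensities."""
    n_steps = len(intensities)
    if n_steps < 3:
        raise ValueError("N-step phase shifting needs at least 3 images")
    s = sum(i * math.sin(2.0 * math.pi * k / n_steps) for k, i in enumerate(intensities))
    c = sum(i * math.cos(2.0 * math.pi * k / n_steps) for k, i in enumerate(intensities))
    # For N >= 3 the offset a cancels: s = -(N*b/2)*sin(phi) and c = (N*b/2)*cos(phi)
    return math.atan2(-s, c)


print(wrapped_phase(fringe_intensities(1.2, n_steps=4)))  # approximately 1.2
```

Because the result is only known modulo 2π (a true phase of 4.0 comes back as 4.0 − 2π), a real pipeline adds an unwrapping step before converting phase to depth.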


A Black-Box Attack on Optical Character Recognition Systems

arXiv.org Artificial Intelligence

Adversarial machine learning is an emerging area that exposes the vulnerability of deep learning models. Exploring attack methods that challenge state-of-the-art artificial intelligence (A.I.) models is an area of critical concern, and the reliability and robustness of such models are major concerns given the increasing number of effective adversarial attack methods. Classification tasks are especially vulnerable to adversarial attacks. Most attack strategies are developed for color or grayscale images; consequently, adversarial attacks on binary image recognition systems have not been sufficiently studied. Binary images are single-channel signals in which each pixel takes one of only two possible values. This simplicity gives binary images a significant advantage over color and grayscale images, namely computational efficiency. Moreover, most optical character recognition systems (O.C.R.s), such as handwritten character recognition, license plate identification, and bank check recognition systems, use binary images or binarization in their processing steps. In this paper, we propose a simple yet efficient attack method on binary image classifiers, Efficient Combinatorial Black-box Adversarial Attack. We validate the efficiency of the attack technique on two different data sets and three classification networks, demonstrating its performance. Furthermore, we compare our proposed method with state-of-the-art methods in terms of advantages, disadvantages, and applicability.
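In a black-box setting the attacker can only query the classifier's output, never its gradients. The sketch below is a generic greedy pixel-flipping search on a binary image, written for illustration; it is not the paper's Efficient Combinatorial Black-box Adversarial Attack. `true_class_score` is a hypothetical stand-in for the model under attack, returning its confidence in the correct class, and the 0.5 success threshold is an assumption.

```python
from typing import Callable, List

BinaryImage = List[List[int]]  # each pixel is 0 or 1


def greedy_flip_attack(
    image: BinaryImage,
    true_class_score: Callable[[BinaryImage], float],
    max_flips: int,
    threshold: float = 0.5,
) -> BinaryImage:
    """Flip one pixel at a time, always the flip that lowers the true-class score most.

    Only queries true_class_score (black-box). Stops once the score drops below
    threshold, no single flip helps, or max_flips is reached.
    """
    adv = [row[:] for row in image]  # never modify the caller's image
    current = true_class_score(adv)
    for _ in range(max_flips):
        if current < threshold:
            break  # classifier is no longer confident in the true class
        best_pos, best_score = None, current
        for r, row in enumerate(adv):
            for c in range(len(row)):
                row[c] ^= 1  # try flipping this pixel
                score = true_class_score(adv)
                row[c] ^= 1  # undo the trial flip
                if score < best_score:
                    best_pos, best_score = (r, c), score
        if best_pos is None:
            break  # no single flip lowers the score
        r, c = best_pos
        adv[r][c] ^= 1
        current = best_score
    return adv


# Toy "classifier": confidence is the fraction of ink in the leftmost column
def toy_score(img: BinaryImage) -> float:
    return sum(row[0] for row in img) / len(img)


original = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
adversarial = greedy_flip_attack(original, toy_score, max_flips=5)
print(adversarial)  # [[0, 0, 0], [0, 0, 0], [1, 0, 0]]
```

Each round costs one query per pixel, which is why query efficiency is the central concern for combinatorial attacks on real-sized images.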


GaitFi: Robust Device-Free Human Identification via WiFi and Vision Multimodal Learning

arXiv.org Artificial Intelligence

As an important biomarker for human identification, human gait can be captured at a distance by passive sensors without the subject's cooperation, which makes it essential for crime prevention, security detection and other human identification applications. At present, most research works rely on cameras and computer vision techniques to perform gait recognition. However, vision-based methods are unreliable under poor illumination, which degrades their performance. In this paper, we propose a novel multimodal gait recognition method, namely GaitFi, which leverages WiFi signals and videos for human identification. In GaitFi, Channel State Information (CSI), which reflects the multi-path propagation of WiFi signals, is collected to capture human gaits, while videos are captured by cameras. To learn robust gait information, we propose a Lightweight Residual Convolution Network (LRCN) as the backbone network, and further propose the two-stream GaitFi, which integrates WiFi and vision features for the gait retrieval task. GaitFi is trained with a triplet loss and a classification loss on different levels of features. Extensive real-world experiments demonstrate that GaitFi outperforms state-of-the-art gait recognition methods based on WiFi or cameras alone, achieving 94.2% on a human identification task with 12 subjects.
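GaitFi is trained with a triplet loss plus a classification loss. The pure-Python sketch below shows the two standard losses, assuming Euclidean distance between embeddings, a margin of 1.0, softmax cross-entropy for the identity classifier, and a simple weighted sum (`weight`) to combine them. How the paper applies these losses on different levels of features is not modeled here, and `gait_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Pull the anchor toward the positive (same person) and away from the negative."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for an identity classifier, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def gait_loss(anchor, positive, negative, logits, label, margin=1.0, weight=1.0) -> float:
    """Hypothetical combination: triplet term plus a weighted classification term."""
    return triplet_loss(anchor, positive, negative, margin) + weight * cross_entropy(logits, label)


# The negative is only slightly farther than the positive, so the margin is violated by 0.5
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [1.5, 0.0]))  # 0.5
```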


OCR is getting super cool for Businesses

#artificialintelligence

A few months back, a student in class took a picture of the notes written by the student in front of him and used iOS 15's new text-recognition feature to highlight the text and copy and paste it into his own notes. The moment was tweeted by @juanbuis, who shared a video of the student making the most of iOS 15's Live Text OCR feature. OCR, or Optical Character Recognition, the feature this student relied on, is generally used to extract the text from images or documents and convert it into machine-readable text. Recently, the popular app developer Alessandro Paluzzi also spotted that Twitter is working on an OCR feature for generating image alt text. In his tweet, Paluzzi shared a short video demonstrating how this Twitter feature will work. At Dwarf AI, we too want to make this super cool technology easily accessible to other businesses.


Microsoft is teaching computers to understand cause and effect

#artificialintelligence

AI that analyzes data to help you make decisions is set to be an increasingly big part of business tools, and the systems that do that are getting smarter with a new approach to decision optimization that Microsoft is starting to make available. Machine learning is great at extracting patterns out of large amounts of data but not necessarily good at understanding those patterns, especially in terms of what causes them. A machine learning system might learn that people buy more ice cream in hot weather, but without a common sense understanding of the world, it's just as likely to suggest that if you want the weather to get warmer then you should buy more ice cream. Understanding why things happen helps humans make better decisions, like a doctor picking the best treatment or a business team looking at the results of A/B testing to decide which price and packaging will sell more products. There are machine learning systems that deal with causality, but so far this has mostly been restricted to research that focuses on small-scale problems rather than practical, real-world systems because it's been hard to do. Deep learning, which is widely used for machine learning, needs a lot of training data, but humans can gather information and draw conclusions much more efficiently by asking questions, like a doctor asking about your symptoms, a teacher giving students a quiz, a financial advisor understanding whether a low-risk or high-risk investment is best for you, or a salesperson getting you to talk about what you need from a new car.
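The ice cream example can be made concrete with a toy structural causal model in pure Python. All numbers (mean temperature, slope, noise levels, sample sizes) are invented for illustration and have nothing to do with Microsoft's system. Observational data shows a strong correlation between temperature and sales, but intervening on sales, written do(sales = x), leaves the temperature distribution unchanged, which is exactly the distinction a purely correlational model misses.

```python
import math
import random
import statistics
from typing import List, Optional, Tuple


def simulate(n: int, do_sales: Optional[float] = None, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Toy causal model: temperature -> ice cream sales.

    do_sales=None samples observational data; a number applies the
    intervention do(sales = do_sales), cutting the temperature -> sales arrow.
    """
    rng = random.Random(seed)
    temps, sales = [], []
    for _ in range(n):
        t = rng.gauss(25.0, 5.0)  # weather has no cause inside this model
        if do_sales is None:
            s = 10.0 + 2.0 * t + rng.gauss(0.0, 2.0)  # hot weather raises sales
        else:
            s = float(do_sales)  # forced by the intervention; temperature is untouched
        temps.append(t)
        sales.append(s)
    return temps, sales


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


temps, sales = simulate(5000)
print(round(pearson(temps, sales), 2))  # strong correlation, close to 1

# Forcing sales low or high does not move the average temperature
low_sales_temp = statistics.fmean(simulate(5000, do_sales=20.0, seed=1)[0])
high_sales_temp = statistics.fmean(simulate(5000, do_sales=200.0, seed=2)[0])
```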


Unsupervised diffeomorphic cardiac image registration using parameterization of the deformation field

arXiv.org Artificial Intelligence

This study proposes an end-to-end unsupervised diffeomorphic deformable registration framework based on moving mesh parameterization. Using this parameterization, a deformation field can be modeled by its transformation Jacobian determinant and the curl of its end velocity field. This new model of the deformation field has three important advantages. First, it relaxes the need for an explicit regularization term and its corresponding weight in the cost function; the smoothness is implicitly embedded in the solution, which results in a physically plausible deformation field. Second, it guarantees diffeomorphism through explicit constraints applied to the transformation Jacobian determinant to keep it positive. Finally, it is suitable for cardiac data processing, since this parameterization naturally defines the deformation field in terms of radial and rotational components. The effectiveness of the algorithm is investigated by evaluating the proposed method on three different data sets including 2D and 3D cardiac MRI scans. The results demonstrate that the proposed framework outperforms existing learning-based and non-learning-based methods while generating diffeomorphic transformations.
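The diffeomorphism guarantee described above hinges on the Jacobian determinant of the transformation staying positive, and the parameterization also involves a curl. The NumPy sketch below computes both quantities with finite differences on a 2D grid so that folding (a non-positive determinant) can be detected. It is a generic check, not the paper's moving-mesh parameterization; the unit grid spacing and the function names are assumptions.

```python
import numpy as np


def jacobian_determinant_2d(ux: np.ndarray, uy: np.ndarray) -> np.ndarray:
    """Jacobian determinant of phi(x, y) = (x + ux, y + uy) on a unit-spaced grid.

    ux, uy: displacement arrays of shape (H, W); axis 0 is y, axis 1 is x.
    """
    dux_dy, dux_dx = np.gradient(ux)  # np.gradient returns derivatives along axis 0, then axis 1
    duy_dy, duy_dx = np.gradient(uy)
    return (1.0 + dux_dx) * (1.0 + duy_dy) - dux_dy * duy_dx


def curl_2d(vx: np.ndarray, vy: np.ndarray) -> np.ndarray:
    """Scalar curl dvy/dx - dvx/dy of a 2D velocity field."""
    dvx_dy, _ = np.gradient(vx)
    _, dvy_dx = np.gradient(vy)
    return dvy_dx - dvx_dy


Y, X = np.mgrid[0:8, 0:8].astype(float)

# A uniform 1.5x expansion is smooth and invertible: determinant 2.25 everywhere
expansion = jacobian_determinant_2d(0.5 * X, 0.5 * Y)

# Mapping x -> -x mirrors the grid, i.e. it folds space: determinant -1
folded = jacobian_determinant_2d(-2.0 * X, np.zeros_like(Y))
print(bool((folded <= 0).any()))  # True: this field is not diffeomorphic
```

A rigid rotation velocity field such as (−y, x) has a constant curl of 2, the kind of rotational component the abstract highlights for cardiac motion.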