Pattern Recognition
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
Jaume, Guillaume, Ekenel, Hazim Kemal, Thiran, Jean-Philippe
In this paper, we present a new dataset for Form Understanding in Noisy Scanned Documents (FUNSD). Form Understanding (FoUn) aims at extracting and structuring the textual content of forms. The dataset comprises 200 fully annotated real scanned forms. The documents are noisy and exhibit large variabilities in their representation, making FoUn a challenging task. The proposed dataset can be used for various tasks including text detection, optical character recognition (OCR), spatial layout analysis and entity labeling/linking. To the best of our knowledge, this is the first publicly available dataset with comprehensive annotations addressing the FoUn task. We also present a set of baselines and introduce metrics to evaluate performance on the FUNSD dataset. The FUNSD dataset can be downloaded at https://guillaumejaume.github.io/FUNSD/.
Transcribing Content from Structural Images with Spotlight Mechanism
Yin, Yu, Huang, Zhenya, Chen, Enhong, Liu, Qi, Zhang, Fuzheng, Xie, Xing, Hu, Guoping
Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task because the content objects must be recognized and their internal structure must also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but cannot handle images with more complex content (e.g., structured symbols), which often follows a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework that follows a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism that focuses on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU-based network that uses the spotlight areas to transcribe the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework.
The Convolutional Tsetlin Machine
Granmo, Ole-Christoffer, Glimsdal, Sondre, Jiao, Lei, Goodwin, Morten, Omlin, Christian W., Berge, Geir Thore
Deep neural networks have obtained astounding successes for important pattern recognition tasks, but they suffer from high computational complexity and the lack of interpretability. The recent Tsetlin Machine (TM) attempts to address this lack by using easy-to-interpret conjunctive clauses in propositional logic to solve complex pattern recognition problems. The TM provides competitive accuracy in several benchmarks, while keeping the important property of interpretability. It further facilitates hardware-near implementation since inputs, patterns, and outputs are expressed as bits, while recognition and learning rely on straightforward bit manipulation. In this paper, we exploit the TM paradigm by introducing the Convolutional Tsetlin Machine (CTM), as an interpretable alternative to convolutional neural networks (CNNs). Whereas the TM categorizes an image by employing each clause once to the whole image, the CTM uses each clause as a convolution filter. That is, a clause is evaluated multiple times, once per image patch taking part in the convolution. To make the clauses location-aware, each patch is further augmented with its coordinates within the image. The output of a convolution clause is obtained simply by ORing the outcome of evaluating the clause on each patch. In the learning phase of the TM, clauses that evaluate to 1 are contrasted against the input. For the CTM, we instead contrast against one of the patches, randomly selected among the patches that made the clause evaluate to 1. Accordingly, the standard Type I and Type II feedback of the classic TM can be employed directly, without further modification. The CTM obtains a peak test accuracy of 99.51% on MNIST, 96.21% on Kuzushiji-MNIST, 89.56% on Fashion-MNIST, and 100.0% on the 2D Noisy XOR Problem, which is competitive with results reported for simple 4-layer CNNs, BinaryConnect, and a recent FPGA-accelerated Binary CNN.
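The convolution-clause evaluation described in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the representation of a clause as two index lists, and the thermometer encoding of patch coordinates are all assumptions made for the example.

```python
from itertools import product


def patches_with_coords(image, size):
    """Yield (x, y, bits) for every size-by-size window of a binary image."""
    height, width = len(image), len(image[0])
    for y, x in product(range(height - size + 1), range(width - size + 1)):
        pixels = [image[y + dy][x + dx] for dy in range(size) for dx in range(size)]
        # Assumption: the patch position is appended as thermometer-coded bits.
        y_bits = [1 if y > t else 0 for t in range(height - size)]
        x_bits = [1 if x > t else 0 for t in range(width - size)]
        yield x, y, pixels + y_bits + x_bits


def clause_matches(bits, include, include_negated):
    """Conjunctive clause: included bits must be 1, negated bits must be 0."""
    return (all(bits[i] == 1 for i in include)
            and all(bits[i] == 0 for i in include_negated))


def convolution_clause(image, size, include, include_negated):
    """Return (output, matching_positions); output is the OR over all patches."""
    matches = [(x, y) for x, y, bits in patches_with_coords(image, size)
               if clause_matches(bits, include, include_negated)]
    return int(bool(matches)), matches


# A 3x3 image with a diagonal; the clause looks for a 2x2 diagonal patch.
image = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
output, matches = convolution_clause(image, 2, include=[0, 3], include_negated=[1, 2])
print(output, matches)  # 1 [(0, 0), (1, 1)]
```

During learning, the abstract states that one of the matching patches is selected at random as the contrast target; in this sketch that would correspond to something like `random.choice(matches)` whenever `output` is 1.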
Artificial Intelligence in Health Care--Will the Value Match the Hype?
Artificial intelligence (AI) and its many related applications (ie, big data, deep analytics, machine learning) have entered medicine's "magic bullet" phase. Desperate for a solution for the never-ending challenges of cost, quality, equity, and access, a steady stream of books, articles, and corporate pronouncements makes it seem like health care is on the cusp of an "AI revolution," one that will finally result in high-value care. While AI has been responsible for some stunning advances, particularly in the area of visual pattern recognition,1-3 a major challenge will be in converting AI-derived predictions or recommendations into effective action. The most pressing problem with the US health care system is not a lack of data or analytics but changing the behavior of millions of patients and clinicians. Physician behaviors, including ordering tests, procedures, pharmaceuticals, and other treatments, are responsible for 80% of health care costs.
A new era: artificial intelligence and machine learning in prostate cancer
The current availability of ever-increasing computational power, highly developed pattern recognition algorithms and advanced image processing software working at very high speeds has led to the emergence of computer-based systems that are trained to perform complex tasks in bioinformatics, medical imaging and medical robotics. Accessibility to 'big data' enables the 'cognitive' computer to scan billions of bits of unstructured information, extract the relevant information and recognize complex patterns with increasing confidence. Computer-based decision-support systems based on machine learning (ML) have the potential to revolutionize medicine by performing complex tasks that are currently assigned to specialists to improve diagnostic accuracy, increase efficiency of throughputs, improve clinical workflow, decrease human resource costs and improve treatment choices. These characteristics could be especially helpful in the management of prostate cancer, with growing applications in diagnostic imaging, surgical interventions, skills training and assessment, digital pathology and genomics. Medicine must adapt to this changing world, and urologists, oncologists, radiologists and pathologists, as high-volume users of imaging and pathology, need to understand this burgeoning science and acknowledge that the development of highly accurate AI-based decision-support applications of ML will require collaboration between data scientists, computer researchers and engineers.
Quantitative Error Prediction of Medical Image Registration using Regression Forests
Sokooti, Hessam, Saygili, Gorkem, Glocker, Ben, Lelieveldt, Boudewijn P. F., Staring, Marius
Predicting registration error can be useful for evaluation of registration procedures, which is important for the adoption of registration techniques in the clinic. In addition, quantitative error prediction can be helpful in improving the registration quality. The task of predicting registration error is demanding due to the lack of a ground truth in medical images. This paper proposes a new automatic method to predict the registration error in a quantitative manner, and is applied to chest CT scans. A random regression forest is utilized to predict the registration error locally. The forest is built with features related to the transformation model and features related to the dissimilarity after registration. The forest is trained and tested using manually annotated corresponding points between pairs of chest CT scans in two experiments: SPREAD (trained and tested on SPREAD) and inter-database (including three databases SPREAD, DIR-Lab-4DCT and DIR-Lab-COPDgene). The results show that the mean absolute errors of regression are 1.07 $\pm$ 1.86 and 1.76 $\pm$ 2.59 mm for the SPREAD and inter-database experiment, respectively. The overall accuracy of classification in three classes (correct, poor and wrong registration) is 90.7% and 75.4%, for SPREAD and inter-database respectively. The good performance of the proposed method enables important applications such as automatic quality control in large-scale image analysis.
Reference-Based Sequence Classification
He, Zengyou, Xu, Guangyao, Sheng, Chaohua, Xu, Bo, Zou, Quan
Sequence classification is an important data mining task in many real-world applications. Over the past few decades, many sequence classification methods have been proposed from different perspectives. In particular, the pattern-based method is one of the most important and widely studied sequence classification methods in the literature. In this paper, we present a reference-based sequence classification framework, which can unify existing pattern-based sequence classification methods under the same umbrella. More importantly, this framework can be used as a general platform for developing new sequence classification algorithms. By utilizing this framework as a tool, we propose new sequence classification algorithms that are quite different from existing solutions. Experimental results show that new methods developed under the proposed framework are capable of achieving classification accuracy comparable to that of state-of-the-art sequence classification algorithms.
Discovering Suspicious Patterns Using a Graph Based Approach
Velampalli, Sirisha (C.R.Rao Advanced Institute of Mathematics, Statistics and Computer Science) | Mookiah, Lenin (Tennessee Technological University) | Eberle, William (Tennessee Technological University)
Recently, there has been much attention on tools and techniques for visualizing and acquiring new knowledge and insights. In the VAST 2018 competition, one of the challenges is to discover the fraudulent group of employees at Kasios, a furniture manufacturing company. In this work, we use a graph-based approach that analyzes the data for suspicious employee activities at Kasios. Graph based approaches enable one to handle rich contextual data and provide a deeper understanding of data due to the ability to discover patterns in databases that are not easily found using traditional query or statistical tools. We focus on graph based knowledge discovery in structural data to mine for interesting patterns and anomalies. Our approach first reports the normative patterns in the data, and then discovers any anomalous patterns associated with the previously discovered patterns. For visualizing the suspicious patterns, we also use the enterprise graph database Neo4j. Neo4j Browser provides a way to visualize graph structures.
Winograd Convolution for DNNs: Beyond linear polinomials
Barabasz, Barbara, Gregg, David
We investigated a wider range of Winograd family convolution algorithms for Deep Neural Networks. We presented the explicit Winograd convolution algorithm in the general case (using polynomials of degree higher than one). This allows us to construct more versions with different performance characteristics than the commonly used Winograd convolution algorithms, and to improve the accuracy and performance of convolution computations. We found that in $fp16$ this approach gives better image recognition accuracy while keeping the same number of general multiplications per single output point as the commonly used Winograd algorithm for a kernel of size $3 \times 3$ and output size $4 \times 4$. We demonstrated that in $bf16$ it is possible to perform the convolution computation faster while keeping the image recognition accuracy the same as for the direct convolution method. We tested our approach on a subset of $2000$ images from the ImageNet validation set. We present the results for three computation precisions: $fp32$, $fp16$ and $bf16$.
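For context on the "general multiplications per output point" that the abstract above counts, the sketch below shows the standard one-dimensional Winograd algorithm F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. It is the commonly used linear-polynomial variant, not the higher-degree construction the paper proposes, and the function names are chosen here for illustration only.

```python
def direct_f23(d, g):
    """Direct method: two outputs of a 3-tap filter, six multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """Standard Winograd F(2,3): the same two outputs, four multiplications."""
    # Filter transform; it can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by the four general multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform uses additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


signal = [1, 2, 3, 4]
kernel = [2, 4, 6]
print(direct_f23(signal, kernel))    # [28, 40]
print(winograd_f23(signal, kernel))  # [28.0, 40.0]
```

The extra additions and the divisions in the transforms are where rounding error enters in low-precision formats, which is why the abstract reports accuracy separately for each precision.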
The Regression Tsetlin Machine: A Tsetlin Machine for Continuous Output Problems
Abeyrathna, K. Darshana, Granmo, Ole-Christoffer, Jiao, Lei, Goodwin, Morten
The recently introduced Tsetlin Machine (TM) has provided competitive pattern classification accuracy in several benchmarks, composing patterns with easy-to-interpret conjunctive clauses in propositional logic. In this paper, we go beyond pattern classification by introducing a new type of TM, namely the Regression Tsetlin Machine (RTM). In brief, we modify the inner inference mechanism of the TM so that input patterns are transformed into a single continuous output, rather than into distinct categories. We achieve this by: (1) using the conjunctive clauses of the TM to capture arbitrarily complex patterns; (2) mapping these patterns to a continuous output through a novel voting and normalization mechanism; and (3) employing a feedback scheme that updates the TM clauses to minimize the regression error. The feedback scheme uses a new activation probability function that stabilizes the updating of clauses, while the overall system converges towards an accurate input-output mapping. The performance of the proposed approach is evaluated using six different artificial datasets with and without noise. The performance of the RTM is compared with the Classical Tsetlin Machine (CTM) and the Multiclass Tsetlin Machine (MTM). Our empirical results indicate that the RTM obtains the best training and testing results for both noisy and noise-free datasets, with a smaller number of clauses.
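Steps (2) and (3) of the abstract above can be illustrated with a rough sketch. The abstract does not give the formulas, so both functions below are assumptions for illustration: the output is taken to be the fraction of clauses that vote 1, scaled to a hypothetical target range [0, y_max], and the feedback activation probability is taken to be proportional to the absolute error.

```python
def rtm_output(clause_outputs, y_max):
    """Assumed voting and normalization: fraction of firing clauses times y_max."""
    votes = sum(clause_outputs)
    return votes * y_max / len(clause_outputs)


def feedback_probability(predicted, target, y_max):
    """Assumed activation probability: grows with the absolute regression error."""
    return min(1.0, abs(predicted - target) / y_max)


# Three of four clauses fire; the target range is assumed to be [0, 10].
prediction = rtm_output([1, 0, 1, 1], y_max=10.0)
print(prediction)                                   # 7.5
print(feedback_probability(prediction, 5.0, 10.0))  # 0.25
```

Under these assumptions, a prediction far from the target triggers clause updates often and a prediction close to the target triggers them rarely, which is one way to obtain the stabilizing behaviour the abstract describes.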