Goto

Collaborating Authors

 Pattern Recognition


Lost in Algorithms

arXiv.org Artificial Intelligence

Algorithms are becoming more capable, and with that comes hic sunt dracones (here be dragons). The term symbolizes areas beyond our known maps. We use this term since we are stepping into an exciting, potentially dangerous, and unknown area with algorithms. Our curiosity to understand the natural world drives our search for new methods. For this reason, it is crucial to explore this subject. The project's objective is to overlay the information obtained, in conjunction with the state of hardware today, to see if we can determine the likely directions for future algorithms'. Even though we slightly cover non-classical computing in this paper, our primary focus is on classical computing (i.e., digital computers). It is worth noting that non-classical quantum computing requires classical computers to operate; they are not mutually exclusive.


HUSP-SP: Faster Utility Mining on Sequence Data

arXiv.org Artificial Intelligence

High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM problem. Several algorithms have been proposed for addressing this problem, but they still cost a lot in terms of running time and memory usage. In this paper, to further solve this problem efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU) and two pruning strategies in search space, are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.


A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition

arXiv.org Artificial Intelligence

This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset, the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called "COMICS Text+", which contains the extracted text from the textboxes in the COMICS dataset. Using the improved text data of COMICS Text+ in the comics processing model from resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ dataset can be a valuable resource for researchers working on tasks including text detection, recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All the data and inference instructions can be accessed in https://github.com/gsoykan/comics_text_plus.


The Complete 2022 Android Machine Learning Course The Complete 2022 Android Machine Learning Course

#artificialintelligence

Welcome to The Complete 2021 Android Machine Learning Course. In this course, you will learn the use of Machine learning in Android along with training your own image recognition models for Android applications without knowing any background knowledge of machine learning. The course is designed in such a manner that you don't need any prior knowledge of machine learning to it. In modern world app development, the use of ML in mobile app development is compulsory. We hardly see an application in which ML is not being used.


Towards mapping the contemporary art world with ArtLM: an art-specific NLP model

arXiv.org Artificial Intelligence

With an increasing amount of data in the art world, discovering artists and artworks suitable to collectors' tastes becomes a challenge. It is no longer enough to use visual information, as contextual information about the artist has become just as important in contemporary art. In this work, we present a generic Natural Language Processing framework (called ArtLM) to discover the connections among contemporary artists based on their biographies. In this approach, we first continue to pre-train the existing general English language models with a large amount of unlabelled art-related data. We then fine-tune this new pre-trained model with our biography pair dataset manually annotated by a team of professionals in the art industry. With extensive experiments, we demonstrate that our ArtLM achieves 85.6% accuracy and 84.0% F1 score and outperforms other baseline models. We also provide a visualisation and a qualitative analysis of the artist network built from ArtLM's outputs.


Bengali Handwritten Digit Recognition using CNN with Explainable AI

arXiv.org Artificial Intelligence

Handwritten character recognition is a hot topic for research nowadays. If we can convert a handwritten piece of paper into a text-searchable document using the Optical Character Recognition (OCR) technique, we can easily understand the content and do not need to read the handwritten document. OCR in the English language is very common, but in the Bengali language, it is very hard to find a good quality OCR application. If we can merge machine learning and deep learning with OCR, it could be a huge contribution to this field. Various researchers have proposed a number of strategies for recognizing Bengali handwritten characters. A lot of ML algorithms and deep neural networks were used in their work, but the explanations of their models are not available. In our work, we have used various machine learning algorithms and CNN to recognize handwritten Bengali digits. We have got acceptable accuracy from some ML models, and CNN has given us great testing accuracy. Grad-CAM was used as an XAI method on our CNN model, which gave us insights into the model and helped us detect the origin of interest for recognizing a digit from an image.



A Projected Upper Bound for Mining High Utility Patterns from Interval-Based Event Sequences

arXiv.org Artificial Intelligence

High utility pattern mining is an interesting yet challenging problem. The intrinsic computational cost of the problem will impose further challenges if efficiency in addition to the efficacy of a solution is sought. Recently, this problem was studied on interval-based event sequences with a constraint on the length and size of the patterns. However, the proposed solution lacks adequate efficiency. To address this issue, we propose a projected upper bound on the utility of the patterns discovered from sequences of interval-based events. To show its effectiveness, the upper bound is utilized by a pruning strategy employed by the HUIP-Miner algorithm. Experimental results show that the new upper bound improves HUIPMiner performance in terms of both execution time and memory usage.


Pattern Recognition Experiments on Mathematical Expressions

arXiv.org Artificial Intelligence

We provide the results of pattern recognition experiments on mathematical expressions. We give a few examples of conjectured results. None of which was thoroughly checked for novelty. We did not attempt to prove all the relations found and focused on their generation.


Towards Sequence Utility Maximization under Utility Occupancy Measure

arXiv.org Artificial Intelligence

The discovery of utility-driven patterns is a useful and difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, the measure of utility is often used to demonstrate the importance, profit, or risk of an object or a pattern. In the database, although utility is a flexible criterion for each pattern, it is a more absolute criterion due to the neglect of utility sharing. This leads to the derived patterns only exploring partial and local knowledge from a database. Utility occupancy is a recently proposed model that considers the problem of mining with high utility but low occupancy. However, existing studies are concentrated on itemsets that do not reveal the temporal relationship of object occurrences. Therefore, this paper towards sequence utility maximization. We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining (HUOSPM). Three dimensions, including frequency, utility, and occupancy, are comprehensively evaluated in HUOSPM. An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed. Furthermore, two data structures for storing related information about a pattern, Utility-Occupancy-List-Chain (UOL-Chain) and Utility-Occupancy-Table (UO-Table) with six associated upper bounds, are designed to improve efficiency. Empirical experiments are carried out to evaluate the novel algorithm's efficiency and effectiveness. The influence of different upper bounds and pruning strategies is analyzed and discussed. The comprehensive results suggest that the work of our algorithm is intelligent and effective.