Goto

Collaborating Authors

 Pattern Recognition


Conversational Pattern Mining using Motif Detection

arXiv.org Artificial Intelligence

The subject of conversational mining has become of great interest recently due to the explosion of social and other online media. Supplementing this explosion of text is the advancement in pre-trained language models which have helped us to leverage these sources of information. An interesting domain to analyse is conversations in terms of complexity and value. Complexity arises due to the fact that a conversation can be asynchronous and can involve multiple parties. It is also computationally intensive to process. We use unsupervised methods in our work in order to develop a conversational pattern mining technique which does not require time consuming, knowledge demanding and resource intensive labelling exercises. The task of identifying repeating patterns in sequences is well researched in the Bioinformatics field. In our work, we adapt this to the field of Natural Language Processing and make several extensions to a motif detection algorithm. In order to demonstrate the application of the algorithm on a dynamic, real world data set; we extract motifs from an open-source film script data source. We run an exploratory investigation into the types of motifs we are able to mine.


UIT-HWDB: Using Transferring Method to Construct A Novel Benchmark for Evaluating Unconstrained Handwriting Image Recognition in Vietnamese

arXiv.org Artificial Intelligence

Recognizing handwriting images is challenging due to the vast variation in writing style across many people and distinct linguistic aspects of writing languages. In Vietnamese, besides the modern Latin characters, there are accent and letter marks together with characters that draw confusion to state-of-the-art handwriting recognition methods. Moreover, as a low-resource language, there are not many datasets for researching handwriting recognition in Vietnamese, which makes handwriting recognition in this language have a barrier for researchers to approach. Recent works evaluated offline handwriting recognition methods in Vietnamese using images from an online handwriting dataset constructed by connecting pen stroke coordinates without further processing. This approach obviously can not measure the ability of recognition methods effectively, as it is trivial and may be lack of features that are essential in offline handwriting images. Therefore, in this paper, we propose the Transferring method to construct a handwriting image dataset that associates crucial natural attributes required for offline handwriting images. Using our method, we provide a first high-quality synthetic dataset which is complex and natural for efficiently evaluating handwriting recognition methods. In addition, we conduct experiments with various state-of-the-art methods to figure out the challenge to reach the solution for handwriting recognition in Vietnamese.


Multiresolution Dual-Polynomial Decomposition Approach for Optimized Characterization of Motor Intent in Myoelectric Control Systems

arXiv.org Artificial Intelligence

Surface electromyogram (sEMG) is arguably the most sought-after physiological signal with a broad spectrum of biomedical applications, especially in miniaturized rehabilitation robots such as multifunctional prostheses. The widespread use of sEMG to drive pattern recognition (PR)-based control schemes is primarily due to its rich motor information content and non-invasiveness. Moreover, sEMG recordings exhibit non-linear and non-uniformity properties with inevitable interferences that distort intrinsic characteristics of the signal, precluding existing signal processing methods from yielding requisite motor control information. Therefore, we propose a multiresolution decomposition driven by dual-polynomial interpolation (MRDPI) technique for adequate denoising and reconstruction of multi-class EMG signals to guarantee the dual-advantage of enhanced signal quality and motor information preservation. Parameters for optimal MRDPI configuration were constructed across combinations of thresholding estimation schemes and signal resolution levels using EMG datasets of amputees who performed up to 22 predefined upper-limb motions acquired in-house and from the public NinaPro database. Experimental results showed that the proposed method yielded signals that led to consistent and significantly better decoding performance for all metrics compared to existing methods across features, classifiers, and datasets, offering a potential solution for practical deployment of intuitive EMG-PR-based control schemes for multifunctional prostheses and other miniaturized rehabilitation robotic systems that utilize myoelectric signals as control inputs.


Automatic speaker recognition technology outperforms human listeners in the courtroom

#artificialintelligence

A key question in a number of court cases is whether a speaker on an audio recording is a particular known speaker, for example, whether a speaker on a recording of an intercepted telephone call is the defendant. In most English-speaking countries, expert testimony is only admissible in a court of law if it will potentially assist the judge or the jury to make a decision. If the judge or the jury's speaker identification were equally accurate or more accurate than a forensic scientist's forensic voice comparison, then the forensic-voice-comparison testimony would not be admissible. In a research paper published in the journal Forensic Science International, a multidisciplinary international team of researchers has reported the first set of results from a comprehensive study that compares the accuracy of speaker-identification by individual listeners (like judges or jury members) with the accuracy of a forensic-voice-comparison system that is based on state-of-the-art automatic-speaker-recognition technology, and that does so using recordings that reflect the conditions of an actual case. The questioned-speaker recording was of a telephone call with background office noise, and the known-speaker recording was of a police interview conducted in echoey room with background ventilation-system noise.


Minimalist Data Wrangling with Python

arXiv.org Artificial Intelligence

Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science, providing a high-level overview as well as discussing key concepts in detail. We explore methods for cleaning data gathered from different sources, transforming, selecting, and extracting features, performing exploratory data analysis and dimensionality reduction, identifying naturally occurring data clusters, modelling patterns in data, comparing data between groups, and reporting the results. This textbook is a non-profit project. Its online and PDF versions are freely available at https://datawranglingpy.gagolewski.com/.


Unsupervised dynamic modeling of medical image transformation

arXiv.org Artificial Intelligence

Spatiotemporal imaging has applications in e.g. cardiac diagnostics, surgical guidance, and radiotherapy monitoring, In this paper, we explain the temporal motion by identifying the underlying dynamics, only based on the sequential images. Our dynamical model maps the inputs of observed high-dimensional sequential images to a low-dimensional latent space wherein a linear relationship between a hidden state process and the lower-dimensional representation of the inputs holds. For this, we use a conditional variational auto-encoder (CVAE) to nonlinearly map the higher-dimensional image to a lower-dimensional space, wherein we model the dynamics with a linear Gaussian state-space model (LG-SSM). The model, a modified version of the Kalman variational auto-encoder, is end-to-end trainable, and the weights, both in the CVAE and LG-SSM, are simultaneously updated by maximizing the evidence lower bound of the marginal likelihood. In contrast to the original model, we explain the motion with a spatial transformation from one image to another. This results in sharper reconstructions and the possibility of transferring auxiliary information, such as segmentation, through the image sequence. Our experiments, on cardiac ultrasound time series, show that the dynamic model outperforms traditional image registration in execution time, to a similar performance. Further, our model offers the possibility to impute and extrapolate for missing samples.


A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions

arXiv.org Artificial Intelligence

The Transformer architecture is shown to provide a powerful framework as an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes. In particular, the attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions creating latent representations that are correctly decoded to the exact mathematical expression tree, providing robustness to ablated inputs and unseen glyphs. For the first time, the encoder is fed with spatio-temporal data tokens potentially forming an infinitely large vocabulary, which finds applications beyond that of online gesture recognition. A new supervised dataset of online handwriting gestures is provided for training models on generic handwriting recognition tasks and a new metric is proposed for the evaluation of the syntactic correctness of the output expression trees. A small Transformer model suitable for edge inference was successfully trained to an average normalised Levenshtein accuracy of 94%, resulting in valid postfix RPN tree representation for 94% of predictions.


HDNet: Hierarchical Dynamic Network for Gait Recognition using Millimeter-Wave Radar

arXiv.org Artificial Intelligence

Gait recognition is widely used in diversified practical applications. Currently, the most prevalent approach is to recognize human gait from RGB images, owing to the progress of computer vision technologies. Nevertheless, the perception capability of RGB cameras deteriorates in rough circumstances, and visual surveillance may cause privacy invasion. Due to the robustness and non-invasive feature of millimeter wave (mmWave) radar, radar-based gait recognition has attracted increasing attention in recent years. In this research, we propose a Hierarchical Dynamic Network (HDNet) for gait recognition using mmWave radar. In order to explore more dynamic information, we propose point flow as a novel point clouds descriptor. We also devise a dynamic frame sampling module to promote the efficiency of computation without deteriorating performance noticeably. To prove the superiority of our methods, we perform extensive experiments on two public mmWave radar-based gait recognition datasets, and the results demonstrate that our model is superior to existing state-of-the-art methods.


Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization

arXiv.org Artificial Intelligence

In this work, we consider the problem of weakly-supervised multi-step localization in instructional videos. An established approach to this problem is to rely on a given list of steps. However, in reality, there is often more than one way to execute a procedure successfully, by following the set of steps in slightly varying orders. Thus, for successful localization in a given video, recent works require the actual order of procedure steps in the video, to be provided by human annotators at both training and test times. Instead, here, we only rely on generic procedural text that is not tied to a specific video. We represent the various ways to complete the procedure by transforming the list of instructions into a procedure flow graph which captures the partial order of steps. Using the flow graphs reduces both training and test time annotation requirements. To this end, we introduce the new problem of flow graph to video grounding. In this setup, we seek the optimal step ordering consistent with the procedure flow graph and a given video. To solve this problem, we propose a new algorithm - Graph2Vid - that infers the actual ordering of steps in the video and simultaneously localizes them. To show the advantage of our proposed formulation, we extend the CrossTask dataset with procedure flow graph information. Our experiments show that Graph2Vid is both more efficient than the baselines and yields strong step localization results, without the need for step order annotation.


Universal speaker recognition encoders for different speech segments duration

arXiv.org Artificial Intelligence

Creating universal speaker encoders which are robust for different acoustic and speech duration conditions is a big challenge today. According to our observations systems trained on short speech segments are optimal for short phrase speaker verification and systems trained on long segments are superior for long segments verification. A system trained simultaneously on pooled short and long speech segments does not give optimal verification results and usually degrades both for short and long segments. This paper addresses the problem of creating universal speaker encoders for different speech segments duration. We describe our simple recipe for training universal speaker encoder for any type of selected neural network architecture. According to our evaluation results of wav2vec-TDNN based systems obtained for NIST SRE and VoxCeleb1 benchmarks the proposed universal encoder provides speaker verification improvements in case of different enrollment and test speech segment duration. The key feature of the proposed encoder is that it has the same inference time as the selected neural network architecture.