Pattern Recognition
Long-term stable Electromyography classification using Canonical Correlation Analysis
Donati, Elisa, Benatti, Simone, Ceolini, Enea, Indiveri, Giacomo
Discrimination of hand gestures based on the decoding of surface electromyography (sEMG) signals is a well-established approach for controlling prosthetic devices and for Human-Machine Interfaces (HMI). However, despite the promising results achieved by this approach in well-controlled experimental conditions, its deployment in long-term real-world application scenarios is still hindered by several challenges. One of the most critical challenges is maintaining high EMG data classification performance across multiple days without retraining the decoding system. The drop in performance is mostly due to the high EMG variability caused by electrode shift, muscle artifacts, fatigue, user adaptation, or skin-electrode interfacing issues. Here we propose a novel statistical method based on canonical correlation analysis (CCA) that stabilizes EMG classification performance across multiple days for long-term control of prosthetic devices. We show how CCA can dramatically decrease the performance drop of standard classifiers observed across days, by maximizing the correlation among multiple-day acquisition data sets. Our results show how the performance of a classifier trained on EMG data acquired only on the first day of the experiment maintains 90% relative accuracy across multiple days, compensating for the EMG data variability that occurs over long-term periods, using the CCA transformation on data obtained from a small number of gestures. This approach eliminates the need for large data sets and multiple or periodic training sessions, which currently hamper the usability of conventional pattern-recognition-based approaches.
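As a rough illustration of the core idea, the sketch below computes a textbook CCA between two paired feature sets, for example features of the same gestures recorded on two different days. It is a minimal NumPy sketch under stated assumptions, not the authors' pipeline: the function name cca, the regularization term reg, and the synthetic data are all hypothetical, and the abstract does not say how the resulting projections are combined with the classifier.

```python
import numpy as np

def cca(X, Y, reg=1e-6):
    """Textbook CCA for paired rows of X (n x p) and Y (n x q).

    Returns projection matrices A, B and the canonical correlations s,
    sorted from largest to smallest. reg is a small ridge term (an
    assumption of this sketch) that keeps the covariances invertible.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Cross-covariance of the whitened views: Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M)
    A = np.linalg.solve(Lx.T, U)     # directions for X: Lx^{-T} U
    B = np.linalg.solve(Ly.T, Vt.T)  # directions for Y: Ly^{-T} V
    return A, B, s

# Synthetic stand-in for two recording days that share a latent gesture signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
day1 = latent @ rng.normal(size=(3, 4)) + 0.05 * rng.normal(size=(300, 4))
dayk = latent @ rng.normal(size=(3, 4)) + 0.05 * rng.normal(size=(300, 4))
A, B, s = cca(day1, dayk)
print("canonical correlations:", np.round(s, 3))
```

Projecting each day's centered features with its own matrix (A for the first day, B for the later day) places both recordings in a shared space where corresponding components are maximally correlated.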
Boosting Synthetic Data Generation with Effective Nonlinear Causal Discovery
Cinquini, Martina, Giannotti, Fosca, Guidotti, Riccardo
Synthetic data generation has been widely adopted in software testing, data privacy, imbalanced learning, and artificial intelligence explanation. In all such contexts, it is crucial to generate plausible data samples. A common assumption of approaches widely used for data generation is the independence of the features. However, typically, the variables of a dataset depend on one another, and these dependencies are not considered in data generation leading to the creation of implausible records. The main problem is that dependencies among variables are typically unknown. In this paper, we design a synthetic dataset generator for tabular data that can discover nonlinear causalities among the variables and use them at generation time. State-of-the-art methods for nonlinear causal discovery are typically inefficient. We boost them by restricting the causal discovery among the features appearing in the frequent patterns efficiently retrieved by a pattern mining algorithm. We design a framework for generating synthetic datasets with known causalities to validate our proposal. Broad experimentation on many synthetic and real datasets with known causalities shows the effectiveness of the proposed method.
Computer Vision applications for the industry
This article gives an overview of the growth factors and drivers of computer vision, the market segments and leaders, and concludes with use cases that include the latest advancements in the construction industry, manufacturing industry, and healthcare. Computer Vision (CV) is a sub-field of artificial intelligence (AI) that aims to bring human vision capability into computing systems. It deals with interpreting real-world scenarios which are captured by camera-enabled mobile devices in the form of images and videos. Some of the most commonly used computer vision-based applications are facial recognition, human-computer interfaces, gesture recognition, visual quality inspection of goods in manufacturing processes, navigation for autonomous vehicles, medical image analysis, and image restoration. Despite the hype around CV/AI and its success, some challenges remain for the technology's adoption in industry.
6 Days, 5 Key Takeaways: Computer Vision and Pattern Recognition Conference 2022
This June, I attended CVPR, an annual event which gathers the best researchers and practitioners of computer vision from around the world. It was my second year there, and I wanted to summarize its many highlights for my colleagues at Lightricks, and our wider community. This article is my perspective on the big things happening in computer vision research, according to my experience at this year's CVPR. I'll try to collate current trends and emphasize the big, promising advances in the field, while staying somewhat "zoomed out", in order to give you the bigger picture. I've also linked to lots of detailed, more closely focused articles, so if you're interested in a specific subject, there should still be plenty for you to dive into. The article is divided into five key takeaways, based on a one-hour lecture I presented to the Lightricks research group.
The Newsbridge-Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description
Tevissen, Yannis, Boudy, Jérôme, Petitpont, Frédéric
We describe the system used by our team for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC 2022) in the speaker diarization track. Our solution was designed around a new combination of voice activity detection algorithms that uses the strengths of several systems. We introduce a novel multi-stream approach with a decision protocol based on classifier entropy. We call this method multi-stream voice activity detection and use it with standard baseline diarization embeddings, clustering and resegmentation. With this work, we successfully demonstrate that, using a strong baseline and working only on voice activity detection, one can achieve close to state-of-the-art results.
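One way an entropy-based decision between several voice activity detection streams could look is sketched below. This is an assumption for illustration only: the abstract does not spell out the decision protocol, and the names binary_entropy and fuse_streams, the 0.5 threshold, and the rule "trust the stream with the lowest entropy on each frame" are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a speech/non-speech probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def fuse_streams(streams, threshold=0.5):
    """Fuse per-frame speech probabilities from several VAD streams.

    streams is a list of equal-length lists, one per VAD system.
    For each frame, the probability with the lowest entropy (the most
    confident stream) is thresholded to give the speech decision.
    """
    decisions = []
    for frame_probs in zip(*streams):
        most_confident = min(frame_probs, key=binary_entropy)
        decisions.append(most_confident >= threshold)
    return decisions

# Two hypothetical VAD systems scoring the same three frames.
vad_a = [0.90, 0.40, 0.55]
vad_b = [0.60, 0.05, 0.20]
print(fuse_streams([vad_a, vad_b]))
```

A probability near 0.5 has entropy close to 1 bit and is effectively ignored whenever another stream is more certain, which is the intuition behind using entropy as the arbitration signal.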
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint)
Chernishev, George, Polyntsov, Michael, Chizhov, Anton, Stupakov, Kirill, Shchuckin, Ilya, Smirnov, Alexander, Strutovsky, Maxim, Shlyonskikh, Alexey, Firsov, Mikhail, Manannikov, Stepan, Bobrov, Nikita, Goncharov, Daniil, Barutkin, Ilia, Shalnev, Vladislav, Muraviev, Kirill, Rakhmukova, Anna, Shcheka, Dmitriy, Chernikov, Anton, Mandelshtam, Dmitrii, Vyrodov, Mikhail, Saliou, Arthur, Gaisin, Eduard, Smirnov, Kirill
Pioneering data profiling systems such as Metanome and OpenClean brought public attention to science-intensive data profiling. This type of profiling aims to extract complex patterns (primitives) such as functional dependencies, data constraints, association rules, and others. However, these tools are research prototypes rather than production-ready systems. The following work presents Desbordante - a high-performance science-intensive data profiler with open source code. Unlike similar systems, it is built with emphasis on industrial application in a multi-user environment. It is efficient, resilient to crashes, and scalable. Its efficiency is ensured by implementing discovery algorithms in C++, resilience is achieved by extensive use of containerization, and scalability is based on replication of containers. Desbordante aims to open industrial-grade primitive discovery to a broader public, focusing on domain experts who are not IT professionals. Aside from the discovery of various primitives, Desbordante offers primitive validation, which not only reports whether a given instance of a primitive holds, but also points out what prevents it from holding via the use of special screens. Next, Desbordante supports pipelines - ready-to-use functionality implemented using the discovered primitives, for example, typo detection. We provide built-in pipelines, and users can construct their own via the provided Python bindings. Unlike other profilers, Desbordante works not only with tabular data, but with graph and transactional data as well. In this paper, we present Desbordante, the vision behind it, and its use cases. To provide a more in-depth perspective, we discuss its current state, architecture, and the design decisions it is built on. Additionally, we outline our future plans.
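To make primitive validation concrete, the sketch below checks a single functional dependency on a small table and reports the rows that prevent it from holding. It is plain Python written for illustration and does not use Desbordante or its Python bindings; the function validate_fd and the sample table are hypothetical.

```python
from collections import defaultdict

def validate_fd(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    Returns (holds, violations), where violations maps each left-hand
    side value to the conflicting right-hand side values seen with it.
    """
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key].add(tuple(row[col] for col in rhs))
    violations = {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}
    return (not violations), violations

# Hypothetical table: does the postal code determine the city?
table = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": "10115", "city": "Berlin"},
    {"zip": "80331", "city": "Munich"},
    {"zip": "80331", "city": "Muenchen"},
]
holds, violations = validate_fd(table, ["zip"], ["city"])
print(holds, violations)
```

The conflicting values returned for a key are exactly the kind of evidence a typo-detection pipeline could build on, since two spellings attached to one key often indicate a data entry error.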
Pattern Recognition Definition
The patterns are made up of individual features, which can be continuous, discrete or even discrete binary variables, or sets of features evaluated together, known as a feature vector. The biggest advantages are that this model will generate a classification of some confidence level for every data point and often reveals subtle, hidden patterns not readily seen with human intuition. Generally, the more feature variables the algorithm is programmed to check for and the more data points available for training, the more accurate it will be. This applies whether the database is labeled or unlabeled.
IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling
Wen, Yilin, Luo, Biao, Zhao, Yuqian
Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data. However, for complex multimodal information and sparse training data, it is usually difficult for most methods to achieve interpretability and high accuracy simultaneously. To address this difficulty, a new model is developed in this paper, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM). First, a multi-modal fine-grained fusion method is proposed, and Vgg16 and Optical Character Recognition (OCR) techniques are adopted to effectively extract text information from images. Then, the knowledge graph link prediction task is modelled as an offline reinforcement learning Markov decision model, which is then abstracted into a unified sequence framework. An interactive perception-based reward expectation mechanism and a special causal masking mechanism are designed, which "convert" the query into an inference path. Then, an autoregressive dynamic gradient adjustment mechanism is proposed to alleviate the problem of insufficient multimodal optimization. Finally, two datasets are adopted for experiments, and the popular SOTA baselines are used for comparison. The results show that the developed IMKGA-SM achieves much better performance than SOTA baselines on multimodal link prediction datasets of different sizes.
Introducing Model Inversion Attacks on Automatic Speaker Recognition
Pizzi, Karla, Boenisch, Franziska, Sahin, Ugur, Böttinger, Konstantin
Model inversion (MI) attacks make it possible to reconstruct average per-class representations of a machine learning (ML) model's training data. It has been shown that in scenarios where each class corresponds to a different individual, such as face classifiers, this represents a severe privacy risk. In this work, we explore a new application for MI: the extraction of speakers' voices from a speaker recognition system. We present an approach to (1) reconstruct audio samples from a trained ML model and (2) extract intermediate voice feature representations which provide valuable insights into the speakers' biometrics. To this end, we propose an extension of MI attacks which we call sliding model inversion. Our sliding MI extends standard MI by iteratively inverting overlapping chunks of the audio samples, thereby leveraging the sequential properties of audio data for enhanced inversion performance. We show that one can use the inverted audio data to generate spoofed audio samples to impersonate a speaker, and execute voice-protected commands for highly secured systems on their behalf. To the best of our knowledge, our work is the first to extend MI attacks to audio data, and our results highlight the security risks resulting from the extraction of the biometric data in that setup.
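The "overlapping chunks" part of the description can be pictured with the sketch below, which only shows how a signal might be split into overlapping windows and stitched back together by averaging. It deliberately contains no inversion step and no model; the function names, the window size, and the hop length are hypothetical, and the paper's actual procedure may differ.

```python
def overlapping_chunks(signal, size, hop):
    """Split signal into (start, chunk) pairs of length size, hop samples apart.

    The final chunks may be shorter than size when the signal runs out.
    """
    return [(start, signal[start:start + size])
            for start in range(0, len(signal), hop)]

def overlap_average(chunks, length):
    """Recombine (start, chunk) pairs by averaging samples where chunks overlap."""
    total = [0.0] * length
    count = [0] * length
    for start, chunk in chunks:
        for offset, value in enumerate(chunk):
            total[start + offset] += value
            count[start + offset] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]

# Toy "audio" of ten samples, windows of four samples with 50% overlap.
audio = list(range(10))
chunks = overlapping_chunks(audio, size=4, hop=2)
# A per-chunk processing step (for example an inversion routine) would go here.
print(overlap_average(chunks, len(audio)))
```

Because neighbouring windows share samples, any per-chunk estimate is constrained to agree with its neighbours on the overlap, which is one way sequential structure can be exploited.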
Real-time gesture recognition through use of wearable device and A-mode ultrasound
A-mode ultrasound has the advantages of high resolution, simple calculation and low cost in predicting skillful gestures. In order to accelerate the popularization of A-mode ultrasonic gesture recognition technology, we have developed a human-machine interface that can interact with the user in real time. Data processing includes Gaussian filtering, feature extraction, and PCA dimension reduction. NB, LDA and SVM algorithms were chosen to train machine learning models. The entire process was written in C to classify gestures in real time.
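The preprocessing stages named above (Gaussian filtering, feature extraction, PCA) can be sketched as follows. The original system was written in C; this is a Python/NumPy illustration under assumptions, not the authors' code. The segment-mean feature, the kernel radius of three standard deviations, and all function names are hypothetical, and the classifier stage (NB, LDA or SVM) is omitted.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def smooth(signal, sigma):
    """Gaussian-filter a 1-D signal, padding the edges so the length is kept."""
    radius = max(1, int(round(3 * sigma)))
    padded = np.pad(np.asarray(signal, dtype=float), radius, mode="edge")
    return np.convolve(padded, gaussian_kernel(sigma, radius), mode="valid")

def segment_means(signal, n_segments):
    """Placeholder feature: mean amplitude of each of n_segments equal parts."""
    return np.array([part.mean() for part in np.array_split(signal, n_segments)])

def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the principal directions found by pca_fit."""
    return (X - mean) @ components.T

# Synthetic stand-in for 40 echo recordings of 200 samples each.
rng = np.random.default_rng(0)
echoes = rng.normal(size=(40, 200))
features = np.array([segment_means(smooth(e, sigma=2.0), 8) for e in echoes])
mean, components = pca_fit(features, n_components=3)
reduced = pca_transform(features, mean, components)
print(reduced.shape)
```

The reduced feature matrix is what would be handed to the chosen classifier for training and real-time prediction.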