 Pattern Recognition


Bilevel Distance Metric Learning for Robust Image Recognition

Neural Information Processing Systems

Metric learning, which aims to learn a discriminative Mahalanobis distance matrix M that effectively reflects the similarity between data samples, has been widely studied in various image recognition problems. Most existing metric learning methods take as input features extracted directly from the original data in a preprocessing phase. Worse, these features usually take no account of the local geometrical structure of the data or of the noise present in the data, so they may not be optimal for the subsequent metric learning task. In this paper, we integrate both feature extraction and metric learning into one joint optimization framework and propose a new bilevel distance metric learning model. Specifically, the lower level characterizes the intrinsic data structure using graph regularized sparse coefficients, while the upper level forces the data samples from the same class to be close to each other and pushes those from different classes far away.
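As a point of reference for the abstract above, a Mahalanobis-type distance with matrix M is commonly written as d_M(x, y) = sqrt((x - y)^T M (x - y)). The sketch below only evaluates that formula for a hand-picked M; it does not learn M and does not reproduce the paper's bilevel model. The function name `mahalanobis_distance` and the example matrices are assumptions made for illustration.

```python
import math


def mahalanobis_distance(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for vectors x, y and a square matrix M.

    M is assumed to be symmetric positive semidefinite; this function does
    not check that, it only guards against tiny negative rounding errors.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff, written out with plain lists.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in M]
    squared = sum(d_i * md_i for d_i, md_i in zip(diff, m_diff))
    return math.sqrt(max(squared, 0.0))


# Hypothetical example matrices (not taken from the paper).
identity = [[1.0, 0.0], [0.0, 1.0]]    # reduces to the Euclidean distance
stretched = [[4.0, 0.0], [0.0, 1.0]]   # weights the first feature more heavily

x, y = [1.0, 2.0], [4.0, 6.0]
d_euclid = mahalanobis_distance(x, y, identity)
d_stretched = mahalanobis_distance(x, y, stretched)
print(d_euclid, d_stretched)
```

With the identity matrix the function falls back to the ordinary Euclidean distance, which is why metric learning can be read as learning a task-specific reweighting and rotation of the feature space.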


Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data

Neural Information Processing Systems

We demonstrate that a generative model for object shapes can achieve state-of-the-art results on challenging scene text recognition tasks, with orders of magnitude fewer training images than competing discriminative methods require. In addition to transcribing text from challenging images, our method performs fine-grained instance segmentation of characters. We show that our model is more robust to both affine transformations and non-affine deformations than previous approaches. Papers published at the Neural Information Processing Systems Conference.


Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances

arXiv.org Machine Learning

Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions according to the results obtained for early NIST SRE (Speaker Recognition Evaluation) datasets. From the practical point of view, taking into account the increased interest in virtual assistants (such as Amazon Alexa, Google Home, Apple Siri, etc.), speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks. This paper presents approaches aimed at achieving two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances. For these purposes, we considered deep neural network architectures based on TDNN (Time Delay Neural Network) and ResNet (Residual Neural Network) blocks. We experimented with state-of-the-art embedding extractors and their training procedures. The results obtained confirm that ResNet architectures outperform the standard x-vector approach in terms of speaker verification quality for both long-duration and short-duration utterances. We also investigate the impact of the speech activity detector, different scoring models, adaptation, and score normalization techniques. The experimental results are presented for publicly available data and verification protocols for the VoxCeleb1, VoxCeleb2, and VOiCES datasets.


x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

arXiv.org Machine Learning

In this work, we explore the dependencies between speaker recognition and emotion recognition. We first show that knowledge learned for speaker recognition can be reused for emotion recognition through transfer learning. Then, we show the effect of emotion on speaker recognition. For emotion recognition, we show that a simple linear model is enough to obtain good performance on the features extracted from pre-trained models such as the x-vector model. Then, we improve emotion recognition performance by fine-tuning for emotion classification. We evaluated our experiments on three different types of datasets: IEMOCAP, MSP-Podcast, and Crema-D. By fine-tuning, we obtained 30.40%, 7.99%, and 8.61% absolute improvement on IEMOCAP, MSP-Podcast, and Crema-D, respectively, over a baseline model with no pre-training. Finally, we present results on the effect of emotion on speaker verification. We observed that speaker verification performance is prone to changes in test speaker emotions. We found that trials with angry utterances performed worst in all three datasets. We hope our analysis will initiate a new line of research in the speaker recognition community.


AI Online Filters to Real World Image Recognition

arXiv.org Artificial Intelligence

Deep artificial neural networks, trained with labeled data sets, are widely used in numerous vision and robotics applications today. In terms of AI, these are called reflex models, referring to the fact that they do not self-evolve or actively adapt to environmental changes. As demand for intelligent robot control expands to many high-level tasks, reinforcement learning and state-based models play an increasingly important role. Herein, in the computer vision and robotics domain, we study a novel approach that adds reinforcement controls to image recognition reflex models to attain better overall performance, specifically over a wider environment range than the task reflex models are expected to handle. Following a common infrastructure with environment sensing and AI-based modeling of self-adaptive agents, we implement multiple types of AI control agents. To this end, we provide comparative results of these agents against a baseline, and an insightful analysis of their benefit in improving overall image recognition performance in the real world.


Why Did Humans Evolve Pattern Recognition Abilities? Cognition Today

#artificialintelligence

These mechanisms emerge as a response to patterns in the environment or enable us to refine our ability to spot them. Pattern recognition skills sit at the helm of our basic cognitive architecture. A common problem during hunting is to estimate how many predators there are, based on cues like animal sounds, footprints, etc. Say a pack of 4 hunters is trying to isolate a prey for food. The hunters can only survive if they have the physical capability to defend themselves and successfully kill or escape. If they do not have the ability, they will die.


Automatic Speech Transcription And Speaker Recognition Simultaneously Using Apple AI

#artificialintelligence

Last year, Apple faced several controversies regarding its speech recognition technology. To provide quality control for the company's voice assistant Siri, Apple asked its contractors to regularly listen to confidential voice recordings under the "Siri Grading Program". The company later apologised for this and published a statement announcing changes to the Siri grading program. This year, the tech giant has been putting a number of researchers to work on speech recognition technology to upgrade its voice assistant. Recently, researchers at Apple developed an AI model that can perform automatic speech transcription and speaker recognition simultaneously.


Improving S&P stock prediction with time series stock similarity

arXiv.org Machine Learning

Stock market prediction with forecasting algorithms is a popular topic these days, and most forecasting algorithms train only on data collected on a particular stock. In this paper, we enriched the stock data with related stocks, just as a professional trader would have done, to improve the stock prediction models. We tested five different similarity functions and found that co-integration similarity yielded the best improvement in the prediction model. We evaluate the models on seven S&P stocks from various industries over a five-year period. The prediction model we trained on similar stocks had significantly better results, with a mean accuracy of 0.55 and a profit of 19.782, compared to the state-of-the-art model with an accuracy of 0.52 and a profit of 6.6.


The Science Of Patterns

#artificialintelligence

Humans are natural pattern recognizers. Whether, as in prehistoric times, we were recognizing danger in a telltale rustle of the bushes or skimming a page of letters and numbers today, we use patterns to derive meaning without having to do a more detailed inspection. Futurist and entrepreneur Ray Kurzweil considers pattern recognition so important that in his recent book, How to Create A Mind, he argued that pattern recognition and intelligence are essentially the same thing. Expertise, in essence, is the familiarity of patterns of a specific field. Today, machines are learning to recognize patterns as well.


Natural Language Processing (NLP) Market to Reach USD 80.68 billion by 2026; Increasing Demand for Enhanced Algorithms to Boost Growth, says Fortune Business Insights

#artificialintelligence

Key companies covered in the NLP Market Research Report are 3M Company, Adobe Systems Inc., Amazon Web Services Inc., Apple Inc., Google (Alphabet Inc.), Hewlett-Packard Enterprise Company, Intel Corporation, Microsoft Corporation, SAS Institute Inc., and other key market players. The global Natural Language Processing (NLP) market size is projected to reach USD 80.68 billion by 2026, exhibiting a CAGR of 32.4% during the forecast period. This information is published by Fortune Business Insights in a report titled "Natural Language Processing (NLP) Market Size, Share & Industry Analysis, By Deployment (On-Premises, Cloud, and Hybrid), By Technology (Interactive Voice Response (IVR), Optical Character Recognition (OCR), Text Analytics, Speech Analytics, Classification and Categorization, Pattern and Image Recognition, and Others), By Industry Vertical (Healthcare, Retail, High Tech and Telecom, BFSI, Automotive & Transportation, Advertising & Media, Manufacturing, and Others) and Regional Forecast, 2019-2026." The report further states that the market was USD 8.61 billion in 2018. It is set to gain momentum from the rising demand for big data, improved algorithms, and powerful computing.
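The market figures quoted above can be sanity-checked with the standard compound annual growth rate relation, end = start * (1 + rate)^years. The sketch below assumes eight compounding years between the 2018 base value and the 2026 projection; the report's own compounding convention is not stated in the text, so this is a rough consistency check rather than a reproduction of its calculation. The function names are illustrative.

```python
def project_value(start_value, annual_rate, years):
    """Compound start_value forward at annual_rate for the given number of years."""
    return start_value * (1.0 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Return the constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text (USD billion); the 8-year span is an assumption.
start_2018 = 8.61
target_2026 = 80.68
years = 2026 - 2018

projected_2026 = project_value(start_2018, 0.324, years)
implied_rate = implied_cagr(start_2018, target_2026, years)

print(f"Projected 2026 value at 32.4% CAGR: {projected_2026:.2f}")
print(f"CAGR implied by the two endpoints: {implied_rate:.1%}")
```

Under that assumption, the implied rate lands within a fraction of a percentage point of the quoted 32.4%, so the three numbers in the text are mutually consistent.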