Pattern Recognition
Double Supervised Network with Attention Mechanism for Scene Text Recognition
Gao, Yuting, Huang, Zheng, Dai, Yuchen
In this paper, we propose Double Supervised Network with Attention Mechanism (DSAN), a novel end-to-end trainable framework for scene text recognition. It incorporates a text attention module during feature extraction, which forces the model to focus on text regions, and the whole framework is supervised by two branches. One supervision branch comes from context-level modelling, and the other comes from an extra supervision enhancement branch that aims to tackle inexplicit semantic information at the character level. These two supervisions benefit each other and yield better performance. The proposed approach can recognize text of arbitrary length and does not need any predefined lexicon. Our method outperforms the current state-of-the-art methods on three text recognition benchmarks, IIIT5K, ICDAR2013 and SVT, reaching accuracies of 88.6%, 92.3% and 84.1% respectively, which suggests the effectiveness of the proposed method.
Spectral Mixture Kernels with Time and Phase Delay Dependencies
Chen, Kai, Groot, Perry, Chen, Jinsong, Marchiori, Elena
Spectral Mixture (SM) kernels form a powerful class of kernels for Gaussian processes, capable of discovering patterns, extrapolating, and modelling negative covariances. In SM kernels, spectral mixture components are linearly combined to construct a final flexible kernel. As a consequence, SM kernels do not explicitly model correlations between components or dependencies related to time and phase delays between components, because only the auto-convolution of base components is used. To address these drawbacks we introduce Generalized Convolution Spectral Mixture (GCSM) kernels. We incorporate time and phase delay into the base spectral mixture and use cross-convolution between a base component and the complex conjugate of another base component to construct a complex-valued and positive definite kernel representing correlations between base components. In this way the total number of components in GCSM becomes quadratic. We perform a thorough comparative experimental analysis of GCSM on synthetic and real-life datasets. Results indicate the beneficial effect of the extra features of GCSM. This is illustrated in the problem of forecasting the long-range trend of a river flow to monitor environment evolution, where GCSM is capable of discovering correlated patterns that SM cannot, improving on the pattern recognition ability of SM.
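For orientation, the linear combination of spectral mixture components described above can be sketched in a few lines of Python. This is a minimal sketch of the standard one-dimensional SM kernel, k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi tau mu_q), and not of the GCSM kernel proposed in the abstract; the function name `sm_kernel` and the component values are assumptions chosen purely for illustration.

```python
import math


def sm_kernel(tau, weights, means, variances):
    """Standard 1-D spectral mixture kernel evaluated at lag tau.

    Each component q contributes a Gaussian envelope (controlled by the
    spectral variance v_q) multiplied by a cosine at frequency mu_q,
    scaled by the weight w_q. Components are simply summed, so no
    cross-component correlation is modelled here.
    """
    total = 0.0
    for w, mu, v in zip(weights, means, variances):
        envelope = math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v)
        total += w * envelope * math.cos(2.0 * math.pi * tau * mu)
    return total


# Hypothetical two-component mixture (illustrative values only).
weights = [1.0, 0.5]
means = [1.0, 3.0]
variances = [0.01, 0.02]

k_zero = sm_kernel(0.0, weights, means, variances)  # equals the sum of the weights
k_half = sm_kernel(0.5, weights, means, variances)  # both cosines are at -1 here
```

At lag 0 the kernel equals the sum of the weights, and at lag 0.5 both cosine terms are negative, which is one way to see how an SM kernel can represent negative covariances.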
eBay adds drag-and-drop ability to its image search tool
Last year, eBay launched a visual search capability for its mobile app that makes it possible to find items with pictures instead of words. Now, the auction platform is making it even easier to use -- you don't even have to take screenshots of whatever it is you want to find anymore, because eBay will soon allow you to drag and drop images into its search bar. For instance, if you search for a "Hello Kitty purse" in the app and find one that catches your eye, you can drag that photo into the search bar to find listings featuring identical or similar items. This gives you a way not only to search for purchases quickly, but also to find the best deals on the website. According to eBay, its convolutional neural networks process the photo you use by transforming it into a vector representation.
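To make the idea of a vector representation concrete, here is a minimal sketch of how listings could be ranked once images have been turned into vectors. The article does not say how eBay compares vectors; cosine similarity is used here as an assumption, and the listing names, the three-dimensional vectors, and the functions `cosine_similarity` and `rank_listings` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_listings(query, listings):
    """Return (name, similarity) pairs, most similar to the query first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in listings.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical image embeddings; a real system would obtain these from a CNN.
listings = {
    "purse_a": [0.9, 0.1, 0.3],
    "purse_b": [0.8, 0.2, 0.4],
    "lamp": [0.1, 0.9, 0.2],
}
query = [0.9, 0.1, 0.3]  # embedding of the dragged photo (assumed)

ranking = rank_listings(query, listings)
```

In this toy example the identical item ranks first, the similar purse second, and the unrelated lamp last, mirroring the "identical or similar items" behaviour described in the article.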
Memory, Search and Sense: A Theory about Nesting and Abstraction
Abstract--This paper describes an automatic process for combining patterns and features, to guide a search process and reason about it. It is based on the functionality that a human brain might have, which is a highly distributed network of simple neuronal components that can apply some level of matching and cross-referencing over retrieved patterns. The process uses memory in a more dynamic way and it can realise results using a shallow hierarchy, which is a recognised brain-like construct. The paper gives one example of the process, using computer chess as a case study. The second half of the paper then presents a formal language for describing the global pattern sequences and transitions. These pattern ensembles are created from the same techniques that the search and prediction processes require and they define an outer framework that a distributed setup can try to learn. They can also be created automatically, resulting in further functionality for the generic cognitive model.
Unified Hypersphere Embedding for Speaker Recognition
Hajibabaei, Mahdi, Dai, Dengxin
ABSTRACT Incremental improvements in the accuracy of Convolutional Neural Networks are usually achieved through the use of deeper and more complex models trained on larger datasets. However, enlarging datasets and models increases computation and storage costs and cannot be done indefinitely. In this work, we seek to improve the identification and verification accuracy of a text-independent speaker recognition system, without using extra data or deeper and more complex models, by augmenting the training and testing data, finding the optimal dimensionality of the embedding space, and using more discriminative loss functions. Index Terms-- speaker recognition, speaker verification, augmentation, discriminative loss function, convolutional neural networks 1. INTRODUCTION Speaker recognition is an area of research with more than 50 years of history and applications ranging from forensics and security to human-computer interaction in consumer electronics. Speaker recognition can be categorized into two tasks, text-dependent and text-independent speaker recognition, according to the similarity of the uttered content between utterances.
A beginner's guide to AI: Computer vision and image recognition
This is the second story in our continuing series covering the basics of artificial intelligence. While it isn't necessary to read the first article, which covers neural networks, doing so may add to your understanding of the topics covered in this one. Teaching a computer how to 'see' is no small feat. You can slap a camera on a PC, but that won't give it sight. In order for a machine to actually view the world like people or animals do, it relies on computer vision and image recognition.
RealNetworks Launches Free Facial Recognition Tool for Schools
Like many parents in the United States, Rob Glaser has been thinking a lot lately about how to keep his kids from getting shot in school. Specifically, he's been thinking of what he can do that doesn't involve getting into a nasty and endless battle over what he calls "the g-word." It's not that Glaser opposes gun control. A steady Democratic donor, Glaser founded the online streaming giant RealNetworks back in the 1990s as a vehicle for broadcasting left-leaning political views. It's just that any conversation about curbing gun rights in America tends to lead more to gridlock and finger-pointing than it does to action.
AI does not understand "sexy"
Machine learning algorithms can uncover complex patterns in the data they see, making them useful for image recognition, predicting customer service questions, or recommending movies. They can even do a decent job at naming craft beers, kittens, or guinea pigs. But one thing it turns out they're bad at? Understanding what humans find sexy. I had my first sign that this was a problem when I trained a neural network to generate new Halloween costumes and saw its attempts at the "sexy" category of names - it came up with ideas like Sexy Gargles, Pretty zombie Space Suit, and Sexy the Spock.