Pattern Recognition
Evaluating Classifier Confidence for Surface EMG Pattern Recognition
Surface electromyogram (EMG) can be employed as an interface signal for various devices and software via pattern recognition. In EMG-based pattern recognition, the classifier should not only be accurate, but also output an appropriate confidence (i.e., probability of correctness) for its prediction. If the confidence accurately reflects the likelihood of true correctness, then it will be useful in various application tasks, such as motion rejection and online adaptation. The aim of this paper is to identify the types of classifiers that provide higher accuracy and better confidence in EMG pattern recognition. We evaluate the performance of various discriminative and generative classifiers on four EMG datasets, both visually and quantitatively. The analysis results show that while a discriminative classifier based on a deep neural network exhibits high accuracy, it outputs a confidence that differs from true probabilities. By contrast, a scale mixture model-based classifier, which is a generative classifier that can account for uncertainty in EMG variance, exhibits superior performance in terms of both accuracy and confidence.
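One common way to quantify how far a classifier's confidence departs from true probabilities is the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The abstract does not say which quantitative measures the paper uses, so this equal-width-bin ECE in plain Python is only an illustration, not the authors' evaluation protocol; n_bins=10 is an arbitrary choice.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is clamped into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece
```

For example, a classifier that reports 0.95 confidence on four predictions but gets only two right has an ECE of about 0.45, whereas one that reports 0.75 and is right three times out of four has an ECE of 0.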
Force Map: Learning to Predict Contact Force Distribution from Vision
Hanai, Ryo, Domae, Yukiyasu, Ramirez-Alpizar, Ixchel G., Leme, Bruno, Ogata, Tetsuya
When humans see a scene, they can roughly imagine the forces applied to objects based on their experience and use them to handle the objects properly. This paper considers transferring this "force-visualization" ability to robots. We hypothesize that a rough force distribution (named "force map") can be utilized for object manipulation strategies even if accurate force estimation is impossible. Based on this hypothesis, we propose a training method to predict the force map from vision. To investigate this hypothesis, we generated scenes where objects were stacked in bulk through simulation and trained a model to predict the contact force from a single image. We further applied domain randomization to make the trained model function on real images. The experimental results showed that the model trained using only synthetic images could predict approximate patterns representing the contact areas of the objects even for real images. Then, we designed a simple algorithm to plan a lifting direction using the predicted force distribution. We confirmed that using the predicted force distribution contributes to finding natural lifting directions for typical real-world scenes. Furthermore, the evaluation through simulations showed that the disturbance caused to surrounding objects was reduced by 26 % (translation displacement) and by 39 % (angular displacement) for scenes where objects were overlapping.
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Iscen, Ahmet, Fathi, Alireza, Schmid, Cordelia
Retrieval-augmented models are becoming increasingly popular for computer vision tasks after their recent success in NLP. The goal is to enhance the recognition capabilities of the model by retrieving examples similar to the visual input from an external memory set. In this work, we introduce an attention-based memory module, which learns the importance of each example retrieved from the memory. Compared to existing approaches, our method removes the influence of irrelevant retrieved examples and retains those that are beneficial to the input query. We also thoroughly study various ways of constructing the memory dataset. Our experiments show the benefit of using a massive-scale memory dataset of 1B image-text pairs and compare the performance of different memory representations. We evaluate our method on three different classification tasks, namely long-tailed recognition, learning with noisy labels, and fine-grained classification, and show that it achieves state-of-the-art accuracy on the ImageNet-LT, Places-LT, and WebVision datasets.
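As a rough illustration of weighting retrieved examples by relevance, the sketch below uses plain dot-product attention: each retrieved memory item has a key (for instance, its embedding) and a value, and the query's similarity to each key is turned into softmax weights so that dissimilar items contribute little to the aggregate. This is a generic mechanism, not the paper's memory module; learned projections, multiple heads, and how the aggregate is combined with the query are all omitted, and the temperature parameter is hypothetical.

```python
import math

def attend_to_memory(query, keys, values, temperature=1.0):
    """Softmax-weighted average of retrieved values.

    query: embedding of the input, e.g. [q1, q2, ...].
    keys: one embedding per retrieved example (same length as query).
    values: one vector per retrieved example to be aggregated.
    Returns (weights, aggregated_vector).
    """
    # Dot-product similarity between the query and each retrieved key.
    scores = [sum(q * k for q, k in zip(query, key)) / temperature for key in keys]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    aggregated = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, aggregated
```

With query [1, 0] and keys [1, 0] and [0, 1], the first (matching) example receives weight e/(e + 1), about 0.73, and the unrelated one about 0.27; lowering the temperature would sharpen this further toward the relevant example.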
A Hybrid Deep Feature-Based Deformable Image Registration Method for Pathology Images
Zhang, Chulong, Jiang, Yuming, Li, Na, Zhang, Zhicheng, Islam, Md Tauhidul, Dai, Jingjing, Liu, Lin, He, Wenfeng, Qin, Wenjian, Xiong, Jing, Xie, Yaoqin, Liang, Xiaokun
Pathologists need to combine information from differently stained pathology slices for accurate diagnosis. Deformable image registration is a necessary technique for fusing multi-modal pathology slices. This paper proposes a hybrid deep feature-based deformable image registration framework for stained pathology samples. We first extract dense feature points via detector-based and detector-free deep learning feature networks and perform point matching. Then, to further reduce false matches, we propose an outlier detection method that combines the isolation forest statistical model with a local affine correction model. Finally, an interpolation method generates the deformation vector field for pathology image registration from the retained matching points. We evaluate our method on the dataset of the Automatic Non-rigid Histological Image Registration (ANHIR) challenge, which was co-organized with the IEEE ISBI 2019 conference. Our technique outperforms traditional approaches by 17%, with the average-average relative target registration error (rTRE) reaching 0.0034. The proposed method achieved state-of-the-art performance and ranked 1st in the evaluation on the test dataset. The proposed hybrid deep feature-based registration method can potentially become a reliable method for pathology image registration.
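For reference, ANHIR scores registrations by relative target registration error: the Euclidean distance between a warped landmark and its corresponding target landmark, divided by the image diagonal. The sketch below assumes that definition and takes the "average-average" score to be the mean over image pairs of the mean per-pair rTRE; the official evaluation may differ in details. Under this definition, an average-average rTRE of 0.0034 corresponds to landmark errors of roughly 0.34% of the image diagonal on average.

```python
import math

def rtre(warped, target, width, height):
    """Relative TRE for each landmark: Euclidean error divided by the image diagonal.

    warped, target: lists of (x, y) landmark coordinates in the same order.
    """
    diagonal = math.hypot(width, height)
    return [
        math.hypot(wx - tx, wy - ty) / diagonal
        for (wx, wy), (tx, ty) in zip(warped, target)
    ]

def average_average_rtre(pairs):
    """Mean over image pairs of the mean per-landmark rTRE within each pair.

    pairs: list of (warped_landmarks, target_landmarks, width, height).
    """
    per_pair_means = []
    for warped, target, width, height in pairs:
        errors = rtre(warped, target, width, height)
        per_pair_means.append(sum(errors) / len(errors))
    return sum(per_pair_means) / len(per_pair_means)
```

For a 300 x 400 image (diagonal 500), a landmark that lands 5 pixels from its target has an rTRE of 0.01.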
Learning Distance Metrics with Triplet Loss: Advantages and Challenges - AITechTrend
Triplet loss is a loss function widely used in machine learning for tasks such as image recognition, facial recognition, and information retrieval. The idea behind triplet loss is to learn a distance metric between objects such that similar objects are close together in the metric space, while dissimilar objects are far apart. In this article, we will introduce triplet loss, discuss how it works, and explore some of its applications.
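A common form of the loss, used for example in FaceNet, works on triplets of an anchor a, a positive p (same class as the anchor), and a negative n (different class): L(a, p, n) = max(0, ||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 + margin). The loss is zero once the negative is farther from the anchor than the positive by at least the margin. The sketch below assumes the embeddings f(.) are already computed and uses squared Euclidean distance; the default margin of 0.2 is only illustrative.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def batch_triplet_loss(triplets, margin=0.2):
    """Mean triplet loss over a list of (anchor, positive, negative) embeddings."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

An "easy" triplet such as anchor (0, 0), positive (0, 1), negative (3, 0) contributes nothing, while a "hard" one where the negative (1, 0) is closer than the positive (2, 0) gives 4 - 1 + 0.2 = 3.2. This is why practical training often mines hard or semi-hard triplets: easy ones produce no gradient.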
What Affects Learned Equivariance in Deep Image Recognition Models?
Bruintjes, Robert-Jan, Motyka, Tomasz, van Gemert, Jan
Equivariance w.r.t. geometric transformations in neural networks improves data efficiency, parameter efficiency, and robustness to out-of-domain perspective shifts. When equivariance is not designed into a neural network, the network can still learn equivariant functions from the data. We quantify this learned equivariance by proposing an improved measure of equivariance. We find evidence for a correlation between learned translation equivariance and validation accuracy on ImageNet. We therefore investigate what can increase the learned equivariance in neural networks, and find that data augmentation, reduced model capacity, and inductive bias in the form of convolutions induce higher learned equivariance in neural networks.
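A generic way to check whether a function f is equivariant to a transformation T is to compare f(T(x)) with T(f(x)). The sketch below does this for circular 1-D shifts and reports a relative error, where 0 means the two agree exactly for that input and shift. It is a simple illustrative measure, not the improved measure proposed in the paper; the circular-shift setting is an assumption chosen so that the convolution example is exactly equivariant.

```python
import math

def shift(x, s):
    """Circular shift of a 1-D signal by s positions to the right."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]

def circular_conv(x, kernel):
    """1-D circular convolution (cross-correlation form); translation-equivariant."""
    n, k = len(x), len(kernel)
    return [sum(kernel[j] * x[(i + j) % n] for j in range(k)) for i in range(n)]

def equivariance_error(f, x, s):
    """Relative L2 gap between f(shift(x)) and shift(f(x)) for one input and shift."""
    transformed_then_f = f(shift(x, s))
    f_then_transformed = shift(f(x), s)
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(transformed_then_f, f_then_transformed)))
    den = math.sqrt(sum(b ** 2 for b in f_then_transformed)) or 1.0  # avoid dividing by 0
    return num / den
```

A convolution gives an error of exactly 0 for any shift, whereas a position-dependent operation (for example, multiplying each element by its index plus one) does not, which mirrors the abstract's point that convolutions act as an inductive bias toward translation equivariance. In a learned network one would average such errors over many inputs and shifts.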
Neural Network Star Pattern Recognition for Spacecraft Attitude Determination and Control
Currently, the most complex spacecraft attitude determination and control tasks are ultimately governed by ground-based systems and personnel. Conventional on-board systems face severe computational bottlenecks introduced by serial microprocessors operating on inherently parallel problems. New computer architectures based on the anatomy of the human brain seem to promise high-speed and fault-tolerant solutions to the limitations of serial processing.
Recognizing Hand-Printed Letters and Digits
We are developing a hand-printed character recognition system using a multilayered neural net trained through backpropagation. We report on results of training nets with samples of hand-printed digits scanned off of bank checks and hand-printed letters interactively entered into a computer through a stylus digitizer. Given a large training set, and a net with sufficient capacity to achieve high performance on the training set, nets typically achieved error rates of 4-5% at a 0% reject rate and 1-2% at a 10% reject rate. The topology and capacity of the system, as measured by the number of connections in the net, have surprisingly little effect on generalization. For those developing practical pattern recognition systems, these results suggest that a large and representative training sample may be the single most important factor in achieving high recognition accuracy.
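Error rates quoted at a given reject rate can be computed by discarding the least-confident fraction of samples and measuring errors on the rest. In this sketch, a sample's confidence is whatever score the net assigns to its top class, and errors are counted over the accepted samples only; the abstract specifies neither the rejection rule nor the denominator, so both are assumptions, and the data in the check is made up.

```python
def error_at_reject_rate(confidences, correct, reject_rate):
    """Reject the least-confident fraction of samples; return the error rate on the rest.

    confidences: one score per sample (higher means more confident).
    correct: booleans, True where the top-class prediction was right.
    reject_rate: fraction of samples to reject, e.g. 0.1 for 10%.
    """
    n = len(confidences)
    n_reject = int(round(reject_rate * n))
    # Indices sorted by ascending confidence; the first n_reject are rejected.
    order = sorted(range(n), key=lambda i: confidences[i])
    kept = order[n_reject:]
    if not kept:
        return 0.0
    errors = sum(1 for i in kept if not correct[i])
    return errors / len(kept)
```

When mistakes concentrate among low-confidence outputs, rejecting a small fraction removes a disproportionate share of errors, which is the trade-off behind the drop from 4-5% to 1-2% reported above.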
On Stochastic Complexity and Admissible Models for Neural Network Classifiers
In this paper we examine in a general sense the application of Minimum Description Length (MDL) techniques to the problem of selecting a good classifier from a large set of candidate models or hypotheses. Pattern recognition algorithms differ from more conventional statistical modeling techniques in the sense that they typically choose from a very large number of candidate models to describe the available data. Hence, the problem of searching through this set of candidate models is frequently a formidable one, often approached in practice by the use of greedy algorithms. In this context, techniques which allow us to eliminate portions of the hypothesis space are of considerable interest. We will show in this paper that it is possible to use the intrinsic structure of the MDL formalism to eliminate large numbers of candidate models given only minimal information about the data.
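In the usual two-part MDL formulation, a hypothesis H is scored by L(H) + L(D|H) bits: the cost of describing the model plus the cost of describing the data given the model. Because L(D|H) is never negative, any hypothesis whose model cost L(H) alone already reaches the best total found so far can be discarded without evaluating it on the data. This is a generic consequence of the formalism, offered to illustrate how MDL structure can prune a hypothesis space; it is not necessarily the argument made in the paper, and the candidate names and bit costs in the check are made up.

```python
def select_mdl(candidates, data_cost):
    """Pick the hypothesis minimizing L(H) + L(D|H), both measured in bits.

    candidates: list of (name, model_cost_bits) pairs.
    data_cost: callable name -> L(D|H) in bits; assumed non-negative and
               possibly expensive, so we try to call it as rarely as possible.
    Returns (best_name, best_total_bits, number_of_data_cost_evaluations).
    """
    best_name, best_total, evaluations = None, float("inf"), 0
    # Visiting cheaper models first makes the pruning bound tighten sooner.
    for name, model_bits in sorted(candidates, key=lambda c: c[1]):
        if model_bits >= best_total:
            # L(D|H) >= 0, so this model and every costlier one cannot win.
            break
        evaluations += 1
        total = model_bits + data_cost(name)
        if total < best_total:
            best_name, best_total = name, total
    return best_name, best_total, evaluations
```

With four hypothetical candidates costing 2, 5, 12, and 40 bits, once the 5-bit model achieves a 9-bit total, the 12-bit and 40-bit models are eliminated without ever computing their data cost.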
Adaptive Elastic Models for Hand-Printed Character Recognition
Hand-printed digits can be modeled as splines that are governed by about 8 control points. Images of digits can be produced by placing Gaussian ink generators uniformly along the spline. Real images can be recognized by finding the digit model most likely to have generated the data. For each digit model we use an elastic matching algorithm to minimize an energy function that includes both the deformation energy of the digit model and the log probability that the model would generate the inked pixels in the image. If a uniform noise process is included in the model of image generation, some of the inked pixels can be rejected as noise as a digit model is fitted to a poorly segmented image.
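The energy described above can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: ink generators ("beads") are placed along straight segments between control points instead of along a spline, the deformation energy is a plain squared displacement of the control points from hypothetical home positions, and the ink model is an equal-weight mixture of isotropic Gaussians plus a uniform noise component. All parameter values (stiffness, n_beads, sigma, noise_weight, area) are illustrative.

```python
import math

def beads_along_path(control_points, n_beads):
    """Place n_beads ink generators at equal parameter steps along straight
    segments joining the control points (needs >= 2 points and n_beads >= 2).
    Simplification: the model described above uses a spline instead."""
    segs = len(control_points) - 1
    beads = []
    for b in range(n_beads):
        t = b / (n_beads - 1) * segs      # global parameter in [0, segs]
        i = min(int(t), segs - 1)         # segment index
        u = t - i                         # position within that segment
        (x0, y0), (x1, y1) = control_points[i], control_points[i + 1]
        beads.append((x0 + u * (x1 - x0), y0 + u * (y1 - y0)))
    return beads

def pixel_likelihood(pixel, beads, sigma, noise_weight, area):
    """p(pixel): equal-weight isotropic 2-D Gaussians at the beads, mixed with
    a uniform noise density 1/area."""
    px, py = pixel
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    gauss = sum(
        norm * math.exp(-((px - bx) ** 2 + (py - by) ** 2) / (2.0 * sigma ** 2))
        for bx, by in beads
    ) / len(beads)
    return (1.0 - noise_weight) * gauss + noise_weight / area

def energy(control_points, home_points, inked_pixels,
           stiffness=1.0, n_beads=8, sigma=1.0, noise_weight=0.1, area=256.0):
    """Deformation energy plus negative log-likelihood of the inked pixels."""
    deform = stiffness * sum(
        (x - hx) ** 2 + (y - hy) ** 2
        for (x, y), (hx, hy) in zip(control_points, home_points)
    )
    beads = beads_along_path(control_points, n_beads)
    log_lik = sum(
        math.log(pixel_likelihood(p, beads, sigma, noise_weight, area))
        for p in inked_pixels
    )
    return deform - log_lik

def noise_posterior(pixel, beads, sigma=1.0, noise_weight=0.1, area=256.0):
    """Posterior probability that an inked pixel was produced by the noise process."""
    noise = noise_weight / area
    return noise / pixel_likelihood(pixel, beads, sigma, noise_weight, area)
```

An elastic matcher would minimize this energy over the control points for each digit model and then compare the fitted models; that search is omitted here. The noise posterior shows the rejection effect: an inked pixel far from every bead is attributed almost entirely to noise, so it barely pulls on the fit.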