 Pattern Recognition


Interpretable Distance Metric Learning for Handwritten Chinese Character Recognition

arXiv.org Artificial Intelligence

Handwriting recognition is of crucial importance to both Human Computer Interaction (HCI) and paperwork digitization. In the general field of Optical Character Recognition (OCR), handwritten Chinese character recognition faces tremendous challenges due to the enormously large character sets and the amazing diversity of writing styles. Learning an appropriate distance metric to measure the difference between data inputs is the foundation of accurate handwritten character recognition. Existing distance metric learning approaches either produce unacceptable error rates, or provide little interpretability in the results. In this paper, we propose an interpretable distance metric learning approach for handwritten Chinese character recognition. The learned metric is a linear combination of intelligible base metrics, and thus provides meaningful insights to ordinary users. Our experimental results on a benchmark dataset demonstrate the superior efficiency, accuracy and interpretability of our proposed approach.
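The abstract above describes the learned metric as a linear combination of intelligible base metrics. A minimal sketch of that idea follows. The three base metrics, the weight values, and the names `combined_distance` and `nearest_label` are illustrative assumptions, not the paper's actual choices, and the weights are fixed by hand rather than learned.

```python
import math

# Hypothetical base metrics; the paper's actual base metrics are not given here.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

BASE_METRICS = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
}

def combined_distance(a, b, weights):
    """Weighted sum of base metrics; each weight shows how much a metric matters."""
    return sum(w * BASE_METRICS[name](a, b) for name, w in weights.items())

def nearest_label(query, examples, weights):
    """1-nearest-neighbor classification under the combined metric."""
    features, label = min(
        examples, key=lambda ex: combined_distance(query, ex[0], weights)
    )
    return label

# Assumed weights (set by hand for illustration, not learned).
weights = {"euclidean": 0.5, "manhattan": 0.3, "chebyshev": 0.2}
examples = [([0.0, 0.0], "A"), ([3.0, 4.0], "B")]

if __name__ == "__main__":
    print(combined_distance([0.0, 0.0], [3.0, 4.0], weights))
    print(nearest_label([1.0, 1.0], examples, weights))
```

Because the result is a plain weighted sum, the weights can be read directly as the contribution of each base metric, which is one plausible reading of the interpretability claim.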


Digital Peter: Dataset, Competition and Handwriting Recognition Methods

arXiv.org Artificial Intelligence

This paper presents a new dataset of Peter the Great's manuscripts and describes a segmentation procedure that converts initial images of documents into lines. The new dataset may be useful to researchers both for training handwritten text recognition models and as a benchmark for comparing different models. It consists of 9,694 images and text files corresponding to lines in historical documents. The open machine learning competition Digital Peter was held based on this dataset. The baseline solution for this competition, as well as more advanced methods for handwritten text recognition, are described in the article. The full dataset and all code are publicly available.


Image Recognition A.I. Has a Weakness. This Could Fix It

#artificialintelligence

You're probably familiar with deepfakes, the digitally altered "synthetic media" that's capable of fooling people into seeing or hearing things that never actually happened. Adversarial examples are like deepfakes for image-recognition A.I. systems -- and while they don't look even slightly strange to us, they're capable of befuddling the heck out of machines. Several years ago, researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) found that they could fool even sophisticated image recognition algorithms into confusing objects simply by slightly altering their surface texture. In the researchers' demonstration, they showed that it was possible to get a cutting-edge neural network to look at a 3D-printed turtle and see a rifle instead. Or to gaze upon a baseball and come away with the conclusion that it is an espresso.


A Visual History of Interpretation for Image Recognition

#artificialintelligence

These first two papers are similar in that they both probe the internals of a neural network by using gradient ascent. In other words, they consider what small changes to the input or to the activations will increase the probability of a predicted class. The first paper applies this to the activations, and the authors report that "it is [possible] to find good qualitative interpretations of high level features. We show that, perhaps counter-intuitively, such interpretation is possible at the unit level, that it is simple to accomplish and that the results are consistent across various techniques."
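The gradient-ascent idea in the snippet above can be sketched on a toy model. The example assumes a single logistic unit with fixed, made-up weights standing in for a trained network, and it ascends on the input rather than on internal activations. The function names, step size, and iteration count are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def class_probability(x, w, b):
    """Probability of the target class under a single logistic unit."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def gradient_ascent_on_input(x, w, b, step=0.1, iterations=50):
    """Repeatedly nudge the input in the direction that raises the class probability."""
    x = list(x)
    for _ in range(iterations):
        p = class_probability(x, w, b)
        # For a logistic unit, d p / d x_i = p * (1 - p) * w_i.
        grad = [p * (1.0 - p) * wi for wi in w]
        x = [xi + step * gi for xi, gi in zip(x, grad)]
    return x

# Assumed toy weights; a real network would supply gradients via backpropagation.
w, b = [1.0, -2.0], 0.0
start = [0.0, 0.0]
result = gradient_ascent_on_input(start, w, b)

if __name__ == "__main__":
    print(class_probability(start, w, b), class_probability(result, w, b))
```

The input drifts along the weight vector, so the features that grow are the ones the unit responds to. The same reasoning underlies the visualizations, although real networks need regularization to produce readable images.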


Hundreds of sewage leaks detected thanks to AI

#artificialintelligence

As the name suggests, pattern recognition is a way of using computing to detect regular or repeating elements in data. Machine learning is an approach to detecting those patterns using algorithms that improve automatically, through experience and through the analysis of data.


Neural network CLIP mirrors human brain neurons in image recognition

#artificialintelligence

OpenAI, the research company co-founded by Elon Musk, has discovered that its artificial neural network CLIP shows behavior strikingly similar to that of the human brain. This finding has scientists hopeful about the future ability of AI networks to identify images in a symbolic, conceptual and literal capacity. The human brain processes visual imagery by correlating a series of abstract concepts to an overarching theme, and the first biological neuron recorded to operate in this fashion was the "Halle Berry" neuron. This neuron proved capable of recognizing photographs and sketches of the actress and connecting those images with the name "Halle Berry." Now, OpenAI's multimodal vision system continues to outperform existing systems, notably with traits such as the "Spider-Man" neuron, an artificial neuron that can identify not only an image of the text "spider" but also the comic book character in both illustrated and live-action form.


A Visual History of Interpretation for Image Recognition

#artificialintelligence

Deep learning (DL) algorithms have, over the past decade, emerged as the most competitive image recognition algorithms; however, they are by default "black box" algorithms: it is difficult to explain why they make a specific prediction. Why is that an issue? Users of ML models often want to be able to interpret which parts of the image led to the algorithm's prediction, for many reasons. Motivated by such use cases, researchers have over the last decade developed many different methods to open the "black box" of deep learning, aiming to make the underlying models more explainable. Some methods are specific to certain kinds of algorithms, while some are general. Some are fast, and some are slow.


Enhancing Medical Image Registration via Appearance Adjustment Networks

arXiv.org Artificial Intelligence

Deformable image registration is fundamental for many medical image analyses. A key obstacle to accurate image registration is variation in image appearance. Recently, deep learning-based registration methods (DLRs), using deep neural networks, have achieved computational efficiency that is several orders of magnitude greater than that of traditional optimization-based registration methods (ORs). A major drawback of DLRs, however, is that they disregard the target-pair-specific optimization inherent in ORs and instead rely on a globally optimized network, trained on a set of training samples, to achieve faster registration. Thus, DLRs are inherently less able to adapt to appearance variations and perform poorly, compared to ORs, when image pairs (fixed/moving images) differ greatly in appearance. Hence, we propose an Appearance Adjustment Network (AAN) in which we leverage anatomy edges, through an anatomy-constrained loss function, to generate an anatomy-preserving appearance transformation. We designed the AAN so that it can be readily inserted into a wide range of DLRs to reduce the appearance differences between the fixed and moving images. Our AAN and the DLR's network can be trained cooperatively in an unsupervised and end-to-end manner. We evaluated our AAN with two widely used DLRs - Voxelmorph (VM) and FAst IMage registration (FAIM) - on three public 3D brain magnetic resonance (MR) image datasets - IBSR18, Mindboggle101, and LPBA40. The results show that DLRs using the AAN improved in performance and achieved higher results than state-of-the-art ORs.


Benchmarking Off-The-Shelf Solutions to Robotic Assembly Tasks

arXiv.org Artificial Intelligence

In recent years, many learning-based approaches have been studied to realize robotic manipulation and assembly tasks, often including vision and force/tactile feedback. However, it often remains unclear what the baseline state-of-the-art performance is and what the bottleneck problems are. In this work, we evaluate some off-the-shelf (OTS) industrial solutions on a recently introduced benchmark, the National Institute of Standards and Technology (NIST) Assembly Task Boards. A set of assembly tasks is introduced and baseline methods are provided to understand their intrinsic difficulty. Multiple sensor-based robotic solutions are then evaluated, including hybrid force/motion control and 2D/3D pattern matching algorithms. An end-to-end integrated solution that accomplishes the tasks is also provided. The results and findings throughout the study reveal a few noticeable factors that impede the adoption of the OTS solutions: dependence on expertise, limited applicability, lack of interoperability, no scene awareness or error recovery mechanisms, and high cost. This paper also provides a first attempt at an objective performance benchmark on the NIST Assembly Task Boards as a reference for future work on this problem.


Touchless Palmprint Recognition based on 3D Gabor Template and Block Feature Refinement

arXiv.org Artificial Intelligence

With the growing demand for hand hygiene and convenience of use, touchless palmprint recognition has developed rapidly in recent years, providing an effective solution for person identification. Despite the many efforts devoted to this area, the discriminative ability of the contactless palmprint remains uncertain, especially for large-scale datasets. To tackle this problem, we build a large-scale touchless palmprint dataset containing 2334 palms from 1167 individuals. To the best of our knowledge, it is the largest contactless palmprint image benchmark ever collected with regard to the number of individuals and palms. In addition, we propose a novel deep learning framework for touchless palmprint recognition named 3DCPN (3D Convolution Palmprint recognition Network), which leverages 3D convolution to dynamically integrate multiple Gabor features. In 3DCPN, a novel variant of the Gabor filter is embedded into the first layer to enhance curve feature extraction. With a well-designed ensemble scheme, low-level 3D features are then convolved to extract high-level features. Finally, on top, we set a region-based loss function to strengthen the discriminative ability of both global and local descriptors. To demonstrate the superiority of our method, extensive experiments are conducted on our dataset and on the other popular databases TongJi and IITD; the results show that the proposed 3DCPN achieves state-of-the-art or comparable performance.
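For readers unfamiliar with the Gabor features mentioned in the abstract above, the following sketch builds a small bank of standard real-valued 2D Gabor kernels at several orientations. It uses the textbook Gabor formula, not the novel variant the paper proposes. The function name `gabor_kernel` and all parameter values are assumptions chosen for illustration.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Standard real Gabor kernel: a Gaussian envelope times a cosine carrier.

    size must be odd so the kernel has a well-defined center.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by theta to orient the filter.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Four orientations (0, 45, 90, 135 degrees); values are illustrative only.
bank = [gabor_kernel(7, 2.0, k * math.pi / 4, 4.0) for k in range(4)]

if __name__ == "__main__":
    for row in bank[0]:
        print(" ".join(f"{v:6.3f}" for v in row))
```

Stacking the responses of such a bank along a new axis gives a 3D volume, which is presumably the kind of input that a 3D convolution could then integrate across orientations.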