 Pattern Recognition


This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition

arXiv.org Artificial Intelligence

Image recognition with prototypes is considered an interpretable alternative to black-box deep learning models. Classification depends on the extent to which a test image "looks like" a prototype. However, perceptual similarity for humans can differ from the similarity learnt by the model. A user is unaware of the underlying classification strategy and does not know which image characteristic (e.g., color or shape) dominates the decision. We address this ambiguity and argue that prototypes should be explained. Only visualizing prototypes can be insufficient for understanding what a prototype exactly represents, and why a prototype and an image are considered similar. We improve interpretability by automatically enhancing prototypes with extra information about visual characteristics considered important by the model. Specifically, our method quantifies the influence of color hue, shape, texture, contrast and saturation in a prototype. We apply our method to the existing Prototypical Part Network (ProtoPNet) and show that our explanations clarify the meaning of a prototype which might have been interpreted incorrectly otherwise. We also reveal that visually similar prototypes can have the same explanations, indicating redundancy. Because of the generality of our approach, it can improve the interpretability of any similarity-based method for prototypical image recognition.
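To make "looks like" concrete, here is a minimal sketch of how a prototype-based model might score an image against one prototype. It is not taken from the abstract: the function names, the toy feature vectors, and the log-ratio mapping from distance to similarity (the form ProtoPNet is commonly described as using) are all assumptions for illustration.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototype_similarity(patch, prototype, eps=1e-4):
    """Map distance to similarity: small distance gives a large score.

    Assumed form: log((d^2 + 1) / (d^2 + eps)). It decreases
    monotonically as the squared distance d^2 grows.
    """
    d2 = squared_l2(patch, prototype)
    return math.log((d2 + 1.0) / (d2 + eps))

def best_match(patches, prototype):
    """Score of the image patch that most resembles the prototype."""
    return max(prototype_similarity(p, prototype) for p in patches)

# Hypothetical latent patches of one image and one learned prototype.
patches = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
prototype = [1.0, 0.0]
print(best_match(patches, prototype))
```

The score says only that some patch is close to the prototype in the model's feature space. It does not say whether color, shape, or texture drove that closeness, which is the gap the abstract's explanations aim to fill.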


Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery

arXiv.org Artificial Intelligence

Automatic surgical gesture recognition is fundamentally important to enable intelligent cognitive assistance in robotic surgery. With recent advances in robot-assisted minimally invasive surgery, rich information including surgical videos and robotic kinematics can be recorded, providing complementary knowledge for understanding surgical gestures. However, existing methods either adopt only uni-modal data or directly concatenate multi-modal representations, and so cannot sufficiently exploit the informative correlations inherent in visual and kinematics data to boost gesture recognition accuracy. In this regard, we propose a novel multimodal relational graph network (MRG-Net) that dynamically integrates visual and kinematics information through interactive message propagation in the latent feature space. Specifically, we first extract embeddings from video and kinematics sequences with temporal convolutional networks and LSTM units. Next, we identify multi-relations in these multi-modal features and model them through a hierarchical relational graph learning module. The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset, outperforming current uni-modal and multi-modal methods on both suturing and knot tying tasks. Furthermore, we validated our method on in-house visual-kinematics datasets collected with da Vinci Research Kit (dVRK) platforms in two centers, with consistently promising performance.


Decoding Right To Explanation In AI

#artificialintelligence

Artificial Intelligence, for most people, is a technology that powers chatbots or image recognition at best: basically, software that tells images of cats from dogs. Others view it as a serious threat to their regular day jobs. Regardless of its impact on their lives, people view AI as a technology with tremendous future potential. While the future of AI elicits awe and fear, its impact on the present remains largely unacknowledged. From shortlisting resumes to spreading propaganda, AI is working harder on us than most of us know.


Applying Machine Learning to Recognize Handwritten Characters

#artificialintelligence

Handwritten character recognition is a field of research in artificial intelligence, computer vision, and pattern recognition. A computer performing handwriting recognition is said to be able to acquire and detect characters in paper documents, pictures, touch-screen devices and other sources and convert them into machine-encoded form. Its application is found in optical character recognition, transcription of handwritten documents into digital documents and more advanced intelligent character recognition systems. Handwritten character recognition can be thought of as a subset of the image recognition problem. Basically, the algorithm takes an image (image of a handwritten digit) as an input and outputs the likelihood that the image belongs to different classes (the machine-encoded digits, 1–9).
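The "image in, class likelihoods out" step can be sketched with a linear classifier followed by a softmax. This is a minimal illustration, not the article's method: the names `softmax` and `classify`, the three-class setup, and the toy weights are hypothetical, and a real recognizer would learn its weights from labeled images.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(pixels, weights, biases):
    """Return (most likely class index, per-class probabilities).

    pixels:  flattened image as a list of floats
    weights: one row of per-pixel weights for each class
    biases:  one bias per class
    """
    scores = [
        sum(w * p for w, p in zip(row, pixels)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Toy example: a 2-pixel "image" and 3 hypothetical classes.
pixels = [1.0, 0.0]
weights = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
label, probs = classify(pixels, weights, biases)
print(label, probs)
```

The softmax output is what the text calls the likelihood of each class; the predicted character is simply the class with the largest value.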


How AI Is Used in Data Center Physical Security Today

#artificialintelligence

Machine learning and artificial intelligence are touted as the cure-all for everything that ails a data center. While much of it is hype and baseless optimism, AI-powered tools are already useful and practical in some areas. Those areas include data center physical security, where AI is making a difference on three fronts: image and sound recognition, anomaly detection, and predictive analytics. Image recognition is one of the big success stories in AI, and the technology is quickly being embedded everywhere. And so is its close cousin, sound recognition.


Computer Vision: Python OCR & Object Detection Quick Starter

#artificialintelligence

Udemy course created by Abhilash Nelson: a quick starter for Optical Character Recognition, Image Recognition, Object Detection and Object Recognition using Python. Hi there! Welcome to my new course, 'Optical Character Recognition and Object Recognition Quick Start with Python'. This is the third course from my Computer Vision series. Image Recognition, Object Detection, Object Recognition and also Optical Character Recognition are among the most used applications of Computer Vision. Using these techniques, the computer will be able to recognize and classify either the whole image, or multiple objects inside a single image, predicting the class of the objects with a percentage accuracy score. Using OCR, it can also recognize text in images and convert it to a machine-readable format such as text or a document.


KuroNet: Regularized Residual U-Nets for End-to-End Kuzushiji Character Recognition

#artificialintelligence

Kuzushiji, a cursive writing style, was used in Japan for over a thousand years, starting in the eighth century. Over 3 million books on a diverse array of topics, such as literature, science, mathematics and even cooking, are preserved. However, following a change to the Japanese writing system in 1900, Kuzushiji has not been included in regular school curricula. Therefore, most Japanese natives nowadays cannot read books written or printed just 150 years ago. Museums and libraries have invested a great deal of effort into creating digital copies of these historical documents as a safeguard against fires, earthquakes and tsunamis.


Remarks on Optimal Scores for Speaker Recognition

arXiv.org Artificial Intelligence

In this article, we first establish the theory of optimal scores for speaker recognition. Our analysis shows that the minimum Bayes risk (MBR) decisions for both the speaker identification and speaker verification tasks can be based on a normalized likelihood (NL). When the underlying generative model is a linear Gaussian, the NL score is mathematically equivalent to the PLDA likelihood ratio, and the empirical scores based on cosine distance and Euclidean distance can be seen as approximations of this linear Gaussian NL score under some conditions. We discuss a number of properties of the NL score and perform a simple simulation experiment to demonstrate the properties of the NL score.
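The two empirical scores mentioned above are closely related, which a short sketch can show. This is not the paper's NL score or its derivation: it only illustrates the standard identity that, for length-normalized embeddings, squared Euclidean distance equals 2 - 2 × cosine similarity, so both scores rank trials identically. The function names and the toy embeddings are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical enrollment and test embeddings.
enroll = [1.0, 2.0, 3.0]
test = [2.0, 0.0, 1.0]

cos = cosine_score(enroll, test)
d2 = squared_euclidean(l2_normalize(enroll), l2_normalize(test))

# On unit-length vectors: d2 == 2 - 2 * cos (up to rounding).
print(cos, d2, 2 - 2 * cos)
```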


Unsupervised Discretization by Two-dimensional MDL-based Histogram

arXiv.org Machine Learning

Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the multi-dimensional case is far less studied: current methods consider the dimensions one at a time (if not independently), which results in discretizations based on rectangular cells of adaptive size. Unfortunately, this approach is unable to adequately characterize dependencies among dimensions and/or results in discretizations consisting of more cells (or bins) than is desirable. To address this problem, we propose an expressive model class that allows for far more flexible partitions of two-dimensional data. We extend the state of the art for the one-dimensional case to obtain a model selection problem based on the normalised maximum likelihood, a form of refined MDL. As the flexibility of our model class comes at the cost of a vast search space, we introduce a heuristic algorithm, named PALM, which partitions each dimension alternately and then merges neighbouring regions, all using the MDL principle. Experiments on synthetic data show that PALM 1) accurately reveals ground truth partitions that are within the model class (i.e., the search space), given a large enough sample size; 2) approximates well a wide range of partitions outside the model class; 3) converges, in contrast to its closest competitor IPD; and 4) is self-adaptive with regard to both sample size and local density structure of the data despite being parameter-free. Finally, we apply our algorithm to two geographic datasets to demonstrate its real-world potential.


ICDAR 2021 Competition: Detecting Tables Using Image Recognition

#artificialintelligence

Table recognition is a well-studied problem in document analysis, and many academic and commercial approaches have been developed to recognize tables in several document formats, including plain text, scanned page images, and born-digital, object-based formats such as PDF. There are several works that can convert tables in text-based PDF format into structured representations. However, there is limited work on image-based table content recognition. The proposed challenge aims at assessing the ability of state-of-the-art methods to recognize scientific tables in LaTeX format. Our shared task has two subtasks.