 Image Matching


Cognitive Explainable Artificial Intelligence (AI) breakthroughs in Machine Learning (ML) for US Air Force: 3D Image Recognition using few training samples on CPU (without GPU)

#artificialintelligence

Z Advanced Computing, Inc. (ZAC), the pioneer Cognitive Explainable-AI (Artificial Intelligence) (Cognitive XAI) software startup, has made AI and Machine Learning (ML) breakthroughs: ZAC has achieved 3D Image Recognition for the US Air Force (USAF) using only a few training samples and only an average laptop with a low-power CPU, for both training and recognition. This is in sharp contrast to other algorithms in industry, which require thousands to billions of samples and are trained on large GPU servers. "ZAC requires much less computing power and much less electrical power to run, which is great for mobile and edge computing, as well as environment, with less Carbon footprint," emphasized Dr. Saied Tadayon, CTO of ZAC. ZAC is the first to demonstrate novel Cognition-based Explainable-AI (XAI) algorithms, in which various attributes and details of 3D (three-dimensional) objects are recognized from any view or angle. "You cannot do this task with the other algorithms, such as Deep Convolutional Neural Networks (CNN) or ResNets, even with an extremely large number of training samples, on GPU servers. That's basically hitting the limitations of CNNs or Neural Nets, which all other companies are using now," said Dr. Bijan Tadayon, CEO of ZAC.


Lip-Reading AI is Under Development, Under Watchful Eyes - AI Trends

#artificialintelligence

A lip-reading app from Irish startup Liopa is said to represent a breakthrough in the field of visual speech recognition (VSR), which trains AI to read lips without any audio input. Liopa's product, SRAVI (Speech Recognition App for the Voice Impaired), is a communication aid for speech-impaired patients. It is likely to be the first lip-reading AI app available for public purchase, according to an account from Vice/Motherboard. Researchers, driven by a range of potential commercial applications including surveillance tools, have been working for years to teach computers to lip-read, and it has proven a challenging task. Liopa is working to certify SRAVI as a Class I medical device in Europe, hoping to complete the certification by August.


ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX

arXiv.org Artificial Intelligence

Tables present important information concisely in many scientific documents. Visual features like mathematical symbols, equations, and spanning cells make structure and content extraction from tables embedded in research documents difficult. This paper discusses the dataset, tasks, participants' methods, and results of the ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX. Specifically, the task of the competition is to convert a tabular image to its corresponding LaTeX source code. We proposed two subtasks. In Subtask 1, we ask the participants to reconstruct the LaTeX structure code from an image. In Subtask 2, we ask the participants to reconstruct the LaTeX content code from an image. This report describes the datasets and ground truth specification, details the performance evaluation metrics used, presents the final results, and summarizes the participating methods. Submission by team VCGroup got the highest Exact Match accuracy score of 74% for Subtask 1 and 55% for Subtask 2, beating previous baselines by 5% and 12%, respectively. Although improvements can still be made to the recognition capabilities of models, this competition contributes to the development of fully automated table recognition systems by challenging practitioners to solve problems under specific constraints and sharing their approaches; the platform will remain available for post-challenge submissions at https://competitions.codalab.org/competitions/26979 .
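
The Exact Match accuracy mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a prediction counts as correct only when its LaTeX string equals the ground truth after collapsing whitespace; that normalization rule and the names `normalize` and `exact_match_accuracy` are illustrative assumptions, not the competition's official scoring code.

```python
def normalize(latex: str) -> str:
    """Collapse runs of whitespace so spacing differences are ignored (assumed rule)."""
    return " ".join(latex.split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Toy data: the first pair differs only in spacing, the second differs in content.
preds = [r"\begin{tabular}{cc} a & b \\ \end{tabular}",
         r"\begin{tabular}{c} x \\ \end{tabular}"]
refs = [r"\begin{tabular}{cc}  a & b \\ \end{tabular}",
        r"\begin{tabular}{c} y \\ \end{tabular}"]

print(exact_match_accuracy(preds, refs))
```

Because a single wrong token makes the whole sequence count as a miss, this kind of metric is strict for long table sources, which may be one reason the reported scores are well below 100%.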


How image search works at Dropbox

#artificialintelligence

Image classification lets us automatically understand what's in an image, but by itself this isn't enough to enable search. Sure, if a user searches for beach we could return the images with the highest scores for that category, but what if they instead search for shore? What if instead of apple they search for fruit or granny smith? We could collate a large dictionary of synonyms and near-synonyms and hierarchical relationships between words, but this quickly becomes unwieldy, especially if we support multiple languages.

Word vectors

So let's reframe the problem.


Image Search -- Transfer Learning with CNN (Convolutional Neural Network)

#artificialintelligence

The goal is to build an image search engine that retrieves the most similar images from a database for a given target image. Given a query image (containing a specific instance) and a collection of images with different contents, we want to find the images in the collection that contain the same instance as the query. The images below are two examples of query images (original cropped). The image below is the query result using ResNet transfer learning. Since I have ten query images, there are ten rows of images, with each row containing the ten most similar images to the query image.
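
The retrieval step described above can be sketched as a nearest-neighbour ranking over feature vectors. This is a minimal illustration, not the author's code: it assumes each image has already been turned into a feature vector (for example by a pretrained ResNet), uses cosine similarity as the ranking score (the snippet does not say which measure was used), and the names `cosine_similarity`, `top_k_similar`, and the toy three-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, database, k=10):
    """Return the names of the k database images most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy feature vectors standing in for real image embeddings (assumed values).
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}

print(top_k_similar([1.0, 0.05, 0.0], database, k=2))
```

With ten query images and `k=10`, calling `top_k_similar` once per query would yield the ten rows of ten results that the snippet describes.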


Image Recognition AI: Algorithms And Applications

#artificialintelligence

Machine learning began with humans feeding information to computers through keyboards so that the computers could understand it and develop certain learned patterns. This process relied heavily on the human's ability to enter the correct information and help the computer develop its patterns. Image recognition is a breakthrough that does not really require someone to feed the information to the computer or be its eyes, so to speak, because this new technique allows machines to interpret and categorize whatever they see in images or videos. In other words, computers now have their own eyes.


Deep Residual Learning for Image Recognition (2015)

#artificialintelligence

Short summaries (1–2 minutes reading time) to help you (and me) understand and remember important papers/concepts about machine learning and related topics. "If you can't explain it simply, you don't understand it well enough" -- Einstein, maybe.


RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

arXiv.org Artificial Intelligence

We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers. Compared to convolutional layers, FC layers are more efficient, better at modeling the long-range dependencies and positional patterns, but worse at capturing the local structures, hence usually less favored for image recognition. We propose a structural re-parameterization technique that adds local prior into an FC to make it powerful for image recognition. Specifically, we construct convolutional layers inside a RepMLP during training and merge them into the FC for inference. On CIFAR, a simple pure-MLP model shows performance very close to CNN. By inserting RepMLP in traditional CNN, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% for face recognition, and 2.3% mIoU on Cityscapes with lower FLOPs. Our intriguing findings highlight that combining the global representational capacity and positional perception of FC with the local prior of convolution can improve the performance of neural network with faster speed on both the tasks with translation invariance (e.g., semantic segmentation) and those with aligned images and positional patterns (e.g., face recognition). The code and models are available at https://github.com/DingXiaoH/RepMLP.
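
The idea of merging a convolution into a fully-connected layer rests on the fact that a convolution is a linear map, so it can be written as a weight matrix and added to the FC weights. The following is a one-dimensional toy sketch of that idea only, not the authors' implementation: it ignores channels, groups, and batch normalization, and the helper names (`conv1d_same`, `conv_as_fc_matrix`, `merge`) and all numeric values are assumptions made for illustration.

```python
def conv1d_same(x, kernel):
    """Zero-padded 1D cross-correlation with an odd-length kernel; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]


def conv_as_fc_matrix(kernel, n):
    """Build the n x n matrix W such that W @ x equals conv1d_same(x, kernel)."""
    # Convolving the j-th unit vector yields column j of W.
    columns = [conv1d_same([1.0 if i == j else 0.0 for i in range(n)], kernel)
               for j in range(n)]
    return [[columns[j][i] for j in range(n)] for i in range(n)]


def matvec(W, x):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def merge(fc, kernel):
    """Fold the convolution branch into the FC weights by adding its equivalent matrix."""
    conv_w = conv_as_fc_matrix(kernel, len(fc))
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fc, conv_w)]


fc = [[0.5, 0.0, 0.0, 1.0],
      [0.0, 2.0, 0.0, 0.0],
      [1.0, 0.0, 0.5, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
kernel = [1.0, 2.0, -1.0]
x = [1.0, 2.0, 3.0, 4.0]

# Training-time form: FC branch plus convolution branch, summed.
two_branch = [a + b for a, b in zip(matvec(fc, x), conv1d_same(x, kernel))]
# Inference-time form: a single FC layer with merged weights.
merged = matvec(merge(fc, kernel), x)

print(two_branch)
print(merged)
```

The two printed vectors agree, which is the property that lets the convolution supply a local prior during training while inference runs a single FC layer.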


Attention for Image Registration (AiR): an unsupervised Transformer approach

arXiv.org Artificial Intelligence

Image registration, an important basis of signal processing tasks, often encounters problems of stability and efficiency. Non-learning registration approaches rely on optimizing similarity metrics between the fixed and moving images. Yet those approaches are usually costly in both time and space complexity. The problem can be worse when the images are large or the deformations between them are severe. Recently, deep learning, or more precisely convolutional neural network (CNN) based image registration methods, have been widely investigated in the research community and show promising effectiveness in overcoming the weaknesses of non-learning-based methods. To explore advanced learning approaches to image registration for solving practical issues, we present in this paper a method that introduces an attention mechanism into the deformable image registration problem. The proposed approach is based on learning the deformation field with a Transformer framework (AiR) that does not rely on a CNN but can still be trained efficiently on GPGPU devices. In a more vivid interpretation, we treat the image registration problem as a language translation task and introduce a Transformer to tackle it. Our method learns an unsupervised generated deformation map and is tested on two benchmark datasets. The source code of AiR will be released on GitLab.


PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Table Image Recognition to Latex

arXiv.org Artificial Intelligence

Recognizing a table image into LaTeX code is challenging due to the complexity and diversity of table structures and the long-sequence problem compared to traditional OCR. The challenge aims at assessing the ability of state-of-the-art methods to recognize scientific tables into LaTeX code. In this competition, there are two subtasks with different levels of difficulty. Subtask I, Table Structure Reconstruction, is to reconstruct the structure of a table image in the form of LaTeX code while ignoring the content of the table. Subtask II, Table Content Reconstruction, is to reconstruct the structure and the content of a table image simultaneously in the form of LaTeX code.