

Pattern Recognition


Europe 2021 Detailed Schedule

#artificialintelligence

Prof. Zheng-Hua Tan is a Professor of Machine Learning and Speech Processing, a Co-Head of the Centre for Acoustic Signal Processing Research (CASPR), and Machine Learning Research Group Leader in the Department of Electronic Systems at Aalborg University, Denmark. Previously, he was a Visiting Scientist/Professor at the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), Cambridge, USA; an Associate Professor in the Department of Electronic Engineering at Shanghai Jiao Tong University, China; and a postdoctoral fellow at the AI Spoken Language Lab in the Department of Computer Science at KAIST, Korea. He received the B.S. and M.S. degrees in electrical engineering from Hunan University, China, in 1990 and 1996, respectively, and the Ph.D. degree in electronic engineering from Shanghai Jiao Tong University, China, in 1999. His research interests include machine learning, deep learning, pattern recognition, speech and speaker recognition, noise-robust speech processing, multimodal signal processing, and social robotics. He has over 200 publications.


ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX

arXiv.org Artificial Intelligence

Tables concisely present important information in many scientific documents. Visual features such as mathematical symbols, equations, and spanning cells make it difficult to extract structure and content from tables embedded in research documents. This paper discusses the dataset, tasks, participants' methods, and results of the ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX. Specifically, the task of the competition is to convert a table image into its corresponding LaTeX source code. We propose two subtasks: in Subtask 1, participants reconstruct the LaTeX structure code from an image; in Subtask 2, they reconstruct the LaTeX content code from an image. This report describes the datasets and ground truth specification, details the performance evaluation metrics used, presents the final results, and summarizes the participating methods. The submission by team VCGroup achieved the highest Exact Match accuracy, 74% for Subtask 1 and 55% for Subtask 2, beating the previous baselines by 5% and 12%, respectively. Although the recognition capabilities of models can still be improved, this competition contributes to the development of fully automated table recognition systems by challenging practitioners to solve problems under specific constraints and to share their approaches. The platform will remain available for post-challenge submissions at https://competitions.codalab.org/competitions/26979 .
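
To make the Exact Match metric concrete, here is a minimal sketch: a prediction counts as correct only if its whole LaTeX sequence equals the reference. Comparing after whitespace tokenization is an assumption for illustration; the competition's official tokenization and normalization may differ, and `exact_match_accuracy` is a hypothetical helper, not the organizers' scorer.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of samples whose predicted LaTeX sequence equals the reference exactly.

    Sequences are compared token by token after splitting on whitespace
    (an assumed normalization, not necessarily the official one).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(pred.split() == ref.split() for pred, ref in zip(predictions, references))
    return hits / len(references)
```

Because a single wrong token makes the whole sample count as a miss, Exact Match is a strict metric, which helps explain why the content subtask scores lower than the structure subtask.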


How image search works at Dropbox

#artificialintelligence

Image classification lets us automatically understand what's in an image, but by itself this isn't enough to enable search. Sure, if a user searches for "beach" we could return the images with the highest scores for that category, but what if they search for "shore" instead? What if, instead of "apple", they search for "fruit" or "granny smith"? We could collate a large dictionary of synonyms, near-synonyms, and hierarchical relationships between words, but this quickly becomes unwieldy, especially if we support multiple languages.

Word vectors

So let's reframe the problem.
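
As a rough sketch of why word vectors help here: if query words and category names live in the same vector space, "shore" can match the "beach" category by proximity rather than through a hand-built synonym list. The 3-dimensional vectors below are made-up toy values, not real embeddings, and the ranking by cosine similarity is one common approach, not necessarily Dropbox's exact method.

```python
import math

# Toy 3-dimensional "word vectors". The values are invented for illustration only;
# real systems use learned embeddings with hundreds of dimensions.
WORD_VECTORS = {
    "beach": [0.9, 0.1, 0.0],
    "shore": [0.8, 0.2, 0.1],
    "apple": [0.0, 0.9, 0.3],
    "fruit": [0.1, 0.8, 0.4],
    "car": [0.1, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_categories(query, categories):
    """Order classifier categories by how close their word vector is to the query word."""
    q = WORD_VECTORS[query]
    return sorted(categories, key=lambda c: cosine(q, WORD_VECTORS[c]), reverse=True)
```

With these toy values, `rank_categories("shore", ["beach", "apple", "car"])` puts "beach" first, so images scored highly for "beach" can be returned for a "shore" query.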


Image Search -- Transfer Learning with CNN (Convolutional Neural Network)

#artificialintelligence

The goal is to build an image search engine that retrieves the most similar images from a database for a given target image. Given a query image (containing a specific instance) and a collection of images with different contents, we want to find the images in the collection that contain the same instance as the query. The images below are two examples of query images (original cropped). The image below is the query result using ResNet transfer learning. Since I have ten query images, there are ten rows of images, each row containing the ten images most similar to its query image.
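
The retrieval step can be sketched as ranking database images by cosine similarity between feature vectors. This assumes each image has already been turned into a feature vector by a pretrained CNN (for example a torchvision ResNet with its final `fc` layer replaced by `torch.nn.Identity()`); that extraction step, the helper names, and cosine similarity as the metric are assumptions, not necessarily the author's exact pipeline.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(query_vec, database, k=10):
    """Return the ids of the k images most similar to the query, best first.

    `database` maps image id -> feature vector (assumed to come from a CNN
    feature extractor applied beforehand).
    """
    best = heapq.nlargest(
        k, database.items(), key=lambda item: cosine_similarity(query_vec, item[1])
    )
    return [image_id for image_id, _ in best]
```

Calling `top_k_similar` once per query image with `k=10` yields one row of ten results per query, matching the ten-by-ten result grid described above.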


How to detect online trends without web scraping

#artificialintelligence

To extract text from each screenshot, we will apply text recognition to these images. Our goal is to obtain not only the words used on the page but also their weights (understood as a measure of their relevance or importance). With these weights, we can generate a word cloud in which a word's size signals how prominently it was exposed on the site. Pytesseract is an optical character recognition (OCR) tool for Python. It recognizes and "reads" the text embedded in the screenshots.
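
One way to get both words and weights is pytesseract's `image_to_data` with `output_type=pytesseract.Output.DICT`, which returns parallel lists such as `text`, `conf`, and `height` per detected box. The sketch below weights each confident occurrence by its box height, on the assumption that larger text is more prominent; that heuristic, the 60% confidence cutoff, and the helper names are illustrative choices, not the article's method.

```python
import string
from collections import Counter


def word_weights(ocr_data, min_conf=60.0):
    """Aggregate word-level OCR output into a word -> weight Counter.

    `ocr_data` is a dict of parallel lists with at least the keys "text", "conf"
    and "height", as returned by pytesseract.image_to_data(..., output_type=Output.DICT).
    Each confident occurrence adds its bounding-box height in pixels, so words
    rendered larger or repeated more often get a higher weight (assumed heuristic).
    """
    weights = Counter()
    for text, conf, height in zip(ocr_data["text"], ocr_data["conf"], ocr_data["height"]):
        word = str(text).strip().strip(string.punctuation).lower()
        # Skip empty entries (Tesseract reports conf -1 for non-word boxes)
        # and words recognized with low confidence.
        if not word or float(conf) < min_conf:
            continue
        weights[word] += int(height)
    return weights


def screenshot_word_weights(path, min_conf=60.0):
    """Run OCR on one screenshot; requires the Tesseract binary, pytesseract and Pillow."""
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(Image.open(path), output_type=pytesseract.Output.DICT)
    return word_weights(data, min_conf)
```

The resulting counts could then be passed to a word-cloud library, for instance `WordCloud().generate_from_frequencies(weights)` from the `wordcloud` package.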


Image Recognition AI: Algorithms And Applications

#artificialintelligence

Machine learning began with humans feeding information to computers through keyboards so that the machines could learn certain patterns. This process relied heavily on the human's ability to enter the correct information and help the computer develop its patterns. This breakthrough, image recognition, does not really require someone to feed information to the computer or, so to speak, be its eyes, because the technique allows machines to interpret and categorize whatever they see in images or videos. In other words, computers now have their own eyes.


Deep Residual Learning for Image Recognition (2015)

#artificialintelligence

Short summaries (1–2 minutes reading time) to help you (and me) understand and remember important papers/concepts about machine learning and related topics. "If you can't explain it simply, you don't understand it well enough" -- Einstein, maybe.


Leveraging Sparse Linear Layers for Debuggable Deep Networks

arXiv.org Machine Learning

As machine learning (ML) models find widespread application, there is a growing demand for interpretability: access to tools that help people see why a model made its decision. However, many obstacles to achieving this goal remain, particularly in the context of deep learning. These obstacles stem from the scale of modern deep networks, as well as the complexity of even defining and assessing the (often context-dependent) desiderata of interpretability. Existing work on deep network interpretability has largely approached this problem from two perspectives. The first seeks to uncover the concepts associated with specific neurons in the network, for example through visualization [Yos 15] or semantic labeling [Bau 17].


PSEUDo: Interactive Pattern Search in Multivariate Time Series with Locality-Sensitive Hashing and Relevance Feedback

arXiv.org Artificial Intelligence

We present PSEUDo, an adaptive feature learning technique for exploring visual patterns in multi-track sequential data. Our approach is designed primarily to overcome the uneconomical retraining requirements and inflexible representation learning of current deep learning-based systems. Multi-track time series data are generated on an unprecedented scale due to the growing number of sensors and increased data storage. These datasets hold valuable patterns, for example in neuromarketing, where researchers try to link patterns in multivariate sequential data from physiological sensors to the purchase behavior of products and services. However, a lack of ground truth and high variance make automatic pattern detection unreliable. Our advancements are based on a novel query-aware locality-sensitive hashing technique that creates a feature-based representation of multivariate time series windows. Most importantly, our algorithm features sub-linear training and inference time. We can even accomplish both the modeling and comparison of 10,000 different 64-track time series, each with 100 time steps (a typical EEG dataset), in under 0.8 seconds. This performance gain allows for rapid, relevance feedback-driven adaptation of the underlying pattern similarity model and enables the user to gradually modify the speed-vs-accuracy trade-off. We demonstrate the superiority of PSEUDo in terms of efficiency, accuracy, and steerability through a quantitative performance comparison and a qualitative visual quality comparison with state-of-the-art algorithms in the field. Moreover, we showcase the usability of PSEUDo through a case study demonstrating our visual pattern retrieval concepts on a large meteorological dataset. We find that our adaptive models can accurately capture the user's notion of similarity and allow for understandable exploratory visual pattern retrieval in large multivariate time series datasets.
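
For readers unfamiliar with locality-sensitive hashing on time series windows, here is a minimal sketch of the general idea: flatten a multi-track window, project it onto random hyperplanes, and use the sign pattern as a bucket key, so similar windows tend to collide and a query only compares against its bucket. This is generic random-hyperplane (SimHash-style) LSH, NOT PSEUDo's query-aware variant; the class name and parameters are hypothetical.

```python
import random


class RandomHyperplaneLSH:
    """Random-hyperplane LSH over flattened multivariate time series windows.

    Generic SimHash-style scheme for illustration; it is not PSEUDo's
    query-aware hashing. Windows pointing in similar directions (small angle
    between their flattened vectors) are likely to share a signature.
    """

    def __init__(self, n_tracks, window_len, n_bits=16, seed=0):
        rng = random.Random(seed)
        self.dim = n_tracks * window_len
        # One Gaussian random hyperplane per signature bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(self.dim)] for _ in range(n_bits)]
        self.buckets = {}

    def signature(self, window):
        """window: list of tracks, each a list of window_len values."""
        flat = [v for track in window for v in track]
        if len(flat) != self.dim:
            raise ValueError("window has the wrong shape")
        # Bit is 1 if the window lies on the positive side of the hyperplane.
        return tuple(int(sum(p * x for p, x in zip(plane, flat)) >= 0.0) for plane in self.planes)

    def add(self, key, window):
        """Index a window under an identifier such as (series_id, start_index)."""
        self.buckets.setdefault(self.signature(window), []).append(key)

    def candidates(self, window):
        """Keys of indexed windows whose signature matches the query window."""
        return list(self.buckets.get(self.signature(window), []))
```

Candidates returned from the bucket would then be ranked with an exact distance; using several independent hash tables is the usual way to trade speed for recall.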


RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

arXiv.org Artificial Intelligence

We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, composed of a series of fully-connected (FC) layers. Compared to convolutional layers, FC layers are more efficient and better at modeling long-range dependencies and positional patterns, but worse at capturing local structures, and hence are usually less favored for image recognition. We propose a structural re-parameterization technique that adds a local prior to an FC layer to make it powerful for image recognition. Specifically, we construct convolutional layers inside a RepMLP during training and merge them into the FC layer for inference. On CIFAR, a simple pure-MLP model performs very close to a CNN. By inserting RepMLP into traditional CNNs, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% on face recognition, and 2.3% mIoU on Cityscapes, with lower FLOPs. Our intriguing findings highlight that combining the global representational capacity and positional perception of FC layers with the local prior of convolution can improve the performance of neural networks with faster speed, both on tasks with translation invariance (e.g., semantic segmentation) and on those with aligned images and positional patterns (e.g., face recognition). The code and models are available at https://github.com/DingXiaoH/RepMLP.
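
The merge works because a convolution on a fixed-size input is a linear map, so it can be written as an FC weight matrix and added to the FC branch's weights. The sketch below shows that principle for a single channel with zero "same" padding: it builds the equivalent matrix by convolving each unit basis image and checks that one merged FC layer reproduces FC plus conv. It is a simplified illustration under those assumptions; the actual RepMLP also handles multiple channels, groups, and batch-norm fusion, and the helper names here are hypothetical.

```python
def conv2d_same(x, kernel):
    """Single-channel 2D cross-correlation with zero padding (output size = input size).

    x: H x W nested list; kernel: K x K nested list with odd K.
    """
    h, w, k = len(x), len(x[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def flatten(m):
    return [v for row in m for v in row]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def conv_as_fc_matrix(kernel, h, w):
    """Return the (h*w) x (h*w) matrix M with M @ flatten(x) == flatten(conv(x)).

    Column p of M is the conv response to the p-th unit basis image.
    """
    n = h * w
    cols = []
    for p in range(n):
        basis = [[1.0 if r * w + c == p else 0.0 for c in range(w)] for r in range(h)]
        cols.append(flatten(conv2d_same(basis, kernel)))
    # Transpose the list of columns into row-major matrix form.
    return [[cols[p][q] for p in range(n)] for q in range(n)]


def merge_conv_into_fc(fc_weight, kernel, h, w):
    """Fold a parallel conv branch into the FC weights: W_merged = W_fc + M_conv."""
    conv_mat = conv_as_fc_matrix(kernel, h, w)
    return [[a + b for a, b in zip(fc_row, conv_row)] for fc_row, conv_row in zip(fc_weight, conv_mat)]


if __name__ == "__main__":
    h, w = 3, 4
    kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]  # illustrative kernel
    x = [[float(r * w + c) for c in range(w)] for r in range(h)]
    n = h * w
    fc = [[float((i + 2 * j) % 5) for j in range(n)] for i in range(n)]  # arbitrary FC weights
    two_branch = [a + b for a, b in zip(matvec(fc, flatten(x)), flatten(conv2d_same(x, kernel)))]
    merged = merge_conv_into_fc(fc, kernel, h, w)
    print(all(abs(a - b) < 1e-9 for a, b in zip(matvec(merged, flatten(x)), two_branch)))  # True
```

At inference only the single merged matrix is applied, which is why the extra conv branch used during training adds no inference cost.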