"Computers have been getting better and better at seeing movement on video. How is it that they read lips, follow a dancing girl or copy an actor making faces?"
– from Andrew Blake. Introduction to Active Contours and Visual Dynamics. Visual Dynamics Group, Department of Engineering Science, University of Oxford
In today's world, Computer Vision technologies are everywhere: they are embedded in many of the tools and applications that we use on a daily basis. However, we often pay little attention to the underlying Computer Vision technologies because they tend to run in the background. As a result, only a small fraction of people outside the tech industry appreciate the importance of those technologies. The goal of this article is therefore to provide an overview of Computer Vision for readers with little to no knowledge of the field. I attempt to achieve this goal by answering three questions: What is Computer Vision? Why should you learn Computer Vision? And how can you get started?
Active illumination is a prominent complement for enhancing 2D face recognition and making it more robust, e.g., to spoofing attacks and low-light conditions. In the present work we show that it is possible to adopt active illumination to enhance state-of-the-art 2D face recognition approaches with 3D features, while bypassing the complicated task of 3D reconstruction. The key idea is to project a high-spatial-frequency pattern onto the test face, which allows us to simultaneously recover real 3D information plus a standard 2D facial image. State-of-the-art 2D face recognition solutions can therefore be applied transparently, while complementary 3D facial features are extracted from the high-frequency component of the input image. Experimental results on the ND-2006 dataset show that the proposed ideas can significantly boost face recognition performance and dramatically improve robustness to spoofing attacks.
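The abstract does not say how the two components are separated, but the general idea of splitting an image into a low-frequency part (the standard facial image) and a high-frequency residual (carrying the projected pattern) can be sketched with a simple low-pass filter. This is a minimal illustration under that assumption, not the paper's actual method; the separable box filter below is just the simplest low-pass choice.

```python
import numpy as np

def box_blur(img, k=5):
    """Separable box (mean) filter -- a simple low-pass; k should be odd."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, out)
    return out

def split_frequencies(img, k=5):
    low = box_blur(img, k)   # approximates the standard 2D facial image
    high = img - low         # residual carrying the projected pattern
    return low, high

# Synthetic stand-in: a smooth gradient ("face") plus a column-wise
# alternating pattern ("projected high-frequency illumination").
face = np.outer(np.linspace(0.0, 1.0, 32), np.ones(32))
pattern = 0.1 * np.cos(np.pi * np.arange(32))[None, :]
low, high = split_frequencies(face + pattern)
```

By construction the two bands sum back to the input, so a 2D recognizer can consume `low` unchanged while `high` is fed to a separate 3D-feature branch.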
Facial recognition technology was once considered an idea out of science fiction. But in the past decade, it has not only become real -- it has become widespread. Today, people can easily find articles and news stories about facial recognition everywhere. Here is the history of facial recognition technology and some ideas about its bright future. Facial recognition technology, along with Artificial Intelligence (AI) and Deep Learning (DL), is benefiting several industries.
A growing body of work shows that many problems in fairness, accountability, transparency, and ethics in machine learning systems are rooted in decisions surrounding the data collection and annotation process. In spite of its fundamental nature, however, data collection remains an overlooked part of the machine learning (ML) pipeline. In this paper, we argue that a new specialization should be formed within ML that is focused on methodologies for data collection and annotation: efforts that require institutional frameworks and procedures. Specifically for sociocultural data, parallels can be drawn from archives and libraries. Archives are the longest-standing communal effort to gather human information, and archive scholars have already developed the language and procedures to address and discuss many challenges pertaining to data collection, such as consent, power, inclusivity, transparency, and ethics & privacy. We discuss these five key approaches in document collection practices in archives that can inform data collection in sociocultural ML. By showing data collection practices from another field, we encourage ML research to be more cognizant and systematic in data collection and to draw from interdisciplinary expertise.
With the rapid growth of applications of machine learning (ML) and other artificial intelligence (AI) techniques, adequate testing has become a necessity to ensure their quality. This paper identifies the characteristics of AI applications that distinguish them from traditional software, and analyses the main difficulties in applying existing testing methods to them. Based on this analysis, we propose a new method, called datamorphic testing, and illustrate it with an example of testing face recognition applications. We also report an experiment with four real industrial face recognition systems to validate the proposed approach.
Ethnic group classification is a well-researched problem, which has been pursued mainly during the past two decades via traditional approaches of image processing and machine learning. In this paper, we propose a method of classifying a facial image into an ethnic group by applying transfer learning from a network previously trained for large-scale image recognition. Our proposed method yields state-of-the-art success rates of 99.02%, 99.76%, 99.2%, and 96.7%, respectively, for the four ethnic groups: African, Asian, Caucasian, and Indian.

1 Introduction

Ethnic classification from facial images has been studied for the past two decades with the purpose of understanding how humans perceive and determine an ethnic group from a given image. The motivation stems, for example, from the fact that (gender and) ethnicity play an important role in face-related applications, such as advertising, socially sensitive systems, etc. Furthermore, while facial features are subject to change (due to aging, for example), ethnicity is of interest due to its invariance over time. Recent works on demographic classification are divided conceptually into appearance-based methods (using, e.g., eigenface methods, fisherface methods, etc.) and geometry-based methods (relying, e.g., on geometric parameters, such as the distance between the eyes, face width and length, nose thickness, etc.). One of the main challenges of automatic demographic classification is to avoid any "noise", such as illumination, background distortion, and a subject's pose. In this paper, we introduce a deep learning-based method that achieves state-of-the-art results for facial image representation and classification for the four ethnic groups: African, Asian, Caucasian, and Indian.

2 Related Work

2.1 Traditional ML-Based Techniques

During the past two decades, there has been enormous progress on the topic of ethnic group classification, using various classical Machine Learning methods.
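Transfer learning of the kind described above typically freezes a pretrained backbone and trains only a new classification head on its features. The sketch below shows that pattern on synthetic data with a seeded random projection standing in for the pretrained network; it is a hedged illustration of the training recipe, not the paper's architecture or dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(x):
    """Stand-in for a pretrained feature extractor; its weights are
    fixed (here: a seeded random projection plus tanh)."""
    W = np.random.default_rng(42).normal(size=(x.shape[1], 16)) * 0.3
    return np.tanh(x @ W)

def train_head(feats, labels, n_classes=4, lr=0.5, steps=300):
    """Train only a softmax classification head on the frozen features."""
    W = np.zeros((feats.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = feats @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * feats.T @ (p - onehot) / len(feats)  # cross-entropy gradient
    return W

# Synthetic "images": four Gaussian clusters standing in for four classes.
X = rng.normal(size=(200, 8)) + np.repeat(np.eye(4, 8) * 4.0, 50, axis=0)
y = np.repeat(np.arange(4), 50)
feats = frozen_backbone(X)
head = train_head(feats, y)
accuracy = (np.argmax(feats @ head, axis=1) == y).mean()
```

Because the backbone is never updated, only the small head's parameters are learned, which is what makes transfer learning feasible with modest amounts of labeled data.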
The usability and practicality of any machine learning (ML) application are largely influenced by two critical but hard-to-attain factors: low latency and low cost. Unfortunately, achieving low latency and low cost is very challenging when ML depends on real-world data that are highly distributed and rapidly growing (e.g., data collected by mobile phones and video cameras all over the world). Such real-world data pose many challenges in communication and computation. For example, when training data are distributed across data centers that span multiple continents, communication among data centers can easily overwhelm the limited wide-area network bandwidth, leading to prohibitively high latency and high cost. In this dissertation, we demonstrate that the latency and cost of ML on highly-distributed and rapidly-growing data can be improved by one to two orders of magnitude by designing ML systems that exploit the characteristics of ML algorithms, ML model structures, and ML training/serving data. We support this thesis statement with three contributions. First, we design a system that provides both low-latency and low-cost ML serving (inferencing) over large-scale and continuously-growing datasets, such as videos. Second, we build a system that makes ML training over geo-distributed datasets as fast as training within a single data center. Third, we present the first detailed study and a system-level solution for a fundamental and largely overlooked problem: ML training over non-IID (i.e., not independent and identically distributed) data partitions (e.g., facial images collected by cameras vary according to the demographics of each camera's location).
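The non-IID partitions mentioned in the third contribution are easy to simulate. A common way to mimic demographically skewed camera feeds (my own sketch, not the dissertation's experimental setup) is to split a labeled dataset across workers using a Dirichlet distribution, where a small concentration parameter produces heavily skewed per-worker label mixes:

```python
import numpy as np

rng = np.random.default_rng(1)

def dirichlet_partition(labels, n_workers, alpha):
    """Split sample indices across workers with Dirichlet-skewed label mixes.
    Small alpha -> highly non-IID: each worker sees only a few classes,
    mimicking cameras whose feeds reflect local demographics."""
    parts = [[] for _ in range(n_workers)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet([alpha] * n_workers)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for w, chunk in enumerate(np.split(idx, cuts)):
            parts[w].extend(chunk.tolist())
    return parts

labels = np.repeat(np.arange(10), 100)              # 10 classes, 100 samples each
iid  = dirichlet_partition(labels, 4, alpha=100.0)  # near-uniform label mixes
skew = dirichlet_partition(labels, 4, alpha=0.1)    # heavily skewed label mixes
```

Training a shared model over the `skew` partitions with naive parameter averaging is exactly the regime where the non-IID problem shows up.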
The Union of Subspaces (UoS) model serves as an important model in statistical machine learning. Briefly speaking, UoS models those high-dimensional data, encountered in many real-world problems, which lie close to low-dimensional subspaces corresponding to several classes to which the data belong, such as handwritten digits (Hastie and Simard, 1998), face images (Basri and Jacobs, 2003), DNA microarray data (Parvaresh et al., 2008), and hyper-spectral images (Chen et al., 2011), to name just a few. A fundamental task in processing data points in UoS is to cluster these data points, which is known as Subspace Clustering (SC). Applications of SC have spanned all over science and engineering, including motion segmentation (Costeira and Kanade, 1998; Kanatani, 2001), face recognition (Wright et al., 2008), classification of diseases (McWilliams and Montana, 2014), and so on. We refer the reader to the tutorial paper (Vidal, 2011) for a review of the development of SC. The authors are with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China. The corresponding author of this paper is Y. Gu (gyt@tsinghua.edu.cn).
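The UoS idea can be made concrete with a toy example. The sketch below is a minimal angle-based heuristic of my own, not any of the cited SC algorithms: points drawn from two 1-D subspaces (lines through the origin) of R^3 are grouped by thresholding pairwise cosine affinities and taking connected components.

```python
import numpy as np

rng = np.random.default_rng(0)

def angle_affinity(X):
    """Affinity between points = |cos angle|; points on the same
    1-D subspace are (nearly) perfectly aligned."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return np.abs(Xn @ Xn.T)

def cluster(X, thresh=0.9):
    """Connected components of the thresholded affinity graph."""
    A = angle_affinity(X) > thresh
    labels = -np.ones(len(X), dtype=int)
    cur = 0
    for i in range(len(X)):
        if labels[i] == -1:
            stack = [i]
            while stack:
                j = stack.pop()
                if labels[j] == -1:
                    labels[j] = cur
                    stack.extend(np.flatnonzero(A[j] & (labels == -1)))
            cur += 1
    return labels

# Two 1-D subspaces of R^3 with small additive noise.
d1, d2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])
X = np.vstack([np.outer(rng.uniform(1, 2, 20), d1),
               np.outer(rng.uniform(1, 2, 20), d2)])
X += 0.01 * rng.normal(size=X.shape)
labels = cluster(X)
```

Real SC methods such as sparse subspace clustering handle higher-dimensional subspaces and intersecting unions, which this angle heuristic cannot, but the clustering objective is the same.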
This paper targets the problem of image set-based face verification and identification. Unlike the traditional single-media (an image or a video) setting, we encounter a set of heterogeneous contents containing orderless images and videos. The importance of each image is usually considered either equal or based on an independent quality assessment. How to model the relationships among the orderless images within a set remains a challenge. We address this problem by formulating it as a Markov Decision Process (MDP) in the latent space. Specifically, we first present a dependency-aware attention control (DAC) network, which resorts to actor-critic reinforcement learning for sequential attention decisions over each image embedding, to fully exploit the rich correlation cues among the unordered images. Moreover, we introduce a sample-efficient variant with off-policy experience replay to speed up the learning process. A pose-guided representation scheme can further boost the performance at the extremes of the pose variation.
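The DAC network itself is a full actor-critic model, but the baselines it improves on (equal weighting, or weighting by an independent per-image quality score) are simple to state. The sketch below illustrates only those baselines on synthetic embeddings; all names are illustrative and nothing here reproduces the paper's method.

```python
import numpy as np

def aggregate(embeddings, qualities=None):
    """Pool a set of per-image embeddings into one unit-norm template.
    qualities=None -> equal weighting; otherwise softmax-normalized
    quality scores weight each image's contribution."""
    E = np.asarray(embeddings, dtype=float)
    if qualities is None:
        w = np.full(len(E), 1.0 / len(E))
    else:
        q = np.asarray(qualities, dtype=float)
        w = np.exp(q - q.max())   # softmax over quality scores
        w /= w.sum()
    template = w @ E
    return template / np.linalg.norm(template)

def verify(template_a, template_b, thresh=0.5):
    """Cosine-similarity verification between two set templates."""
    return float(template_a @ template_b) >= thresh

equal_t   = aggregate([[1.0, 0.0], [0.0, 1.0]])
quality_t = aggregate([[1.0, 0.0], [0.0, 1.0]], qualities=[10.0, 0.0])
```

The paper's point is that such per-image weights ignore dependencies among set members; the reinforcement-learned attention replaces this independent weighting with sequential, correlation-aware decisions.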
Machine learning (ML) can help you create innovative, compelling and unique experiences for your mobile users. Once you've mastered ML, you can use it to create a wide range of applications, including apps that automatically organize photos based on their subject matter, identify and track a person's face across a livestream, extract text from an image, and much more. If you want to enhance your Android apps with powerful machine learning capabilities, then where exactly do you start? In this article, I'll provide an overview of an SDK (Software Development Kit) that promises to put the power of ML at your fingertips, even if you have zero ML experience. By the end of this article, you'll have the foundation you need to start creating intelligent, ML-powered apps that are capable of labelling images, scanning barcodes, recognizing faces and famous landmarks, and performing many other powerful ML tasks.