Collaborating Authors


Synthetic Data: How AI Uses Fake Data for Genuine Gains


The most remarkable thing about shopping in a smart store is how unremarkable it feels -- once you get over the whole no-checkout thing, of course. At the same time, there's a whole mess of complex computer vision modeling going on all around you to facilitate the experience. Those models have to be sophisticated enough to handle all manner of image challenges: object recognition, activity recognition, pose estimation. Asking a system to differentiate between two similar bags of chips, grabbed by people in similar coats and gloves, with no margin for error, is a big ask. Getting it right requires data.

Computer Vision Bootcamp with Python (OpenCV) - YOLO, SSD


Description This course is about the fundamental concept of image processing, focusing on face detection and object detection. These topics are getting very hot nowadays because these learning algorithms can be used in several fields from software engineering to crime investigation. Self-driving cars (for example lane detection approaches) relies heavily on computer vision. With the advent of deep learning and graphical processing units (GPUs) in the past decade it's become possible to run these algorithms even in real-time videos. So what are you going to learn in this course?

The Future of AI Part 1


It was reported that Venture Capital investments into AI related startups made a significant increase in 2018, jumping by 72% compared to 2017, with 466 startups funded from 533 in 2017. PWC moneytree report stated that that seed-stage deal activity in the US among AI-related companies rose to 28% in the fourth-quarter of 2018, compared to 24% in the three months prior, while expansion-stage deal activity jumped to 32%, from 23%. There will be an increasing international rivalry over the global leadership of AI. President Putin of Russia was quoted as saying that "the nation that leads in AI will be the ruler of the world". Billionaire Mark Cuban was reported in CNBC as stating that "the world's first trillionaire would be an AI entrepreneur".

Computational Needs for Computer Vision (CV) in AI & ML Systems


Computer vision (CV) is a major task for modern Artificial Intelligence (AI) and Machine Learning (ML) systems. It's accelerating nearly every domain in the tech industry enabling organizations to revolutionize the way machines and business systems work. Academically, it is a well-established area of computer science and many decades worth of research work have gone into this field. However, the use of deep neural networks has recently revolutionized the CV field and given it new oxygen. There is a diverse array of application areas for computer vision.

Contextual Semantic Interpretability Artificial Intelligence

Convolutional neural networks (CNN) are known to learn an image representation that captures concepts relevant to the task, but do so in an implicit way that hampers model interpretability. However, one could argue that such a representation is hidden in the neurons and can be made explicit by teaching the model to recognize semantically interpretable attributes that are present in the scene. We call such an intermediate layer a \emph{semantic bottleneck}. Once the attributes are learned, they can be re-combined to reach the final decision and provide both an accurate prediction and an explicit reasoning behind the CNN decision. In this paper, we look into semantic bottlenecks that capture context: we want attributes to be in groups of a few meaningful elements and participate jointly to the final decision. We use a two-layer semantic bottleneck that gathers attributes into interpretable, sparse groups, allowing them contribute differently to the final output depending on the context. We test our contextual semantic interpretable bottleneck (CSIB) on the task of landscape scenicness estimation and train the semantic interpretable bottleneck using an auxiliary database (SUN Attributes). Our model yields in predictions as accurate as a non-interpretable baseline when applied to a real-world test set of Flickr images, all while providing clear and interpretable explanations for each prediction.

Fairness Matters -- A Data-Driven Framework Towards Fair and High Performing Facial Recognition Systems Artificial Intelligence

Facial recognition technologies are widely used in governmental and industrial applications. Together with the advancements in deep learning (DL), human-centric tasks such as accurate age prediction based on face images become feasible. However, the issue of fairness when predicting the age for different ethnicity and gender remains an open problem. Policing systems use age to estimate the likelihood of someone to commit a crime, where younger suspects tend to be more likely involved. Unfair age prediction may lead to unfair treatment of humans not only in crime prevention but also in marketing, identity acquisition and authentication. Therefore, this work follows two parts. First, an empirical study is conducted evaluating performance and fairness of state-of-the-art systems for age prediction including baseline and most recent works of academia and the main industrial service providers (Amazon AWS and Microsoft Azure). Building on the findings we present a novel approach to mitigate unfairness and enhance performance, using distribution-aware dataset curation and augmentation. Distribution-awareness is based on out-of-distribution detection which is utilized to validate equal and diverse DL system behavior towards e.g. ethnicity and gender. In total we train 24 DNN models and utilize one million data points to assess performance and fairness of the state-of-the-art for face recognition algorithms. We demonstrate an improvement in mean absolute age prediction error from 7.70 to 3.39 years and a 4-fold increase in fairness towards ethnicity when compared to related work. Utilizing the presented methodology we are able to outperform leading industry players such as Amazon AWS or Microsoft Azure in both fairness and age prediction accuracy and provide the necessary guidelines to assess quality and enhance face recognition systems based on DL techniques.

Light Can Hack Your Face! Black-box Backdoor Attack on Face Recognition Systems Artificial Intelligence

Deep neural networks (DNN) have shown great success in many computer vision applications. However, they are also known to be susceptible to backdoor attacks. When conducting backdoor attacks, most of the existing approaches assume that the targeted DNN is always available, and an attacker can always inject a specific pattern to the training data to further fine-tune the DNN model. However, in practice, such attack may not be feasible as the DNN model is encrypted and only available to the secure enclave. In this paper, we propose a novel black-box backdoor attack technique on face recognition systems, which can be conducted without the knowledge of the targeted DNN model. To be specific, we propose a backdoor attack with a novel color stripe pattern trigger, which can be generated by modulating LED in a specialized waveform. We also use an evolutionary computing strategy to optimize the waveform for backdoor attack. Our backdoor attack can be conducted in a very mild condition: 1) the adversary cannot manipulate the input in an unnatural way (e.g., injecting adversarial noise); 2) the adversary cannot access the training database; 3) the adversary has no knowledge of the training model as well as the training set used by the victim party. We show that the backdoor trigger can be quite effective, where the attack success rate can be up to $88\%$ based on our simulation study and up to $40\%$ based on our physical-domain study by considering the task of face recognition and verification based on at most three-time attempts during authentication. Finally, we evaluate several state-of-the-art potential defenses towards backdoor attacks, and find that our attack can still be effective. We highlight that our study revealed a new physical backdoor attack, which calls for the attention of the security issue of the existing face recognition/verification techniques.

Adaptive Label Smoothing Artificial Intelligence

This paper concerns the use of objectness measures to improve the calibration performance of Convolutional Neural Networks (CNNs). Objectness is a measure of likelihood of an object from any class being present in a given image. CNNs have proven to be very good classifiers and generally localize objects well; however, the loss functions typically used to train classification CNNs do not penalize inability to localize an object, nor do they take into account an object's relative size in the given image. We present a novel approach to object localization that combines the ideas of objectness and label smoothing during training. Unlike previous methods, we compute a smoothing factor that is adaptive based on relative object size within an image. We present extensive results using ImageNet and OpenImages to demonstrate that CNNs trained using adaptive label smoothing are much less likely to be overconfident in their predictions, as compared to CNNs trained using hard targets. We also show qualitative results using class activation maps to illustrate the improvements.

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design Artificial Intelligence

The rapid development and wide utilization of object detection techniques have aroused attention on both accuracy and speed of object detectors. However, the current state-of-the-art object detection works are either accuracy-oriented using a large model but leading to high latency or speed-oriented using a lightweight model but sacrificing accuracy. In this work, we propose YOLObile framework, a real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed for any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves 14$\times$ compression rate of YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve 17 FPS inference speed using GPU on Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed is increased to 19.1 FPS, and outperforms the original YOLOv4 by 5$\times$ speedup.

Embedded Development Boards for Edge-AI: A Comprehensive Report Artificial Intelligence

The use of Deep Learning and Machine Learning is becoming pervasive day by day which is opening doors to new opportunities in every aspect of technology. Its application Ranges from Health-care to Self-driving Cars, Home Automation to Smart-agriculture, and Industry 4.0. Traditionally the majority of the processing for IoT applications is being done on a central cloud but that has its issues; which include latency, security, bandwidth, and privacy, etc. It is estimated that there will be around 20 Million IoT devices by 2020 which will increase problems with sending data to the cloud and doing the processing there. A new trend of processing the data on the edge of the network is emerging. The idea is to do processing as near the point of data production as possible. Doing processing on the nodes generating the data is called Edge Computing and doing processing on a layer between the cloud and the point of data production is called Fog computing. There are no standard definitions for any of these, hence they are usually used interchangeably. In this paper, we have reviewed the development boards available for running Artificial Intelligence algorithms on the Edge