Pattern Recognition
Which company does the best job at image recognition? Microsoft, Amazon, Google, or IBM? ZDNet
Sometimes recognition software is excellent at correctly categorizing certain types of images but totally fails with others. Some image recognition engines prefer cats over dogs, and some are far more descriptive with their color knowledge. But which is the best overall? Perficient Digital's image recognition accuracy study looked at image recognition -- one of the hottest areas of machine learning. It looked at Amazon AWS Rekognition, Google Vision, IBM Watson, and Microsoft Azure Computer Vision to compare images.
Machine Learning at the Network Edge: A Survey
Murshed, M. G. Sarwar, Murphy, Christopher, Hou, Daqing, Khan, Nazar, Ananthanarayanan, Ganesh, Hussain, Faraz
Devices comprising the Internet of Things, such as sensors and small cameras, usually have small memories and limited computational power. The proliferation of such resource-constrained devices in recent years has led to the generation of large quantities of data. These data-producing devices are appealing targets for machine learning applications but struggle to run machine learning algorithms due to their limited computing capability. They typically offload input data to external computing systems (such as cloud servers) for further processing. The results of the machine learning computations are communicated back to the resource-scarce devices, but this worsens latency, leads to increased communication costs, and adds to privacy concerns. Therefore, efforts have been made to place additional computing devices at the edge of the network, i.e., close to the IoT devices where the data is generated. Deploying machine learning systems on such edge devices alleviates the above issues by allowing computations to be performed close to the data sources. This survey describes major research efforts where machine learning has been deployed at the edge of computer networks.
Artificial intelligence can now pick stocks and build portfolios. Are human managers about to be replaced? The Chronicle Herald
Outside of their ability to understand a company's fundamentals, one of the skills Raj Lala appreciates most about his portfolio managers is their ability to interpret body language. Sitting across from management teams before making a decision to either invest or divest from their companies, Lala, the CEO of Evolve ETFs, said his portfolio managers can learn a lot from simply reading the room. Maybe they spot a nervous twitch after a question on guidance or a CEO unable to make eye contact when responding to a question about declining revenues. That very human capability was at the forefront of Lala's mind when he was recently pitched on two types of artificial intelligence that he could incorporate into his portfolio management processes. And it's one of the reasons he said no. "I can't see AI getting to that point where it replaces human interaction and, quite honestly, I would say god bless our world if that's the case," Lala said.
A Deep Neural Network for Short-Segment Speaker Recognition
Hajavi, Amirhossein, Etemad, Ali
Today's interactive devices such as smart-phone assistants and smart speakers often deal with short-duration speech segments. As a result, speaker recognition systems integrated into such devices are better served by models capable of performing the recognition task on short-duration utterances. In this paper, a new deep neural network, UtterIdNet, capable of performing speaker recognition with short speech segments is proposed. Our proposed model utilizes a novel architecture that makes it suitable for short-segment speaker recognition through an efficiently increased use of information in short speech segments. UtterIdNet has been trained and tested on the VoxCeleb datasets, the latest benchmarks in speaker recognition. Evaluations for different segment durations show consistent and stable performance for short segments, with significant improvement over the previous models for segments of 2 seconds, 1 second, and especially sub-second durations (250 ms and 500 ms).
Adversarial Examples to Fool Iris Recognition Systems
Soleymani, Sobhan, Dabouei, Ali, Dawson, Jeremy, Nasrabadi, Nasser M.
Adversarial examples have recently been shown to fool deep learning methods by adding carefully crafted small perturbations to the input image. In this paper, we study the possibility of generating adversarial examples for code-based iris recognition systems. Since generating adversarial examples requires back-propagation of the adversarial loss, conventional filter bank-based iris-code generation frameworks cannot be employed in such a setup. Therefore, to compensate for this shortcoming, we propose to train a deep auto-encoder surrogate network to mimic the conventional iris code generation procedure. This trained surrogate network is then deployed to generate the adversarial examples using the iterative gradient sign method. We consider non-targeted and targeted attacks through three attack scenarios. Considering these attacks, we study the possibility of fooling an iris recognition system in white-box and black-box frameworks.
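The iterative gradient sign method mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a tiny logistic-regression classifier stands in for the differentiable surrogate network, and the names `iterative_gradient_sign`, `eps`, `alpha`, and `steps` are assumptions chosen for illustration. Each step moves the input in the direction of the sign of the loss gradient, then clips the result so that it stays within `eps` of the original input in the L-infinity sense.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, w, b):
    # Linear score of a logistic-regression model (a stand-in for a surrogate network).
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def loss_gradient(x, y, w, b):
    # Gradient of the binary cross-entropy loss with respect to the input x.
    p = sigmoid(score(x, w, b))
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def iterative_gradient_sign(x, y, w, b, eps=0.3, alpha=0.05, steps=10):
    # Non-targeted attack: repeatedly step along the gradient sign to raise the loss.
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_gradient(x_adv, y, w, b)
        x_adv = [xi + alpha * sign(gi) for xi, gi in zip(x_adv, grad)]
        # Keep the perturbation inside the L-infinity ball of radius eps around x.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv
```

A targeted variant would instead step against the gradient of the loss computed for the desired target label; the clipping step stays the same.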
Robustness properties of Facebook's ResNeXt WSL models
We investigate the robustness properties of ResNeXt image recognition models trained with billion scale weakly-supervised data (ResNeXt WSL models). These models, recently made public by Facebook AI, were trained on 1B images from Instagram and fine-tuned on ImageNet. We show that these models display an unprecedented degree of robustness against common image corruptions and perturbations, as measured by the ImageNet-C and ImageNet-P benchmarks. The largest of the released models, in particular, achieves state-of-the-art results on both ImageNet-C and ImageNet-P by a large margin. The gains on ImageNet-C and ImageNet-P far outpace the gains on ImageNet validation accuracy, suggesting the former as more useful benchmarks to measure further progress in image recognition. Remarkably, the ResNeXt WSL models even achieve a limited degree of adversarial robustness against state-of-the-art white-box attacks (10-step PGD attacks). However, in contrast to adversarially trained models, the robustness of the ResNeXt WSL models rapidly declines with the number of PGD steps, suggesting that these models do not achieve genuine adversarial robustness. Visualization of the learned features also confirms this conclusion. Finally, we show that although the ResNeXt WSL models are more shape-biased than comparable ImageNet-trained models in a shape-texture cue conflict experiment, they still remain much more texture-biased than humans and their accuracy on the recently introduced "natural adversarial examples" (ImageNet-A) also remains low, suggesting that they share many of the underlying characteristics of ImageNet-trained models that make these benchmarks challenging.
Workshop: Machine Learning for the Enterprise International Conference 2019 - Technology Transfer
Machine Learning (ML) represents a massive change in the computing industry. It is a long-term trend that offers the potential for significant advantages for many enterprises. Accurate prediction is critical for practically all enterprises. Without a degree of confidence in business forecasting, organizations would have a difficult time delivering successful products and services in a cost-effective manner. Machine Learning provides the capability to offer deep predictive and prescriptive decision-making intelligence.
Toward Fairness in AI for People with Disabilities: A Research Roadmap
Guo, Anhong, Kamar, Ece, Vaughan, Jennifer Wortman, Wallach, Hanna, Morris, Meredith Ringel
AI technologies have the potential to dramatically impact the lives of people with disabilities (PWD). Indeed, improving the lives of PWD is a motivator for many state-of-the-art AI systems, such as automated speech recognition tools that can caption videos for people who are deaf and hard of hearing, or language prediction algorithms that can augment communication for people with speech or cognitive disabilities. However, widely deployed AI systems may not work properly for PWD, or worse, may actively discriminate against them. These considerations regarding fairness in AI for PWD have thus far received little attention. In this position paper, we identify potential areas of concern regarding how several AI technology categories may impact particular disability constituencies if care is not taken in their design, development, and testing. We intend for this risk assessment of how various classes of AI might interact with various classes of disability to provide a roadmap for future research that is needed to gather data, test these hypotheses, and build more inclusive algorithms.
Unsupervised Deformable Image Registration Using Cycle-Consistent CNN
Kim, Boah, Kim, Jieun, Lee, June-Goo, Kim, Dong Hwan, Park, Seong Ho, Ye, Jong Chul
Medical image registration is one of the key processing steps for biomedical image analysis such as cancer diagnosis. Recently, deep learning based supervised and unsupervised image registration methods have been extensively studied because they combine excellent performance with ultra-fast computation compared to classical approaches. In this paper, we present a novel unsupervised medical image registration method that trains a deep neural network for deformable registration of 3D volumes using cycle consistency. Thanks to the cycle consistency, the proposed deep neural networks can take diverse pairs of image data with severe deformation for accurate registration. Experimental results using multiphase liver CT images demonstrate that our method provides very precise 3D image registration within a few seconds, resulting in more accurate cancer size estimation.
Canonical Correlation Analysis (CCA) Based Multi-View Learning: An Overview
Multi-view learning (MVL) is a strategy for fusing data from different sources or subsets. Canonical correlation analysis (CCA), whose main idea is to map data from different views onto a common space with maximum correlation, is very important in MVL. Traditional CCA can only be used to calculate the linear correlation between two views. Moreover, it is unsupervised, so label information is wasted in supervised learning tasks. Many nonlinear, supervised, or generalized extensions have been proposed to overcome these limitations. However, to our knowledge, there is no up-to-date overview of these approaches. This paper fills this gap by providing a comprehensive overview of many classical and recent CCA approaches, and describing their typical applications in pattern recognition, multi-modal retrieval and classification, and multi-view embedding.
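The traditional two-view linear CCA described in this abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the paper: the function name `linear_cca` and the small ridge term `reg` (added only for numerical stability) are assumptions. The sketch whitens each view with a Cholesky factor of its covariance and takes the singular value decomposition of the whitened cross-covariance. The singular values are the canonical correlations, and the returned matrices project each view onto the common space.

```python
import numpy as np


def linear_cca(X, Y, reg=1e-6):
    """Two-view linear CCA. X is (n, p), Y is (n, q); rows are paired samples."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance of each view (with a small ridge term) and cross-covariance.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance T = Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original coordinates of each view.
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return s, A, B
```

Projecting the centered views with `A` and `B` gives the paired canonical variates. The nonlinear and supervised extensions surveyed in the paper replace or augment this linear mapping.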