AITopics

2011.11286

Country: North America > United States (0.29)

Genre: Research Report (0.70)

Industry: Media > News (0.90)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.95)

arXiv.org Artificial IntelligenceJun-6-2019

Does Generative Face Completion Help Face Recognition?

Mathai, Joe, Masi, Iacopo, AbdAlmageed, Wael

Face occlusions, covering either the majority or discriminative parts of the face, can break facial perception and produce a drastic loss of information. Biometric systems such as recent deep face recognition models are not immune to obstructions or other objects covering parts of the face. While most of the current face recognition methods are not optimized to handle occlusions, there have been a few attempts to improve robustness directly in the training stage. Unlike those, we propose to study the effect of generative face completion on the recognition. We offer a face completion encoder-decoder, based on a convolutional operator with a gating mechanism, trained with an ample set of face occlusions. To systematically evaluate the impact of realistic occlusions on recognition, we propose to play the occlusion game: we render 3D objects onto different face parts, providing precious knowledge of what the impact is of effectively removing those occlusions. Extensive experiments on the Labeled Faces in the Wild (LFW), and its more difficult variant LFW-BLUFR, testify that face completion is able to partially restore face perception in machine vision systems for improved recognition.

deep learning, neural network, occlusion, (20 more...)

1906.02858

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.66)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine LearningMay-7-2019

Unified Adversarial Invariance

Jaiswal, Ayush, Wu, Yue, AbdAlmageed, Wael, Natarajan, Premkumar

We present a unified invariance framework for supervised neural networks that can induce independence to nuisance factors of data without using any nuisance annotations, but can additionally use labeled information about biasing factors to force their removal from the latent embedding for making fair predictions. Invariance to nuisance is achieved by learning a split representation of data through competitive training between the prediction task and a reconstruction task coupled with disentanglement, whereas that to biasing factors is brought about by penalizing the network if the latent embedding contains any information about them. We describe an adversarial instantiation of this framework and provide analysis of its working. Our model outperforms previous works at inducing invariance to nuisance factors without using any labeled information about such variables, and achieves state-of-the-art performance at learning independence to biasing factors in fairness settings.

information, neural network, us government, (17 more...)

1905.03629

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine LearningSep-26-2018

Unsupervised Adversarial Invariance

Jaiswal, Ayush, Wu, Yue, AbdAlmageed, Wael, Natarajan, Premkumar

Data representations that contain all the information about target variables but are invariant to nuisance factors benefit supervised learning algorithms by preventing them from learning associations between these factors and the targets, thus reducing overfitting. We present a novel unsupervised invariance induction framework for neural networks that learns a split representation of data through competitive training between the prediction task and a reconstruction task coupled with disentanglement, without needing any labeled information about nuisance factors or domain knowledge. We describe an adversarial instantiation of this framework and provide analysis of its working. Our unsupervised model outperforms state-of-the-art methods, which are supervised, at inducing invariance to inherent nuisance factors, effectively using synthetic data augmentation to learn invariance, and domain adaptation. Our method can be applied to any prediction task, eg., binary/multi-class classification or regression, without loss of generality.

artificial intelligence, information, neural network, (16 more...)

1809.10083

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceAug-20-2018

Deep Multimodal Image-Repurposing Detection

Sabir, Ekraam, AbdAlmageed, Wael, Wu, Yue, Natarajan, Prem

Nefarious actors on social media and other platforms often spread rumors and falsehoods through images whose metadata (e.g., captions) have been modified to provide visual substantiation of the rumor/falsehood. This type of modification is referred to as image repurposing, in which often an unmanipulated image is published along with incorrect or manipulated metadata to serve the actor's ulterior motives. We present the Multimodal Entity Image Repurposing (MEIR) dataset, a substantially challenging dataset over that which has been previously available to support research into image repurposing detection. The new dataset includes location, person, and organization manipulations on real-world data sourced from Flickr. We also present a novel, end-to-end, deep multimodal learning model for assessing the integrity of an image by combining information extracted from the image with related information from a knowledge base. The proposed method is compared against state-of-the-art techniques on existing datasets as well as MEIR, where it outperforms existing methods across the board, with AUC improvement up to 0.23.

dataset, deep learning, neural network, (26 more...)

doi: 10.1145/3240508.3240707

1808.06686

Country:

Europe > United Kingdom > England (0.47)
North America > Canada > New Brunswick > Saint John County > Saint John (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > California > Riverside County (0.14)

Genre: Research Report (0.70)

Industry:

Media > News (0.47)
Government > Regional Government (0.46)
Information Technology > Services (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(3 more...)

arXiv.org Artificial IntelligenceJun-28-2018

Multimedia Semantic Integrity Assessment Using Joint Embedding Of Images And Text

Jaiswal, Ayush, Sabir, Ekraam, AbdAlmageed, Wael, Natarajan, Premkumar

Real world multimedia data is often composed of multiple modalities such as an image or a video with associated text (e.g. captions, user comments, etc.) and metadata. Such multimodal data packages are prone to manipulations, where a subset of these modalities can be altered to misrepresent or repurpose data packages, with possible malicious intent. It is, therefore, important to develop methods to assess or verify the integrity of these multimedia packages. Using computer vision and natural language processing methods to directly compare the image (or video) and the associated caption to verify the integrity of a media package is only possible for a limited set of objects and scenes. In this paper, we present a novel deep learning-based approach for assessing the semantic integrity of multimedia packages containing images and captions, using a reference set of multimedia packages. We construct a joint embedding of images and captions with deep multimodal representation learning on the reference dataset in a framework that also provides image-caption consistency scores (ICCSs). The integrity of query media packages is assessed as the inlierness of the query ICCSs with respect to the reference dataset. We present the MultimodAl Information Manipulation dataset (MAIM), a new dataset of media packages from Flickr, which we make available to the research community. We use both the newly created dataset as well as Flickr30K and MS COCO datasets to quantitatively evaluate our proposed approach. The reference dataset does not contain unmanipulated versions of tampered query packages. Our method is able to achieve F1 scores of 0.75, 0.89 and 0.94 on MAIM, Flickr30K and MS COCO, respectively, for detecting semantically incoherent media packages.

caption, deep learning, neural network, (20 more...)

doi: 10.1145/3123266.3123385

1707.01606

Country: North America > United States > Nevada (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningJun-6-2018

Adversarial Auto-encoders for Speech Based Emotion Recognition

Sahu, Saurabh, Gupta, Rahul, Sivaraman, Ganesh, AbdAlmageed, Wael, Espy-Wilson, Carol

Recently, generative adversarial networks and adversarial autoencoders have gained a lot of attention in machine learning community due to their exceptional performance in tasks such as digit classification and face recognition. They map the autoencoder's bottleneck layer output (termed as code vectors) to different noise Probability Distribution Functions (PDFs), that can be further regularized to cluster based on class information. In addition, they also allow a generation of synthetic samples by sampling the code vectors from the mapped PDFs. Inspired by these properties, we investigate the application of adversarial autoencoders to the domain of emotion recognition. Specifically, we conduct experiments on the following two aspects: (i) their ability to encode high dimensional feature vector representations for emotional utterances into a compressed space (with a minimal loss of emotion class discriminability in the compressed space), and (ii) their ability to regenerate synthetic samples in the original feature space, to be later used for purposes such as training emotion recognition classifiers. We demonstrate the promise of adversarial autoencoders with regards to these aspects on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and present our analysis.

deep learning, emotion recognition, neural network, (21 more...)

1806.02146

Country: North America > United States > Maryland > Prince George's County > College Park (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.35)

arXiv.org Machine LearningMar-14-2018

Bidirectional Conditional Generative Adversarial Networks

Jaiswal, Ayush, AbdAlmageed, Wael, Wu, Yue, Natarajan, Premkumar

Conditional Generative Adversarial Networks (cGANs) are generative models that can produce data samples ($x$) conditioned on both latent variables ($z$) and known auxiliary information ($c$). We propose the Bidirectional cGAN (BiCoGAN), which effectively disentangles $z$ and $c$ in the generation process and provides an encoder that learns inverse mappings from $x$ to both $z$ and $c$, trained jointly with the generator and the discriminator. We present crucial techniques for training BiCoGANs, which involve an extrinsic factor loss along with an associated dynamically-tuned importance weight. As compared to other encoder-based cGANs, BiCoGANs encode $c$ more accurately, and utilize $z$ and $c$ more effectively and in a more disentangled way to generate samples.

bicogan, deep learning, neural network, (15 more...)

1711.07461

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningMar-12-2018

CapsuleGAN: Generative Adversarial Capsule Network

Jaiswal, Ayush, AbdAlmageed, Wael, Wu, Yue, Natarajan, Premkumar

We present Generative Adversarial Capsule Network (CapsuleGAN), a framework that uses capsule networks (CapsNets) instead of the standard convolutional neural networks (CNNs) as discriminators within the generative adversarial network (GAN) setting, while modeling image data. We provide guidelines for designing CapsNet discriminators and the updated GAN objective function, which incorporates the CapsNet margin loss, for training CapsuleGAN models. We show that CapsuleGAN outperforms convolutional-GAN at modeling image data distribution on MNIST and CIFAR-10 datasets, evaluated on the generative adversarial metric and at semi-supervised image classification.

capsulegan, deep learning, neural network, (17 more...)

1802.06167

Country: Oceania > Australia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)