Jaiswal, Ayush
FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory
Pal, Anwesan, Wadhwa, Sahil, Jaiswal, Ayush, Zhang, Xu, Wu, Yue, Chada, Rakesh, Natarajan, Pradeep, Christensen, Henrik I.
Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting where users iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, learning to integrate information across all past turns to retrieve new images for a given turn. Unlike a vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation shows that our proposed method outperforms the previous state-of-the-art algorithm by 50.5% on Multi-turn FashionIQ -- currently the only existing multi-turn fashion dataset -- and achieves a relative improvement of 12.6% on Multi-turn Shoes -- an extension of the single-turn Shoes dataset that we created in this work. Further analysis of the model in a real-world interactive setting demonstrates two important capabilities: memory retention across turns, and insensitivity to turn order for non-contradictory feedback. Finally, user study results show that images retrieved by FashionNTM were favored by 83.1% over other multi-turn models. Project page: https://sites.google.com/eng.ucsd.edu/fashionntm
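To make the cascaded-memory idea concrete, below is a minimal PyTorch sketch of one content-addressed memory and a cascade in which each input stream owns its own memory and the read vector of one stage conditions the next. The dimensions, module names (SimpleMemory, CascadedMemory), and single-head simplification are illustrative assumptions, not the paper's exact CM-NTM architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemory(nn.Module):
    """One external memory with a single content-addressed read/write head
    (a heavily simplified stand-in for full NTM heads)."""
    def __init__(self, in_dim, slots=32, slot_dim=128):
        super().__init__()
        self.key = nn.Linear(in_dim, slot_dim)    # addressing key
        self.erase = nn.Linear(in_dim, slot_dim)  # erase vector
        self.add = nn.Linear(in_dim, slot_dim)    # add vector
        self.slots, self.slot_dim = slots, slot_dim

    def init_memory(self, batch):
        return torch.zeros(batch, self.slots, self.slot_dim)

    def forward(self, x, memory):
        # Content-based addressing: softmax over key/slot cosine similarity.
        k = self.key(x).unsqueeze(1)                                   # (B,1,D)
        w = F.softmax(F.cosine_similarity(k, memory, dim=-1), dim=-1)  # (B,S)
        read = (w.unsqueeze(-1) * memory).sum(dim=1)                   # (B,D)
        # Write: erase, then add, weighted by the same attention.
        e = torch.sigmoid(self.erase(x)).unsqueeze(1)
        a = self.add(x).unsqueeze(1)
        memory = memory * (1 - w.unsqueeze(-1) * e) + w.unsqueeze(-1) * a
        return read, memory

class CascadedMemory(nn.Module):
    """Each input stream (e.g., the image and text features of a turn) gets
    its own memory; the read vector of one stage conditions the next, so
    information from past turns accumulates in the memories."""
    def __init__(self, feat_dims, slot_dim=128):
        super().__init__()
        self.slot_dim = slot_dim
        self.mems = nn.ModuleList(
            SimpleMemory(d + slot_dim, slot_dim=slot_dim) for d in feat_dims)

    def forward(self, inputs, memories):
        read = torch.zeros(inputs[0].size(0), self.slot_dim)
        new_memories = []
        for x, mem, head in zip(inputs, memories, self.mems):
            read, mem = head(torch.cat([x, read], dim=-1), mem)
            new_memories.append(mem)
        return read, new_memories
```

In use, each turn's image and feedback embeddings would pass through the cascade, the final read vector would score candidate images, and the returned memories would persist into the next turn.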
Class-agnostic Object Detection
Jaiswal, Ayush, Wu, Yue, Natarajan, Pradeep, Natarajan, Premkumar
Object detection models perform well at localizing and classifying the objects they are shown during training. However, due to the difficulty and cost of creating and annotating detection datasets, trained models detect only a limited number of object types, with unknown objects treated as background content. This hinders the adoption of conventional detectors in real-world applications such as large-scale object matching, visual grounding, visual relation prediction, and obstacle detection, where it is more important to determine the presence and location of objects than to find specific types. We propose class-agnostic object detection as a new problem that focuses on detecting objects irrespective of their object-classes. Specifically, the goal is to predict bounding boxes for all objects in an image but not their object-classes. The predicted boxes can then be consumed by another system to perform application-specific classification, retrieval, etc. We propose training and evaluation protocols for benchmarking class-agnostic detectors to advance future research in this domain. Finally, we propose (1) baseline methods and (2) a new adversarial learning framework for class-agnostic detection that forces the model to exclude class-specific information from features used for predictions. Experimental results show that adversarial learning improves class-agnostic detection efficacy.
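One common way to force a model to exclude class-specific information from its features is a gradient-reversal adversary; the PyTorch sketch below illustrates that pattern on pooled box features. The module names and the gradient-reversal instantiation are assumptions for illustration, not necessarily the paper's exact adversarial formulation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the
    backward pass, so the feature extractor learns to *fool* the class
    discriminator attached after it."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class ClassAgnosticHead(nn.Module):
    """Box features feed (1) objectness/box heads used for detection and
    (2) an adversarial class discriminator behind gradient reversal."""
    def __init__(self, feat_dim, num_known_classes, lam=1.0):
        super().__init__()
        self.obj_head = nn.Linear(feat_dim, 1)   # object vs. background
        self.box_head = nn.Linear(feat_dim, 4)   # box regression deltas
        self.cls_disc = nn.Linear(feat_dim, num_known_classes)
        self.lam = lam

    def forward(self, feats):
        obj = self.obj_head(feats)
        box = self.box_head(feats)
        # The discriminator is trained to recover the class, while reversed
        # gradients push the features to carry no class information.
        cls = self.cls_disc(GradReverse.apply(feats, self.lam))
        return obj, box, cls
```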
MEG: Multi-Evidence GNN for Multimodal Semantic Forensics
Sabir, Ekraam, Jaiswal, Ayush, AbdAlmageed, Wael, Natarajan, Prem
Fake news often involves semantic manipulations across modalities such as image, text, and location, and requires the development of multimodal semantic forensics for its detection. Recent research has centered the problem around images, calling it image repurposing -- where a digitally unmanipulated image is semantically misrepresented by means of its accompanying multimodal metadata, such as captions and location. The image and metadata together comprise a multimedia package. The problem setup requires algorithms to perform multimodal semantic forensics to authenticate a query multimedia package using a reference dataset of potentially related packages as evidence. Existing methods are limited to using a single piece of evidence (a retrieved package), which forgoes the potential performance improvement from using multiple pieces of evidence. In this work, we introduce a novel graph neural network-based model for multimodal semantic forensics, which effectively utilizes multiple retrieved packages as evidence and is scalable in the number of evidence packages. We compare the scalability and performance of our model against existing methods. Experimental results show that the proposed model outperforms existing state-of-the-art algorithms with an error reduction of up to 25%.
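As a rough illustration of how a graph neural network can aggregate a variable number of evidence packages, here is a minimal PyTorch sketch using a star graph around the query node with mean-aggregated messages. The architecture, dimensions, and two-round schedule are assumptions for the sketch, not MEG's actual design.

```python
import torch
import torch.nn as nn

class EvidenceGNN(nn.Module):
    """Star graph: the query package is one node, each retrieved evidence
    package is a neighbor. Message passing folds an arbitrary number of
    evidences into the query representation, which is then classified as
    authentic vs. repurposed."""
    def __init__(self, feat_dim, hidden=256, rounds=2):
        super().__init__()
        self.msg = nn.Linear(feat_dim, hidden)   # per-evidence message
        self.upd = nn.GRUCell(hidden, feat_dim)  # query-node update
        self.cls = nn.Linear(feat_dim, 2)        # authentic vs. repurposed
        self.rounds = rounds

    def forward(self, query_feat, evidence_feats):
        # query_feat: (B, F); evidence_feats: (B, N, F) for N evidences.
        h = query_feat
        for _ in range(self.rounds):
            # Mean aggregation keeps the model invariant to the number and
            # ordering of evidences, which is what makes it scalable.
            m = torch.relu(self.msg(evidence_feats)).mean(dim=1)
            h = self.upd(m, h)
        return self.cls(h)
```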
Unified Adversarial Invariance
Jaiswal, Ayush, Wu, Yue, AbdAlmageed, Wael, Natarajan, Premkumar
We present a unified invariance framework for supervised neural networks that can induce independence from nuisance factors of data without using any nuisance annotations, and can additionally use labeled information about biasing factors to force their removal from the latent embedding for fair predictions. Invariance to nuisance factors is achieved by learning a split representation of data through competitive training between the prediction task and a reconstruction task coupled with disentanglement, whereas invariance to biasing factors is brought about by penalizing the network if the latent embedding contains any information about them. We describe an adversarial instantiation of this framework and provide an analysis of how it works. Our model outperforms previous works at inducing invariance to nuisance factors without using any labeled information about such variables, and achieves state-of-the-art performance at learning independence from biasing factors in fairness settings.
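Below is a minimal PyTorch sketch of the split-representation idea shared by this framework and the unsupervised variant that follows, assuming simple MLP components: the predictor sees only e1, the decoder must reconstruct x from a dropped-out e1 plus e2 (pushing nuisance information into e2), and two adversarial disentanglers penalize information shared between the halves. The bias-removal branch of the unified framework, which additionally attaches an adversarially trained discriminator that predicts the biasing factor from e1, is omitted for brevity; all module choices are assumptions.

```python
import torch
import torch.nn as nn

class SplitInvariance(nn.Module):
    """Encoder splits x into e1 (fed to the predictor) and e2. The decoder
    reconstructs x from (dropout(e1), e2), so nuisances migrate into e2;
    the disentanglers, trained in competition with the encoder, try to
    predict each half from the other to keep e1 and e2 independent."""
    def __init__(self, in_dim, e_dim, num_classes):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * e_dim))
        self.pred = nn.Linear(e_dim, num_classes)         # task head on e1
        self.dec = nn.Sequential(nn.Linear(2 * e_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))  # x from (e1~, e2)
        self.dis1 = nn.Linear(e_dim, e_dim)  # adversary: e2 from e1
        self.dis2 = nn.Linear(e_dim, e_dim)  # adversary: e1 from e2
        self.drop = nn.Dropout(0.5)
        self.e_dim = e_dim

    def forward(self, x):
        e1, e2 = self.enc(x).split(self.e_dim, dim=-1)
        y_hat = self.pred(e1)
        x_hat = self.dec(torch.cat([self.drop(e1), e2], dim=-1))
        # Prediction/target pairs for the alternating adversarial objective.
        return y_hat, x_hat, (self.dis1(e1), e2), (self.dis2(e2), e1)
```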
Unsupervised Adversarial Invariance
Jaiswal, Ayush, Wu, Yue, AbdAlmageed, Wael, Natarajan, Premkumar
Data representations that contain all the information about target variables but are invariant to nuisance factors benefit supervised learning algorithms by preventing them from learning associations between these factors and the targets, thus reducing overfitting. We present a novel unsupervised invariance induction framework for neural networks that learns a split representation of data through competitive training between the prediction task and a reconstruction task coupled with disentanglement, without needing any labeled information about nuisance factors or domain knowledge. We describe an adversarial instantiation of this framework and provide an analysis of how it works. Our unsupervised model outperforms state-of-the-art methods, which are supervised, at inducing invariance to inherent nuisance factors, at effectively using synthetic data augmentation to learn invariance, and at domain adaptation. Our method can be applied to any prediction task, e.g., binary/multi-class classification or regression, without loss of generality.
Multimedia Semantic Integrity Assessment Using Joint Embedding Of Images And Text
Jaiswal, Ayush, Sabir, Ekraam, AbdAlmageed, Wael, Natarajan, Premkumar
Real-world multimedia data is often composed of multiple modalities, such as an image or video with associated text (e.g., captions and user comments) and metadata. Such multimodal data packages are prone to manipulation, where a subset of these modalities can be altered to misrepresent or repurpose data packages, possibly with malicious intent. It is, therefore, important to develop methods to assess or verify the integrity of these multimedia packages. Using computer vision and natural language processing methods to directly compare the image (or video) with the associated caption to verify the integrity of a media package is only possible for a limited set of objects and scenes. In this paper, we present a novel deep learning-based approach for assessing the semantic integrity of multimedia packages containing images and captions, using a reference set of multimedia packages. We construct a joint embedding of images and captions with deep multimodal representation learning on the reference dataset in a framework that also provides image-caption consistency scores (ICCSs). The integrity of query media packages is assessed as the inlierness of the query ICCSs with respect to the reference dataset. We present the MultimodAl Information Manipulation dataset (MAIM), a new dataset of media packages from Flickr, which we make available to the research community. We use the newly created dataset as well as the Flickr30K and MS COCO datasets to quantitatively evaluate our proposed approach; notably, the reference dataset does not contain unmanipulated versions of tampered query packages. Our method achieves F1 scores of 0.75, 0.89, and 0.94 on MAIM, Flickr30K, and MS COCO, respectively, for detecting semantically incoherent media packages.
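A minimal sketch of the scoring side, assuming embeddings from the joint-embedding networks are already computed: cosine similarity stands in for the ICCS, and a z-score against the reference ICCS distribution stands in for the inlierness estimate. The paper's actual consistency score and outlier model may differ.

```python
import torch
import torch.nn.functional as F

def iccs(image_emb, caption_emb):
    """Image-caption consistency score: cosine similarity in the joint
    embedding space (one simple choice of consistency score)."""
    return F.cosine_similarity(image_emb, caption_emb, dim=-1)

def integrity_scores(query_img, query_cap, ref_img, ref_cap):
    """Score query packages by how inlier-like their ICCS is relative to the
    reference ICCS distribution (here: a simple z-score)."""
    ref_scores = iccs(ref_img, ref_cap)          # (N_ref,)
    mu, sigma = ref_scores.mean(), ref_scores.std()
    q = iccs(query_img, query_cap)               # (N_query,)
    # Low z => consistency well below the reference => likely tampered.
    return (q - mu) / sigma.clamp_min(1e-8)
```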
Large-Scale Unsupervised Deep Representation Learning for Brain Structure
Jaiswal, Ayush, Guo, Dong, Raghavendra, Cauligi S., Thompson, Paul
Machine learning (ML) is increasingly being used for computer-aided diagnosis of brain-related disorders based on structural magnetic resonance imaging (MRI) data. Most such work employs biologically and medically meaningful hand-crafted features calculated from different regions of the brain. The construction of such highly specialized features requires a considerable amount of time, manual oversight, and careful quality control to ensure the absence of errors in the computational process. Recent advances in deep representation learning have shown great promise in extracting highly non-linear and information-rich features from data. In this paper, we present a novel large-scale deep unsupervised approach to learning generic feature representations of structural brain MRI scans, which requires no specialized domain knowledge or manual intervention. Our method produces low-dimensional representations of brain structure that can be used to reconstruct brain images with very low error and that perform comparably to FreeSurfer features on various classification tasks.
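As one plausible instantiation of unsupervised deep representation learning on MRI volumes, here is a small 3D convolutional autoencoder in PyTorch whose bottleneck is the low-dimensional brain representation, trained with a reconstruction loss. The architecture, input size, and code dimension are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class MRIAutoencoder(nn.Module):
    """3D convolutional autoencoder: the encoder's bottleneck serves as a
    generic low-dimensional representation of brain structure."""
    def __init__(self, code_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8 * 8, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 64 * 8 * 8 * 8),
            nn.Unflatten(1, (64, 8, 8, 8)),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):           # x: (B, 1, 64, 64, 64) volume
        code = self.encoder(x)      # low-dimensional brain representation
        recon = self.decoder(code)  # reconstruction for an MSE loss vs. x
        return code, recon
```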
Bidirectional Conditional Generative Adversarial Networks
Jaiswal, Ayush, AbdAlmageed, Wael, Wu, Yue, Natarajan, Premkumar
Conditional Generative Adversarial Networks (cGANs) are generative models that can produce data samples ($x$) conditioned on both latent variables ($z$) and known auxiliary information ($c$). We propose the Bidirectional cGAN (BiCoGAN), which effectively disentangles $z$ and $c$ in the generation process and provides an encoder that learns inverse mappings from $x$ to both $z$ and $c$, trained jointly with the generator and the discriminator. We present crucial techniques for training BiCoGANs, which involve an extrinsic factor loss along with an associated dynamically tuned importance weight. Compared to other encoder-based cGANs, BiCoGANs encode $c$ more accurately and utilize $z$ and $c$ more effectively and in a more disentangled way to generate samples.
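The PyTorch sketch below wires up the bidirectional setup described above: a generator mapping (z, c) to x, an encoder mapping x back to (z_hat, c_hat), a discriminator over joint (x, z, c) tuples, and a generator/encoder loss with an extrinsic-factor term weighted by gamma (standing in for the dynamically tuned importance weight). The MLP choices, sizes, and exact loss arrangement are assumptions for illustration.

```python
import torch
import torch.nn as nn

class BiCoGANSketch(nn.Module):
    """Generator: (z, c) -> x; encoder: x -> (z_hat, c_hat); discriminator
    judges joint (x, z, c) tuples, as in BiGAN but with the condition c."""
    def __init__(self, x_dim, z_dim, c_dim, hidden=256):
        super().__init__()
        self.G = nn.Sequential(nn.Linear(z_dim + c_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, x_dim), nn.Tanh())
        self.E = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, z_dim + c_dim))
        self.D = nn.Sequential(nn.Linear(x_dim + z_dim + c_dim, hidden),
                               nn.ReLU(), nn.Linear(hidden, 1))
        self.z_dim, self.c_dim = z_dim, c_dim

    def encode(self, x):
        z_hat, c_hat = self.E(x).split([self.z_dim, self.c_dim], dim=-1)
        return z_hat, c_hat

    def discriminate(self, x, z, c):
        return self.D(torch.cat([x, z, c], dim=-1))

def ge_loss(model, x_real, c_real_idx, z, c_onehot, gamma):
    """Generator/encoder objective (flipped-label form): fake (G(z,c), z, c)
    tuples should look 'real' to D, encoded real tuples should look 'fake',
    and c_hat is supervised on real data (the extrinsic factor loss)."""
    bce = nn.BCEWithLogitsLoss()
    efl = nn.CrossEntropyLoss()
    x_fake = model.G(torch.cat([z, c_onehot], dim=-1))
    z_hat, c_hat = model.encode(x_real)
    n = z.size(0)
    adv = (bce(model.discriminate(x_fake, z, c_onehot), torch.ones(n, 1)) +
           bce(model.discriminate(x_real, z_hat, c_hat), torch.zeros(n, 1)))
    return adv + gamma * efl(c_hat, c_real_idx)  # gamma: tuned weight
```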