Collaborating Authors

 IBM Research AI


AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training

AAAI Conferences

Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient compression techniques are needed that are computationally friendly, applicable to the wide variety of layers seen in DNNs, and adaptable to variations in network architectures as well as their hyper-parameters. In this paper we introduce a novel technique, the Adaptive Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized selection of gradient residues and automatically tunes the compression rate depending on local activity. We show excellent results on a wide spectrum of state-of-the-art deep learning models in multiple domains (vision, speech, language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare), optimizers (SGD with momentum, Adam) and network parameters (number of learners, minibatch size, etc.). Exploiting both sparsity and quantization, we demonstrate end-to-end compression rates of ~200× for fully-connected and recurrent layers, and ~40× for convolutional layers, without any noticeable degradation in model accuracy.
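The abstract names the mechanism (localized selection of gradient residues, with a compression rate that follows local activity) without giving details. The following is a minimal sketch of one way such a scheme could work, not the authors' implementation. The function name `adacomp_step`, the `bin_size` parameter, and the specific selection rule (send an entry when the accumulated residue plus the latest gradient reaches the largest accumulated magnitude in its bin) are assumptions made for illustration. Quantization of the transmitted values is left out.

```python
from typing import List, Tuple


def adacomp_step(
    residue: List[float],
    grad: List[float],
    bin_size: int,
) -> Tuple[List[int], List[float], List[float]]:
    """One hypothetical compression step over a flattened layer gradient.

    Returns (sent_indices, sent_values, new_residue).
    """
    if len(residue) != len(grad):
        raise ValueError("residue and grad must have the same length")
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")

    # Fold the latest gradient into the residue left over from earlier steps.
    accumulated = [r + g for r, g in zip(residue, grad)]

    sent_indices: List[int] = []
    sent_values: List[float] = []
    new_residue = list(accumulated)

    # Localized selection: each fixed-size bin is thresholded on its own.
    for start in range(0, len(accumulated), bin_size):
        stop = min(start + bin_size, len(accumulated))
        local_max = max(abs(accumulated[i]) for i in range(start, stop))
        if local_max == 0.0:
            continue  # nothing worth sending from an all-zero bin

        for i in range(start, stop):
            # Assumed rule: entries whose latest gradient pushes them up to
            # the bin's largest accumulated magnitude are transmitted.
            boosted = accumulated[i] + grad[i]
            if abs(boosted) >= local_max:
                sent_indices.append(i)
                sent_values.append(accumulated[i])
                new_residue[i] = 0.0  # sent entries leave no residue behind

    return sent_indices, sent_values, new_residue


# Example call: four weights split into two bins of two.
indices, values, leftover = adacomp_step(
    residue=[0.0, 0.0, 0.0, 0.0],
    grad=[1.0, 0.1, -0.2, -3.0],
    bin_size=2,
)
```

Because the threshold is computed per bin, the number of entries sent is not fixed in advance. A bin with several comparably large entries sends more of them, and a bin dominated by one entry sends only that one. Unsent entries stay in `new_residue` and are carried into the next call.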


Cognition-Cognizant Sentiment Analysis With Multitask Subjectivity Summarization Based on Annotators' Gaze Behavior

AAAI Conferences

For document-level sentiment analysis (SA), Subjectivity Extraction, i.e., extracting the relevant subjective portions of the text that cover the overall sentiment expressed in the document, is an important step. Subjectivity Extraction, however, is a hard problem for systems, as it demands a great deal of world knowledge and reasoning. Humans, on the other hand, are good at extracting relevant subjective summaries from an opinionated document (say, a movie review), while inferring the sentiment expressed in it. This capability is manifested in their eye-movement behavior while reading: words pertaining to the subjective summary of the text attract a lot more attention in the form of gaze-fixations and/or saccadic patterns. We propose a multi-task deep neural framework for document-level sentiment analysis that learns to predict the overall sentiment expressed in the given input document, by simultaneously learning to predict human gaze behavior and auxiliary linguistic tasks like part-of-speech and syntactic properties of words in the document. For this, a multi-task learning algorithm based on a multi-layer shared LSTM augmented with task-specific classifiers is proposed. With this composite multi-task network, we obtain performance competitive with or better than state-of-the-art approaches in SA. Moreover, the availability of gaze predictions as an auxiliary output helps interpret the system better; for instance, gaze predictions reveal that the system indeed performs subjectivity extraction better, which accounts for the improvement in document-level sentiment analysis performance.