Gupta, Akshat
AIR-JPMC@SMM4H'22: Classifying Self-Reported Intimate Partner Violence in Tweets with Multiple BERT-based Models
Candidato, Alec, Gupta, Akshat, Liu, Xiaomo, Shah, Sameena
This paper presents our submission for the SMM4H 2022-Shared Task on the classification of self-reported intimate partner violence on Twitter (in English). The goal of this task was to accurately determine if the contents of a given tweet demonstrated someone reporting their own experience with intimate partner violence. The submitted system is an ensemble of five RoBERTa models each weighted by their respective F1-scores on the validation data-set. This system performed 13% better than the baseline and was the best performing system overall for this shared task.
Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages
Gupta, Akshat, Li, Xinjian, Rallabandi, Sai Krishna, Black, Alan W
With recent advancements in language technologies, humansare now interacting with technology through speech. To in-crease the reach of these technologies, we need to build suchsystems in local languages. A major bottleneck here are theunderlying data-intensive parts that make up such systems,including automatic speech recognition (ASR) systems thatrequire large amounts of labelled data. With the aim of aidingdevelopment of dialog systems in low resourced languages,we propose a novel acoustics based intent recognition systemthat uses discovered phonetic units for intent classification.The system is made up of two blocks - the first block gen-erates a transcript of discovered phonetic units for the inputaudio, and the second block which performs intent classifi-cation from the generated phonemic transcripts. Our workpresents results for such a system for two languages families- Indic languages and Romance languages, for two differentintent recognition tasks. We also perform multilingual train-ing of our intent classifier and show improved cross-lingualtransfer and performance on an unknown language with zeroresources in the same language family.
Blind Descent: A Prequel to Gradient Descent
Gupta, Akshat, R, Prasad N
We describe an alternative learning method for neural networks, which we call Blind Descent. By design, Blind Descent does not face problems like exploding or vanishing gradients. In Blind Descent, gradients are not used to guide the learning process. In this paper, we present Blind Descent as a more fundamental learning process compared to gradient descent. We also show that gradient descent can be seen as a specific case of the Blind Descent algorithm. We also train two neural network architectures, a multilayer perceptron and a convolutional neural network, using the most general Blind Descent algorithm to demonstrate a proof of concept.
Spatiotemporal Action Recognition in Restaurant Videos
Gupta, Akshat, Desai, Milan, Liang, Wusheng, Kannan, Magesh
Spatiotemporal action recognition is the task of locating and classifying actions in videos. Our project applies this task to analyzing video footage of restaurant workers preparing food, for which potential applications include automated checkout and inventory management. Such videos are quite different from the standardized datasets that researchers are used to, as they involve small objects, rapid actions, and notoriously unbalanced data classes. We explore two approaches - one involving the familiar object detector "You Only Look Once" (YOLO), and another applying a recently proposed analogue for action recognition, "You Only Watch Once" (YOWO). In the first, we design and implement a novel, recurrent modification of YOLO using convolutional LSTMs and explore the various subtleties in the training of such a network. In the second, we study the ability of YOWO's three-dimensional convolutions to capture the spatiotemporal features of our unique dataset, which was generously lent by CMU-based startup Agot.