Goyal, Palash
Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models
Mehrabi, Ninareh, Goyal, Palash, Verma, Apurv, Dhamala, Jwala, Kumar, Varun, Hu, Qian, Chang, Kai-Wei, Zemel, Richard, Galstyan, Aram, Gupta, Rahul
Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions and/or relying on contextual cues and common-sense knowledge, resolving ambiguities can be notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate a benchmark dataset covering different types of ambiguities that occur in these systems. We then propose a framework to mitigate ambiguities in the prompts given to the systems by soliciting clarifications from the user. Through automatic and human evaluations, we show the effectiveness of our framework in generating more faithful images aligned with human intention in the presence of ambiguities.
Hierarchical Class-Based Curriculum Loss
Goyal, Palash, Ghosh, Shalini
Classification algorithms in machine learning often assume a flat label space. However, most real world data have dependencies between the labels, which can often be captured by using a hierarchy. Utilizing this relation can help develop a model capable of satisfying the dependencies and improving model accuracy and interpretability. Further, as different levels in the hierarchy correspond to different granularities, penalizing each label equally can be detrimental to model learning. In this paper, we propose a loss function, hierarchical curriculum loss, with two properties: (i) satisfy hierarchical constraints present in the label space, and (ii) provide non-uniform weights to labels based on their levels in the hierarchy, learned implicitly by the training paradigm. We theoretically show that the proposed loss function is a tighter bound of 0-1 loss compared to any other loss satisfying the hierarchical constraints. We test our loss function on real world image data sets, and show that it significantly substantially outperforms multiple baselines.
Exploiting Temporal Coherence for Multi-modal Video Categorization
Goyal, Palash, Sahu, Saurabh, Ghosh, Shalini, Lee, Chul
Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding). In this paper, we focus on the problem of video categorization by using a multimodal approach. We have developed a novel temporal coherence-based regularization approach, which applies to different types of models (e.g., RNN, NetVLAD, Transformer). We demonstrate through experiments how our proposed multimodal video categorization models with temporal coherence out-perform strong state-of-the-art baseline models.
Graph Representation Ensemble Learning
Goyal, Palash, Huang, Di, Chhetri, Sujit Rokka, Canedo, Arquimedes, Shree, Jaya, Patterson, Evan
Representation learning on graphs has been gaining attention due to its wide applicability in predicting missing links, and classifying and recommending nodes. Most embedding methods aim to preserve certain properties of the original graph in the low dimensional space. However, real world graphs have a combination of several properties which are difficult to characterize and capture by a single approach. In this work, we introduce the problem of graph representation ensemble learning and provide a first of its kind framework to aggregate multiple graph embedding methods efficiently. We provide analysis of our framework and analyze -- theoretically and empirically -- the dependence between state-of-the-art embedding methods. We test our models on the node classification task on four real world graphs and show that proposed ensemble approaches can outperform the state-of-the-art methods by up to 8% on macro-F1. We further show that the approach is even more beneficial for underrepresented classes providing an improvement of up to 12%.
ArduCode: Predictive Framework for Automation Engineering
Canedo, Arquimedes, Goyal, Palash, Huang, Di, Pandey, Amit
Automation engineering is the task of integrating, via software, various sensors, actuators, and controls for automating a real-world process. Today, automation engineering is supported by a suite of software tools including integrated development environments (IDE), hardware configurators, compilers, and runtimes. These tools focus on the automation code itself, but leave the automation engineer unassisted in their decision making. This can lead to increased time for software development because of imperfections in decision making leading to multiple iterations between software and hardware. To address this, this paper defines multiple challenges often faced in automation engineering and propose solutions using machine learning to assist engineers tackle such challenges. We show that machine learning can be leveraged to assist the automation engineer in classifying automation, finding similar code snippets, and reasoning about the hardware selection of sensors and actuators. We validate our architecture on two real datasets consisting of 2,927 Arduino projects, and 683 Programmable Logic Controller (PLC) projects. Our results show that paragraph embedding techniques can be utilized to classify automation using code snippets with precision close to human annotation, giving an F1-score of 72%. Further, we show that such embedding techniques can help us find similar code snippets with high accuracy. Finally, we use autoencoder models for hardware recommendation and achieve a p@3 of 0.79 and p@5 of 0.95.
Pykg2vec: A Python Library for Knowledge Graph Embedding
Yu, Shih Yuan, Chhetri, Sujit Rokka, Canedo, Arquimedes, Goyal, Palash, Faruque, Mohammad Abdullah Al
Pykg2vec is an open-source Python library for learning the representations of the entities and relations in knowledge graphs. Pykg2vec's flexible and modular software architecture currently implements 16 state-of-the-art knowledge graph embedding algorithms, and is designed to easily incorporate new algorithms. The goal of pykg2vec is to provide a practical and educational platform to accelerate research in knowledge graph representation learning. Pykg2vec is built on top of TensorFlow and Python's multiprocessing framework and provides modules for batch generation, Bayesian hyperparameter optimization, mean rank evaluation, embedding, and result visualization. Pykg2vec is released under the MIT License and is also available in the Python Package Index (PyPI).
DynamicGEM: A Library for Dynamic Graph Embedding Methods
Goyal, Palash, Chhetri, Sujit Rokka, Mehrabi, Ninareh, Ferrara, Emilio, Canedo, Arquimedes
DynamicGEM is an open-source Python library for learning node representations of dynamic graphs. It consists of state-of-the-art algorithms for defining embeddings of nodes whose connections evolve over time. The library also contains the evaluation framework for four downstream tasks on the network: graph reconstruction, static and temporal link prediction, node classification, and temporal visualization. We have implemented various metrics to evaluate the state-of-the-art methods, and examples of evolving networks from various domains. We have easy-to-use functions to call and evaluate the methods and have extensive usage documentation. Furthermore, DynamicGEM provides a template to add new algorithms with ease to facilitate further research on the topic.
dyngraph2vec: Capturing Network Dynamics using Dynamic Graph Representation Learning
Goyal, Palash, Chhetri, Sujit Rokka, Canedo, Arquimedes
Understanding and analyzing graphs is an essential topic that has been widely studied over the past decades. Many real world problems can be formulated as link predictions in graphs (Gehrke, Ginsparg, and Kleinberg 2003; Freeman 2000; Theocharidis et al. 2009; Goyal, Sapienza, and Ferrara 2018). For example, link prediction in an author collaboration network (Gehrke, Ginsparg, and Kleinberg 2003) can be used to predict potential future author collaboration. Similarly, new connections between proteins can be discovered using protein interaction networks (Pavlopoulos, Wegener, and Schneider 2008), and new friendships can be predicted using social networks (Wasserman and Faust 1994). Recent work on obtaining such predictions use graph representation learning. These methods represent each node in the network with a fixed dimensional embedding, and map link prediction in the network space to a nearest neighbor search in the embedding space (Goyal and Ferrara 2018). It has been shown that such techniques can outperform traditional link prediction methods on graphs (Grover and Leskovec 2016; Ou et al. 2016a).
Future Automation Engineering using Structural Graph Convolutional Neural Networks
Wan, Jiang, Pollard, Blake S., Chhetri, Sujit Rokka, Goyal, Palash, Faruque, Mohammad Abdullah Al, Canedo, Arquimedes
The digitalization of automation engineering generates large quantities of engineering data that is interlinked in knowledge graphs. Classifying and clustering subgraphs according to their functionality is useful to discover functionally equivalent engineering artifacts that exhibit different graph structures. This paper presents a new graph learning algorithm designed to classify engineering data artifacts -- represented in the form of graphs -- according to their structure and neighborhood features. Our Structural Graph Convolutional Neural Network (SGCNN) is capable of learning graphs and subgraphs with a novel graph invariant convolution kernel and downsampling/pooling algorithm. On a realistic engineering-related dataset, we show that SGCNN is capable of achieving ~91% classification accuracy.
Modeling Evolution of Topics in Large-Scale Temporal Text Corpora
Momeni, Elaheh (University of Vienna) | Karunasekera, Shanika (University of Melbourne) | Goyal, Palash (University of Southern California) | Lerman, Kristina (University of Southern California)
Large text temporal collections provide insights into social and cultural change over time. To quantify changes in topics in these corpora, embedding methods have been used as a diachronic tool. However, they have limited utility for modeling changes in topics due to the stochastic nature of training. We propose a new computational approach for tracking and detecting temporal evolution of topics in a large collection of texts. This approach for identifying dynamic topics and modeling their evolution combines the advantages of two methods: (1) word embeddings to learn contextual semantic representation of words from temporal snapshots of the data and (2) dynamic network analysis to identify dynamic topics by using dynamic semantic similarity networks developed using embedding models. Experimenting with two large temporal data sets from the legal and real estate domains, we show that this approach performs faster (due to parallelizing different snapshots), uncovers more coherent topics (compared to available dynamic topic modeling approaches), and effectively enables modeling evolution leveraging the network structure.