Goto

Collaborating Authors

Results


Semantic and sentiment analysis of selected Bhagavad Gita translations using BERT-based language framework

arXiv.org Artificial Intelligence

It is well known that translations of songs and poems not only breaks rhythm and rhyming patterns, but also results in loss of semantic information. The Bhagavad Gita is an ancient Hindu philosophical text originally written in Sanskrit that features a conversation between Lord Krishna and Arjuna prior to the Mahabharata war. The Bhagavad Gita is also one of the key sacred texts in Hinduism and known as the forefront of the Vedic corpus of Hinduism. In the last two centuries, there has been a lot of interest in Hindu philosophy by western scholars and hence the Bhagavad Gita has been translated in a number of languages. However, there is not much work that validates the quality of the English translations. Recent progress of language models powered by deep learning has enabled not only translations but better understanding of language and texts with semantic and sentiment analysis. Our work is motivated by the recent progress of language models powered by deep learning methods. In this paper, we compare selected translations (mostly from Sanskrit to English) of the Bhagavad Gita using semantic and sentiment analyses. We use hand-labelled sentiment dataset for tuning state-of-art deep learning-based language model known as \textit{bidirectional encoder representations from transformers} (BERT). We use novel sentence embedding models to provide semantic analysis for selected chapters and verses across translations. Finally, we use the aforementioned models for sentiment and semantic analyses and provide visualisation of results. Our results show that although the style and vocabulary in the respective Bhagavad Gita translations vary widely, the sentiment analysis and semantic similarity shows that the message conveyed are mostly similar across the translations.


Analyzing Scientific Publications using Domain-Specific Word Embedding and Topic Modelling

arXiv.org Artificial Intelligence

The scientific world is changing at a rapid pace, with new technology being developed and new trends being set at an increasing frequency. This paper presents a framework for conducting scientific analyses of academic publications, which is crucial to monitor research trends and identify potential innovations. This framework adopts and combines various techniques of Natural Language Processing, such as word embedding and topic modelling. Word embedding is used to capture semantic meanings of domain-specific words. We propose two novel scientific publication embedding, i.e., PUB-G and PUB-W, which are capable of learning semantic meanings of general as well as domain-specific words in various research fields. Thereafter, topic modelling is used to identify clusters of research topics within these larger research fields. We curated a publication dataset consisting of two conferences and two journals from 1995 to 2020 from two research domains. Experimental results show that our PUB-G and PUB-W embeddings are superior in comparison to other baseline embeddings by a margin of ~0.18-1.03 based on topic coherence.


Natural Language Processing in-and-for Design Research

arXiv.org Artificial Intelligence

We review the scholarly contributions that utilise Natural Language Processing (NLP) methods to support the design process. Using a heuristic approach, we collected 223 articles published in 32 journals and within the period 1991-present. We present state-of-the-art NLP in-and-for design research by reviewing these articles according to the type of natural language text sources: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. Upon summarizing and identifying the gaps in these contributions, we utilise an existing design innovation framework to identify the applications that are currently being supported by NLP. We then propose a few methodological and theoretical directions for future NLP in-and-for design research.


Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

arXiv.org Artificial Intelligence

Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field. We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches. We also present approaches that use pre-trained language models to generate data for training augmentation or other purposes. We conclude with discussions on limitations and suggested directions for future research.


Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research

arXiv.org Artificial Intelligence

AI is widely thought to be poised to transform business, yet current perceptions of the scope of this transformation may be myopic. Recent progress in natural language processing involving transformer language models (TLMs) offers a potential avenue for AI-driven business and societal transformation that is beyond the scope of what most currently foresee. We review this recent progress as well as recent literature utilizing text mining in top IS journals to develop an outline for how future IS research can benefit from these new techniques. Our review of existing IS literature reveals that suboptimal text mining techniques are prevalent and that the more advanced TLMs could be applied to enhance and increase IS research involving text data, and to enable new IS research topics, thus creating more value for the research community. This is possible because these techniques make it easier to develop very powerful custom systems and their performance is superior to existing methods for a wide range of tasks and applications. Further, multilingual language models make possible higher quality text analytics for research in multiple languages. We also identify new avenues for IS research, like language user interfaces, that may offer even greater potential for future IS research.


Multi-label Emotion Classification with PyTorch + HuggingFace's Transformers and W&B for Tracking

#artificialintelligence

The GoEmotions dataset contains 58k carefully curated Reddit comments labeled for 27 emotion categories or Neutral. The raw data is included as well as the smaller, simplified version of the dataset with predefined train/val/test splits. After going through a few examples in this dataset on their visualizer, I realized that this is an extremely crucial dataset because it's rare to find sentiment classifier datasets that go beyond 5–6 emotions. But here, we have 27 emotions being assigned, with rare and close enough emotions like disappointment, disapproval, grief, remorse, sadness, etc. Detecting such close enough emotions is often difficult in typical datasets. This made it clear to me that this is an excellent dataset that can be scaled for usage in many applications that involve text analysis.


Transformer-Encoder-GRU (T-E-GRU) for Chinese Sentiment Analysis on Chinese Comment Text

arXiv.org Artificial Intelligence

Chinese sentiment analysis (CSA) has always been one of the challenges in natural language processing due to its complexity and uncertainty. Transformer has succeeded in capturing semantic features, but it uses position encoding to capture sequence features, which has great shortcomings compared with the recurrent model. In this paper, we propose T-E-GRU for Chinese sentiment analysis, which combine transformer encoder and GRU. We conducted experiments on three Chinese comment datasets. In view of the confusion of punctuation marks in Chinese comment texts, we selectively retain some punctuation marks with sentence segmentation ability. The experimental results show that T-E-GRU outperforms classic recurrent model and recurrent model with attention.


Identifying negativity factors from social media text corpus using sentiment analysis method

arXiv.org Artificial Intelligence

Automatic sentiment analysis play vital role in decision making. Many organizations spend a lot of budget to understand their customer satisfaction by manually going over their feedback/comments or tweets. Automatic sentiment analysis can give overall picture of the comments received against any event, product, or activity. Usually, the comments/tweets are classified into two main classes that are negative or positive. However, the negative comments are too abstract to understand the basic reason or the context. organizations are interested to identify the exact reason for the negativity. In this research study, we hierarchically goes down into negative comments, and link them with more classes. Tweets are extracted from social media sites such as Twitter and Facebook. If the sentiment analysis classifies any tweet into negative class, then we further try to associates that negative comments with more possible negative classes. Based on expert opinions, the negative comments/tweets are further classified into 8 classes. Different machine learning algorithms are evaluated and their accuracy are reported.



SubjQA: A Dataset for Subjectivity and Review Comprehension

arXiv.org Artificial Intelligence

Subjectivity is the expression of internal opinions or beliefs which cannot be objectively observed or verified, and has been shown to be important for sentiment analysis and word-sense disambiguation. Furthermore, subjectivity is an important aspect of user-generated data. In spite of this, subjectivity has not been investigated in contexts where such data is widespread, such as in question answering (QA). We therefore investigate the relationship between subjectivity and QA, while developing a new dataset. We compare and contrast with analyses from previous work, and verify that findings regarding subjectivity still hold when using recently developed NLP architectures. We find that subjectivity is also an important feature in the case of QA, albeit with more intricate interactions between subjectivity and QA performance. For instance, a subjective question may or may not be associated with a subjective answer. We release an English QA dataset (SubjQA) based on customer reviews, containing subjectivity annotations for questions and answer spans across 6 distinct domains.