Goto

Collaborating Authors

 Information Extraction


Natural language processing for word sense disambiguation and information extraction

arXiv.org Artificial Intelligence

This research work deals with Natural Language Processing (NLP) and extraction of essential information in an explicit form. The most common among the information management strategies is Document Retrieval (DR) and Information Filtering. DR systems may work as combine harvesters, which bring back useful material from the vast fields of raw material. With large amount of potentially useful information in hand, an Information Extraction (IE) system can then transform the raw material by refining and reducing it to a germ of original text. A Document Retrieval system collects the relevant documents carrying the required information, from the repository of texts. An IE system then transforms them into information that is more readily digested and analyzed. It isolates relevant text fragments, extracts relevant information from the fragments, and then arranges together the targeted information in a coherent framework. The thesis presents a new approach for Word Sense Disambiguation using thesaurus. The illustrative examples supports the effectiveness of this approach for speedy and effective disambiguation. A Document Retrieval method, based on Fuzzy Logic has been described and its application is illustrated. A question-answering system describes the operation of information extraction from the retrieved text documents. The process of information extraction for answering a query is considerably simplified by using a Structured Description Language (SDL) which is based on cardinals of queries in the form of who, what, when, where and why. The thesis concludes with the presentation of a novel strategy based on Dempster-Shafer theory of evidential reasoning, for document retrieval and information extraction. This strategy permits relaxation of many limitations, which are inherent in Bayesian probabilistic approach.



What AI-based Sentiment Analysis Can Tell Us About Fintech and Neobanks

#artificialintelligence

Over the past decade, fintech firms have set out to reinvent banking and financial services. One major market trend is the growth of the neobank, a new type of bank that is 100% digital. Instead of using physical branch networks, neobanks service customers using software and applications, allowing customers to transact on their mobile devices and providing accounts with much lower fees and more features. This trend to digitizing banking and the exchange of value is a natural progression of the information revolution to embrace digital. Fintech is an exciting market that continues to grow.


Sentiment Analysis in Drug Reviews using Supervised Machine Learning Algorithms

arXiv.org Machine Learning

Sentiment Analysis is an important algorithm in Natural Language Processing which is used to detect sentiment within some text. In our project, we had chosen to work on analyzing reviews of various drugs which have been reviewed in form of texts and have also been given a rating on a scale from 1-10. We had obtained this data set from the UCI machine learning repository which had 2 data sets: train and test (split as 75-25\%). We had split the number rating for the drug into three classes in general: positive (7-10), negative (1-4) or neutral(4-7). There are multiple reviews for the drugs that belong to a similar condition and we decided to investigate how the reviews for different conditions use different words impact the ratings of the drugs. Our intention was mainly to implement supervised machine learning classification algorithms that predict the class of the rating using the textual review. We had primarily implemented different embeddings such as Term Frequency Inverse Document Frequency (TFIDF) and the Count Vectors (CV). We had trained models on the most popular conditions such as "Birth Control", "Depression" and "Pain" within the data set and obtained good results while predicting the test data sets.


Compositional De-Attention Networks

Neural Information Processing Systems

Attentional models are distinctly characterized by their ability to learn relative importance, i.e., assigning a different weight to input values. This paper proposes a new quasi-attention that is compositional in nature, i.e., learning whether to \textit{add}, \textit{subtract} or \textit{nullify} a certain vector when learning representations. This is strongly contrasted with vanilla attention, which simply re-weights input tokens. Our proposed \textit{Compositional De-Attention} (CoDA) is fundamentally built upon the intuition of both similarity and dissimilarity (negative affinity) when computing affinity scores, benefiting from a greater extent of expressiveness. We evaluate CoDA on six NLP tasks, i.e. open domain question answering, retrieval/ranking, natural language inference, machine translation, sentiment analysis and text2code generation.


LAXARY: A Trustworthy Explainable Twitter Analysis Model for Post-Traumatic Stress Disorder Assessment

arXiv.org Artificial Intelligence

Veteran mental health is a significant national problem as large number of veterans are returning from the recent war in Iraq and continued military presence in Afghanistan. While significant existing works have investigated twitter posts-based Post Traumatic Stress Disorder (PTSD) assessment using blackbox machine learning techniques, these frameworks cannot be trusted by the clinicians due to the lack of clinical explainability. To obtain the trust of clinicians, we explore the big question, can twitter posts provide enough information to fill up clinical PTSD assessment surveys that have been traditionally trusted by clinicians? To answer the above question, we propose, LAXARY (Linguistic Analysis-based Exaplainable Inquiry) model, a novel Explainable Artificial Intelligent (XAI) model to detect and represent PTSD assessment of twitter users using a modified Linguistic Inquiry and Word Count (LIWC) analysis. First, we employ clinically validated survey tools for collecting clinical PTSD assessment data from real twitter users and develop a PTSD Linguistic Dictionary using the PTSD assessment survey results. Then, we use the PTSD Linguistic Dictionary along with machine learning model to fill up the survey tools towards detecting PTSD status and its intensity of corresponding twitter users. Our experimental evaluation on 210 clinically validated veteran twitter users provides promising accuracies of both PTSD classification and its intensity estimation. We also evaluate our developed PTSD Linguistic Dictionary's reliability and validity.


Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining

arXiv.org Artificial Intelligence

Aspect-based opinion mining is the task of identifying sentiment at the aspect level in opinionated text, which consists of two subtasks: aspect category extraction and sentiment polarity classification. While aspect category extraction aims to detect and categorize opinion targets such as product features, sentiment polarity classification assigns a sentiment label, i.e. positive, negative, or neutral, to each identified aspect. Supervised learning methods have been shown to deliver better accuracy for this task but they require labeled data, which is costly to obtain, especially for resource-poor languages like Vietnamese. To address this problem, we present a supervised aspect-based opinion mining method that utilizes labeled data from a foreign language (English in this case), which is translated to Vietnamese by an automated translation tool (Google Translate). Because aspects and opinions in different languages may be expressed by different words, we propose using word embeddings, in addition to other features, to reduce the vocabulary difference between the original and translated texts, thus improving the effectiveness of aspect category extraction and sentiment polarity classification processes. We also introduce an annotated corpus of aspect categories and sentiment polarities extracted from restaurant reviews in Vietnamese, and conduct a series of experiments on the corpus. Experimental results demonstrate the effectiveness of the proposed approach.


IBM advances Watson's ability to understand the language of business - CRN - India

#artificialintelligence

IBM is announcing several new IBM Watson technologies designed to help organizations begin identifying, understanding and analyzing some of the most challenging aspects of the English language with greater clarity, for greater insights. The new technologies represent the first commercialization of key Natural Language Processing (NLP) capabilities to come from IBM Research's Project Debater, the only AI system capable of debating humans on complex topics. For example, a new advanced sentiment analysis feature is defined to identify and analyze idioms and colloquialisms for the first time. Phrases, like'hardly helpful,' or'hot under the collar,' have been challenging for AI systems because they are difficult for algorithms to spot. With advanced sentiment analysis, businesses can begin analyzing such language data with Watson APIs for a more holistic understanding of their operation. Further, IBM is bringing technology from IBM Research for understanding business documents, such as PDF's and contracts, to also add to their AI models.


IBM's Watson Advances, Able To Understand The Language Of Business - Express Computer

#artificialintelligence

IBM is announcing several new IBM Watson technologies designed to help organizations begin identifying, understanding and analyzing some of the most challenging aspects of the English language with greater clarity, for greater insights. The new technologies represent the first commercialization of key Natural Language Processing (NLP) capabilities to come from IBM Research's Project Debater, the only AI system capable of debating humans on complex topics. For example, a new advanced sentiment analysis feature is defined to identify and analyze idioms and colloquialisms for the first time. Phrases, like'hardly helpful,' or'hot under the collar,' have been challenging for AI systems because they are difficult for algorithms to spot. With advanced sentiment analysis, businesses can begin analyzing such language data with Watson APIs for a more holistic understanding of their operation.


Sentiment Analysis with Contextual Embeddings and Self-Attention

arXiv.org Artificial Intelligence

In natural language the intended meaning of a word or phrase is often implicit and depends on the context. In this work, we propose a simple yet effective method for sentiment analysis using contextual embeddings and a self-attention mechanism. The experimental results for three languages, including morphologically rich Polish and German, show that our model is comparable to or even outperforms state-of-the-art models. In all cases the superiority of models leveraging contextual embeddings is demonstrated. Finally, this work is intended as a step towards introducing a universal, multilingual sentiment classifier.