Information Extraction
How to Detect Depression Using Natural Language Processing
TextBlob is more of a natural language processing library, but it comes with a rule-based sentiment analysis library. Polarity is a float that lies in the range of [-1,1] where 1 means a positive statement and -1 means a negative statement. Subjective sentences generally refer to personal opinion, emotion, or judgment whereas objective refers to factual information. VADER (Valence Aware Dictionary and Sentiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. The compound score is the sum of positive, negative & neutral scores which is then normalized between -1(most extreme negative) and 1 (most extreme positive).
Making the most of sentiment analysis
Sentiment analysis is beginning to prove its mettle in the enterprise. This analytic technique, which enables companies to determine the emotional value of communications, is finding traction in a range of use cases, from meeting transcription to customer service and feedback. These days, sentiment analysis relies largely on supervised or semi-supervised machine learning algorithms. All the big cloud players offer sentiment analysis tools, as do most major customer support platforms and marketing vendors. Conversational AI vendors also include sentiment analysis features in their wares. Get the latest insights with our CIO Daily newsletter.
Emoji-based Co-attention Network for Microblog Sentiment Analysis
Yuan, Xiaowei, Hu, Jingyuan, Zhang, Xiaodan, Lv, Honglei, Liu, Hao
Emojis are widely used in online social networks to express emotions, attitudes, and opinions. As emotional-oriented characters, emojis can be modeled as important features of emotions towards the recipient or subject for sentiment analysis. However, existing methods mainly take emojis as heuristic information that fails to resolve the problem of ambiguity noise. Recent researches have utilized emojis as an independent input to classify text sentiment but they ignore the emotional impact of the interaction between text and emojis. It results that the emotional semantics of emojis cannot be fully explored. In this paper, we propose an emoji-based co-attention network that learns the mutual emotional semantics between text and emojis on microblogs. Our model adopts the co-attention mechanism based on bidirectional long short-term memory incorporating the text and emojis, and integrates a squeeze-and-excitation block in a convolutional neural network classifier to increase its sensitivity to emotional semantic features. Experimental results show that the proposed method can significantly outperform several baselines for sentiment analysis on short texts of social media.
BioIE: Biomedical Information Extraction with Multi-head Attention Enhanced Graph Convolutional Network
Wu, Jialun, Liu, Yang, Gao, Zeyu, Gong, Tieliang, Wang, Chunbao, Li, Chen
Constructing large-scaled medical knowledge graphs can significantly boost healthcare applications for medical surveillance, bring much attention from recent research. An essential step in constructing large-scale MKG is extracting information from medical reports. Recently, information extraction techniques have been proposed and show promising performance in biomedical information extraction. However, these methods only consider limited types of entity and relation due to the noisy biomedical text data with complex entity correlations. Thus, they fail to provide enough information for constructing MKGs and restrict the downstream applications. To address this issue, we propose Biomedical Information Extraction, a hybrid neural network to extract relations from biomedical text and unstructured medical reports. Our model utilizes a multi-head attention enhanced graph convolutional network to capture the complex relations and context information while resisting the noise from the data. We evaluate our model on two major biomedical relationship extraction tasks, chemical-disease relation and chemical-protein interaction, and a cross-hospital pan-cancer pathology report corpus. The results show that our method achieves superior performance than baselines. Furthermore, we evaluate the applicability of our method under a transfer learning setting and show that BioIE achieves promising performance in processing medical text from different formats and writing styles.
Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis
Liu, Juhua, Zhong, Qihuang, Ding, Liang, Jin, Hua, Du, Bo, Tao, Dacheng
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect. Because of the expensive and limited labelled data, the pretraining strategy has become the de-facto standard for ABSA. However, there always exists severe domain shift between the pretraining and downstream ABSA datasets, hindering the effective knowledge transfer when directly finetuning and making the downstream task performs sub-optimal. To mitigate such domain shift, we introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline with both instance- and knowledge-level alignments. Specifically, we first devise a novel coarse-to-fine retrieval sampling approach to select target domain-related instances from the large-scale pretraining dataset, thus aligning the instances between pretraining and target domains (\textit{First Stage}). Then, we introduce a knowledge guidance-based strategy to further bridge the domain gap at the knowledge level. In practice, we formulate the model pretrained on the sampled instances into a knowledge guidance model and a learner model, respectively. On the target dataset, we design an on-the-fly teacher-student joint fine-tuning approach to progressively transfer the knowledge from the knowledge guidance model to the learner model (\textit{Second Stage}). Thereby, the learner model can maintain more domain-invariant knowledge when learning new knowledge from the target dataset. In the \textit{Third Stage,} the learner model is finetuned to better adapt its learned knowledge to the target dataset. Extensive experiments and analyses on several ABSA benchmarks demonstrate the effectiveness and universality of our proposed pretraining framework. Notably, our pretraining framework pushes several strong baseline models up to the new state-of-the-art records. We release our code and models.
CoVA: Context-aware Visual Attention for Webpage Information Extraction
Kumar, Anurendra, Morabia, Keval, Wang, Jingjin, Chang, Kevin Chen-Chuan, Schwing, Alexander
Webpage information extraction (WIE) is an important step to create knowledge bases. For this, classical WIE methods leverage the Document Object Model (DOM) tree of a website. However, use of the DOM tree poses significant challenges as context and appearance are encoded in an abstract manner. To address this challenge we propose to reformulate WIE as a context-aware Webpage Object Detection task. Specifically, we develop a Context-aware Visual Attention-based (CoVA) detection pipeline which combines appearance features with syntactical structure from the DOM tree. To study the approach we collect a new large-scale dataset of e-commerce websites for which we manually annotate every web element with four labels: product price, product title, product image and background. On this dataset we show that the proposed CoVA approach is a new challenging baseline which improves upon prior state-of-the-art methods.
Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research
Gruetzemacher, Ross, Paradice, David
AI is widely thought to be poised to transform business, yet current perceptions of the scope of this transformation may be myopic. Recent progress in natural language processing involving transformer language models (TLMs) offers a potential avenue for AI-driven business and societal transformation that is beyond the scope of what most currently foresee. We review this recent progress as well as recent literature utilizing text mining in top IS journals to develop an outline for how future IS research can benefit from these new techniques. Our review of existing IS literature reveals that suboptimal text mining techniques are prevalent and that the more advanced TLMs could be applied to enhance and increase IS research involving text data, and to enable new IS research topics, thus creating more value for the research community. This is possible because these techniques make it easier to develop very powerful custom systems and their performance is superior to existing methods for a wide range of tasks and applications. Further, multilingual language models make possible higher quality text analytics for research in multiple languages. We also identify new avenues for IS research, like language user interfaces, that may offer even greater potential for future IS research.
Real-Time Stock News Sentiment Analyzer
Investing in the Stock Market is a great way of tackling Inflation. Inflation refers to the rise in the prices of most goods and services of daily or common use, such as food, clothing, housing, recreation, transport, consumer staples, etc. Basically, with 100 rupees you won't be able to buy as many vada pavs (wadapavs) as you could last year. In the pandemic-struck financial year of 2020–2021 a whopping 142 lakh new investors have started trading in the stock market. One key skill required to make good investments in the stock market is being able to correctly analyze news related to the finance and the business sector. Which company is diversifying its sectors or which company is showing signs of heading towards bankruptcy?