Information Extraction
5-Minute Guide to Text Analytics
Did you know that text analytics is used for everything from enriching customer insights to identifying fraud to determining sentiment about products and services? The much-valued "360-degree" view of the business can't possibly exist without unstructured information. Instead, you're more likely to have a 180-degree blind spot. By automatically identifying key concepts, extracting entities, and analyzing sentiment – with multi-language support – text analytics adds structure to the unstructured so it can be added to a knowledge graph, along with the structured data. Download the 5-Minute Guide to Text Analytics to learn how cognitive solutions surface the untapped business value typically hidden in unstructured content.
Natural Language Processing: State of The Art, Current Trends and Challenges
Khurana, Diksha, Koli, Aditya, Khatter, Kiran, Singh, Sukhdev
Natural language processing (NLP) has recently gained much attention for representing and analysing human language computationally. Its applications have spread to fields such as machine translation, email spam detection, information extraction, summarization, medicine, and question answering. The paper proceeds in four phases: it discusses the different levels of NLP and the components of Natural Language Generation (NLG), presents the history and evolution of NLP, surveys the state of the art across the various applications of NLP, and closes with current trends and challenges.
Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings
There exist two main approaches to automatically extract affective orientation: lexicon-based and corpus-based. In this work, we argue that these two methods are compatible and show that combining them can improve the accuracy of emotion classifiers. In particular, we introduce a novel variant of the Label Propagation algorithm that is tailored to distributed word representations, we apply batch gradient descent to accelerate the optimization of label propagation and to make the optimization feasible for large graphs, and we propose a reproducible method for emotion lexicon expansion. We conclude that label propagation can expand an emotion lexicon in a meaningful way and that the expanded emotion lexicon can be leveraged to improve the accuracy of an emotion classifier.
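To make the idea of expanding a lexicon over word embeddings concrete, here is a minimal sketch of plain iterative label propagation on a cosine-similarity graph. It is not the variant proposed in the paper (which is tailored to distributed representations and optimized with batch gradient descent); the toy two-dimensional vectors, the seed lexicon, and the function names `cosine` and `propagate_labels` are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_labels(embeddings, seeds, labels, iterations=50):
    """Spread seed labels to unlabeled words over a similarity graph."""
    words = list(embeddings)
    # Edge weights: cosine similarity clipped at zero, no self-loops.
    weights = {
        w: {o: max(0.0, cosine(embeddings[w], embeddings[o]))
            for o in words if o != w}
        for w in words
    }
    # Seed words start one-hot; all other words start uniform.
    scores = {}
    for w in words:
        if w in seeds:
            scores[w] = [1.0 if seeds[w] == lab else 0.0 for lab in labels]
        else:
            scores[w] = [1.0 / len(labels)] * len(labels)
    for _ in range(iterations):
        updated = {}
        for w in words:
            total = sum(weights[w].values())
            if w in seeds or total == 0.0:
                updated[w] = scores[w]  # clamp seeds; skip isolated words
                continue
            # New score: similarity-weighted average of neighbour scores.
            updated[w] = [
                sum(weights[w][o] * scores[o][k] for o in weights[w]) / total
                for k in range(len(labels))
            ]
        scores = updated
    # Assign each word the label with the highest propagated score.
    return {
        w: labels[max(range(len(labels)), key=lambda k: scores[w][k])]
        for w in words
    }


# Hypothetical toy embeddings and seed lexicon, for illustration only.
embeddings = {
    "happy": [1.0, 0.1],
    "joyful": [0.9, 0.2],
    "sad": [0.1, 1.0],
    "gloomy": [0.2, 0.9],
}
seeds = {"happy": "joy", "sad": "sadness"}
expanded = propagate_labels(embeddings, seeds, ["joy", "sadness"])
print(expanded)
```

In this sketch the seed words keep their labels throughout, and each unlabeled word ends up with the label of the seeds it sits closest to in the embedding space. A real lexicon would need a sparse graph (for example, k nearest neighbours) to stay tractable.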
Sentiment Analysis by Joint Learning of Word Embeddings and Classifier
Sarma, Prathusha Kameswara, Sethares, Bill
Word embeddings are representations of individual words of a text document in a vector space and they are often useful for performing natural language processing tasks. Current state of the art algorithms for learning word embeddings learn vector representations from large corpora of text documents in an unsupervised fashion. This paper introduces SWESA (Supervised Word Embeddings for Sentiment Analysis), an algorithm for sentiment analysis via word embeddings. SWESA leverages document label information to learn vector representations of words from a modest corpus of text documents by solving an optimization problem that minimizes a cost function with respect to both word embeddings as well as classification accuracy. Analysis reveals that SWESA provides an efficient way of estimating the dimension of the word embeddings that are to be learned. Experiments on several real world data sets show that SWESA has superior performance when compared to previously suggested approaches to word embeddings and sentiment analysis tasks.
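The general idea of learning word vectors and a classifier against a single cost can be sketched as follows. This is not the SWESA algorithm itself, whose cost function and constraints are not given in the abstract; it is a generic illustration that assumes documents are represented as the average of their word vectors and classified with logistic regression, with both sets of parameters updated by gradient descent. The toy corpus, the hyperparameters, and the function names are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def doc_vector(doc, emb, dim):
    """Represent a document as the average of its word vectors."""
    return [sum(emb[w][k] for w in doc) / len(doc) for k in range(dim)]


def mean_loss(docs, labels, emb, theta, dim):
    """Mean logistic loss over the corpus."""
    total = 0.0
    for doc, y in zip(docs, labels):
        x = doc_vector(doc, emb, dim)
        p = sigmoid(sum(t * v for t, v in zip(theta, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(docs)


def train(docs, labels, dim=2, lr=0.5, epochs=300, seed=0):
    """Jointly fit word embeddings and classifier weights."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    emb = {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}
    theta = [0.0] * dim
    n = len(docs)
    for _ in range(epochs):
        grad_theta = [0.0] * dim
        grad_emb = {w: [0.0] * dim for w in vocab}
        for doc, y in zip(docs, labels):
            x = doc_vector(doc, emb, dim)
            err = sigmoid(sum(t * v for t, v in zip(theta, x))) - y
            for k in range(dim):
                # Gradient with respect to the classifier weights.
                grad_theta[k] += err * x[k] / n
                # Gradient with respect to each word vector in the document.
                for w in doc:
                    grad_emb[w][k] += err * theta[k] / (len(doc) * n)
        # Update both parameter sets from the same cost.
        theta = [t - lr * g for t, g in zip(theta, grad_theta)]
        for w in vocab:
            emb[w] = [e - lr * g for e, g in zip(emb[w], grad_emb[w])]
    return emb, theta


# Hypothetical four-document corpus: 1 = positive, 0 = negative.
docs = [["great", "phone"], ["awful", "phone"],
        ["great", "movie"], ["awful", "movie"]]
labels = [1, 0, 1, 0]
emb, theta = train(docs, labels)
predictions = [
    int(sigmoid(sum(t * v for t, v in zip(theta, doc_vector(d, emb, 2)))) > 0.5)
    for d in docs
]
print(predictions)
```

Because the labels drive the embedding updates, words that separate the classes ("great", "awful") are pushed apart along the classifier direction, while words shared by both classes receive gradients that largely cancel.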
Sentiment Analysis: Overview, Applications and Benefits
Mining such data to determine how people feel about your product, brand, or service, is called Sentiment Analysis. When applied to social media channels, it can be used to identify spikes in sentiment, thereby allowing you to identify potential product advocates or social media influencers. Companies such as Microsoft, IBM and smaller emerging companies offer REST APIs that integrate easily with your existing software applications. For example, using the following publicly available Sentiment Analysis REST API from a small start-up called Social Opinion, we pass in the text, "this phone is awesome", to the following URL: In the response, we can see the text has been identified as expressing positive emotion, with a 64% probability of that being true.
Bringing AI to BI – Text Analytics in Azure Machine Learning
The core of the Bing News template starts with an Azure Logic App, which polls for news articles from the Bing News API on a preset schedule (every 5 minutes) for a list of user-specified topics. As the data makes its way through the Logic App, the actual news article text is retrieved and sent through a series of Azure Functions for basic data transformation. Next, the Microsoft Text Analytics Cognitive Service is used for key phrase and sentiment extraction over the text body. These text enrichments could alternatively be performed in the Azure ML portion of the pipeline using the "Extract Key Phrases from Text" module. At this point, the data, along with some basic enrichments, is stored in an Azure SQL database.
You could finally control your Facebook data if UK law is passed
Britons might soon be able to request that their embarrassing social media posts be taken down and records of their existence wiped, according to new proposals outlined today. The new bill will transfer the European Union's General Data Protection Regulation into UK law, as well as making a few additions and amendments. It's currently possible to delete any of your own posts manually, but that doesn't necessarily remove the information from social media companies' databases. According to Facebook's terms and conditions, "some things can only be deleted when you permanently delete your account." While not all requests for deletion will be granted – companies can decline on the grounds of freedom of expression, and when the information is of scientific or historical importance – those involving information posted by or collected from children will nearly always be honoured.
Doing text analytics for Digital Humanities and Social Sciences with CLARIN (LDK tutorial), Galway 2017
Text is a basic material, a primary data layer, in many areas of humanities and social sciences. If we want to move forward with the agenda that the fields of digital humanities and computational social sciences are projecting, it is vital to bring together the technical areas that deal with automated text processing, and scholars in the humanities and social sciences. Much progress has been made in the last two decades in text analytics, a field that draws on recent advances in computational linguistics, information retrieval and machine learning. By now we know what to expect from basic tools, such as named entity recognition. To foster new areas of research, it is necessary to not only understand what is out there in terms of proven technologies and infrastructures such as CLARIN, but also how the developers of text analytics can work with researchers in the humanities and social sciences to understand the challenges in each other's field better.
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities
One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, make strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address these limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities (such as user interactions, community dynamics, and textual content) to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution, with user-interpretable explanations. To this end, we devise new models based on Conditional Random Fields for different settings, such as incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in health forums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture these dynamics, we propose generative models based on Hidden Markov Models, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.
Sentiment Analysis: Overview, Applications and Benefits
When experimenting with machine learning and big data, you may identify data sets containing streams of text, such as customer reviews or social media posts in which customers (or potential customers) are talking about a product, brand or service that you offer. Mining such data to determine how people feel about your product, brand, or service is called Sentiment Analysis. People have always been interested in what others think. Since the inception of the internet, increasing numbers of people have been using websites and services to express their opinions. With social media channels such as Facebook, LinkedIn, and Twitter, it is becoming feasible to automatically gauge public opinion on a given topic, news story, product, or brand.