Text Processing


Applied Text Analysis with Python

#artificialintelligence

From news and speeches to informal chatter on social media, natural language is one of the richest and most underutilized sources of data. Not only does it come in a constant stream, always changing and adapting in context; it also contains information that is not conveyed by traditional data sources. The key to unlocking natural language is the creative application of text analytics. This practical book presents a data scientist's approach to building language-aware products with applied machine learning. You'll learn robust, repeatable, and scalable techniques for text analysis with Python, including contextual and linguistic feature engineering, vectorization, classification, topic modeling, entity resolution, graph analysis, and visual steering.


Make Wordclouds in R and RapidMiner

#artificialintelligence

Learn how to use the wordcloud package in R with RapidMiner to generate a cool wordcloud. Do all your text processing in RapidMiner too!


Efficient Graph-based Word Sense Induction

#artificialintelligence

The paper was first presented at TextGraphs-2018, a workshop series at the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), on June 6, 2018 in New Orleans. This new approach to word sense induction comes from the work of the Lexalytics Magic Machines AI Labs, launched in 2017 in partnership with the University of Massachusetts Amherst's Center for Data Science and Northwestern University's Medill School of Journalism, Media and Integrated Marketing Communications to drive innovation in AI. Word sense induction (WSI) is a challenging natural language processing task whose goal is to identify and categorize the multiple senses of polysemous words from raw text, without the help of a predefined sense inventory such as WordNet (Miller, 1995). The problem is sometimes also called unsupervised word sense disambiguation (Agirre et al., 2006; Pelevina et al., 2016). An effective WSI method has wide applications.
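To make the task concrete, here is a minimal sketch of one generic graph-based way to induce senses. It is not the method from the paper. It assumes each context is already tokenized, links words that co-occur in a context with the target word, and treats each connected component of that graph as one induced sense. The function name `induce_senses` and the toy contexts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def induce_senses(target, contexts):
    """Group the context words of `target` into senses.

    Builds a co-occurrence graph over the words that appear alongside
    `target` (the target itself is left out) and returns its connected
    components as a list of sets, one set per induced sense.
    """
    graph = defaultdict(set)
    for context in contexts:
        words = sorted({w for w in context if w != target})
        for w in words:
            graph[w]  # make sure isolated words still become nodes
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)

    seen = set()
    senses = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search to collect one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        senses.append(component)
    return senses


# Toy contexts for the polysemous word "bank" (illustrative data only).
contexts = [
    ["bank", "river", "water"],
    ["bank", "water", "shore"],
    ["bank", "money", "loan"],
    ["bank", "loan", "interest"],
]
senses = induce_senses("bank", contexts)
for sense in senses:
    print(sorted(sense))
```

On real text a single shared word would merge two senses, so practical systems typically weight edges and use a proper graph clustering algorithm instead of plain connected components.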


Applying Azure Text Analysis

#artificialintelligence

I am new to Azure Text Analysis and am trying to write my first application. I'm trying to get a clear understanding of how this API works but am confused by the responses returned by the demo. For example, I initially typed in the words "I want an appointment", which returned the word "appointment" as the sole key phrase. I then typed in "I want an appointment for next Thursday". I expected the words "appointment" and "Thursday" to be returned as key phrases, but received only "appointment", no different from the first example.


Text Analysis in Power BI with Cognitive services with Leila Etaati

#artificialintelligence

Abstract: The data we collect is not always numeric or structured. In any organization, there is a need to analyze text data: for example, to analyze customer comments, extract the primary purpose of a call from its scripts, or detect the language of customer feedback and translate it. To address this need, Microsoft Cognitive Services provides a set of APIs, SDKs, and services that let developers do text analysis without writing R or Python code. In this session, I will explain what text analysis is, covering sentiment analysis, key phrase extraction, language detection, and so forth. Next, I will demonstrate the process of text analysis in Power BI using Cognitive Services.


Is it ok to get negative Cosine Similarity using LSA?

#artificialintelligence

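Cosine similarities can certainly be negative. LSA represents documents by their coordinates on SVD components, and those coordinates can take either sign, unlike raw term counts. If you're trying to interpret a negative value (rather than worrying that it's a problem), then I think it means the two documents are talking about opposite things.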
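A minimal sketch of how a negative value can arise, assuming LSA is implemented as a truncated SVD of a term-document matrix. The toy matrix, the helper names `cosine` and `lsa_doc_vectors`, and the choice of k = 2 are illustrative assumptions, not taken from the original question.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def lsa_doc_vectors(term_doc, k):
    """Return one k-dimensional LSA vector per document (as rows).

    `term_doc` has terms as rows and documents as columns. The document
    coordinates are the columns of diag(s_k) @ Vt_k from the truncated SVD.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k, :]).T


# Toy term-document matrix (illustrative data only).
# Document 0 uses only term 0, document 1 uses only term 1,
# and document 2 uses all three terms.
X = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])

raw_cos = cosine(X[:, 0], X[:, 1])      # raw count vectors
docs_k2 = lsa_doc_vectors(X, k=2)       # keep the top 2 SVD components
lsa_cos = cosine(docs_k2[0], docs_k2[1])

print("raw cosine:", raw_cos)
print("LSA cosine (k=2):", lsa_cos)
```

Documents 0 and 1 share no terms, so their raw cosine is zero, and raw count vectors can never give a value below zero. After truncation to two components the same pair gets a slightly negative cosine, which is a property of the reduced space and not a sign of a bug.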


Semantic Indexing: Google's Big Data Trick For Multilingual Search Results

#artificialintelligence

Google has perfected its ability to deliver web search results to users all over the world. In the early days of the Internet, the search engine was primarily suited to displaying search results for English-speaking users. Non-English-speaking users have complained that search results are often displayed in the wrong language entirely. However, Google is becoming more proficient at providing search results in other languages as well. Many factors play a role, but one of the biggest is its use of deep learning to understand semantic references: enter semantic indexing.


Semantic Technologies Are Steering Cognitive Applications

#artificialintelligence

Cognitive applications are being put to a wide variety of uses across various industries. Based on statistical and rule-based methods, they excel at processing large volumes of information. But many companies are struggling with the imprecise results this technology delivers. The complex algorithms needed to simulate how the human brain works have become a bottleneck for data scientists trying to take cognitive computing to the next level.


Multimodal Grounding for Language Processing

arXiv.org Artificial Intelligence

This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs, which play a crucial role in the compositional power of language.


Measuring Semantic Coherence of a Conversation

arXiv.org Artificial Intelligence

Conversational systems have become increasingly popular as a way for humans to interact with computers. To be able to provide intelligent responses, conversational systems must correctly model the structure and semantics of a conversation. We introduce the task of measuring semantic (in)coherence in a conversation with respect to background knowledge, which relies on the identification of semantic relations between concepts introduced during a conversation. We propose and evaluate graph-based and machine learning-based approaches for measuring semantic coherence using knowledge graphs, their vector space embeddings and word embedding models, as sources of background knowledge. We demonstrate how these approaches are able to uncover different coherence patterns in conversations on the Ubuntu Dialogue Corpus.