Text Processing


Microsoft Word Spell Check

#artificialintelligence

Microsoft Word's spell check has come a long way from when a googly-eyed paper clip unfailingly assumed you were writing a letter. Now it accurately spots grammatical mistakes, autocorrects misspelled words, and alerts the user to improper spacing. The latest version even offers to catch a social error: sexism. If you scroll through the grammar settings of Microsoft Word 2016, you'll find an option titled "gender-specific language," which promises to convert the basic spell check from a simple copy editor to a fully woke word processor. Toggle it on and be amazed at the number of words that fail its exacting gender-neutral test.


Text Analysis in Excel: Real-World Use Cases

@machinelearnbot

Last month, we launched an Excel add-in, a solution for using ParallelDots NLP APIs to do text analysis on unstructured data without writing a single line of code. The Excel add-in is easy to use and provides a convenient yet effective solution for your text analysis needs. In an earlier post, we provided detailed information on how the Excel add-in works. In this post, we will discuss some real-world use cases where you can use the Excel add-in to raise your analytics game without spending a fortune on building a data science team. For example, you can analyze a corpus of customer reviews to understand the general impression of your product.
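The review-analysis use case comes down to scoring each review and then aggregating the scores. The sketch below is not the ParallelDots API. It swaps in a tiny hand-written sentiment lexicon to show the shape of the computation, assuming one review per spreadsheet row. The word lists `POSITIVE` and `NEGATIVE` and the functions `review_score` and `overall_impression` are hypothetical.

```python
# Hypothetical toy lexicon; a real service would use a trained model instead.
POSITIVE = {"great", "love", "excellent", "easy", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate", "poor"}


def review_score(text: str) -> int:
    """Return (#positive words - #negative words) for one review."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def overall_impression(reviews: list[str]) -> dict[str, int]:
    """Count reviews by the sign of their score: the 'general impression'."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for review in reviews:
        score = review_score(review)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


if __name__ == "__main__":
    reviews = [
        "Great product, I love it!",
        "Shipping was slow and the box arrived broken.",
        "It works.",
    ]
    print(overall_impression(reviews))
```

A lexicon this small misses negation ("not great") and sarcasm, which is exactly why hosted NLP models are attractive for this task. The aggregation step stays the same whichever scorer is plugged in.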


DeepTriage

#artificialintelligence

For a given software bug report, identifying an appropriate developer who could potentially fix the bug is the primary task of a bug triaging process. Most bug tracking systems record a bug title (summary) and a detailed description. An automatic bug triaging algorithm can be formulated as a classification problem that takes the bug title and description as input and maps them to one of the available developers (class labels). The major challenge is that the bug description usually contains a combination of free unstructured text, code snippets, and stack traces, making the input data highly noisy. In the past decade, there has been a considerable amount of research on representing a bug report using a tf-idf-based bag-of-words (BOW) feature model.
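To make the classification framing concrete, here is a minimal sketch of the tf-idf bag-of-words baseline the paragraph mentions. It is not DeepTriage's own model. It represents each past report (title plus description) as a tf-idf vector and assigns a new report to the developer of its most similar past report by cosine similarity (a 1-nearest-neighbour classifier, chosen here for brevity). The functions `tokenize`, `tfidf_vectors`, `cosine`, and `triage`, and the sample history, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; identifiers like NullPointerException stay whole."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf: +1 keeps terms that occur in every document from vanishing.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def triage(history, new_report):
    """history: list of (title + description, developer) pairs.

    Returns the developer whose past report is most similar to new_report.
    """
    vecs, idf = tfidf_vectors([tokenize(text) for text, _ in history])
    q_tf = Counter(tokenize(new_report))
    query = {t: c * idf[t] for t, c in q_tf.items() if t in idf}  # unseen terms dropped
    best = max(range(len(history)), key=lambda i: cosine(query, vecs[i]))
    return history[best][1]


if __name__ == "__main__":
    history = [
        ("NullPointerException in login form when password empty", "alice"),
        ("Rendering glitch in chart legend colors", "bob"),
        ("Login page crashes with NullPointerException", "alice"),
    ]
    print(triage(history, "App crashes on login with NullPointerException"))
```

Stack traces and code snippets inflate the vocabulary with rare tokens, which is one concrete way the noise described above hurts a BOW representation.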


Natural Language Processing (Coursera)

@machinelearnbot

About this course: This course covers a wide range of tasks in Natural Language Processing, from basic to advanced: sentiment analysis, summarization, and dialogue state tracking, to name a few. Upon completion, you will be able to recognize NLP tasks in your day-to-day work, propose approaches, and judge which techniques are likely to work well. The final project is devoted to one of the hottest topics in today's NLP: you will build your own conversational chatbot that will assist with search on the StackOverflow website. The project will be based on the practical assignments of the course, which will give you hands-on experience with tasks such as text classification, named entity recognition, and duplicate detection.


Text Mining Customer Insights from Super Bowl 50 (RapidMiner)

#artificialintelligence

At least 80% of enterprise data is unstructured, contained in the myriad text-based social conversations that happen every day. Unlocking the hidden value of text through predictive analytics is essential for understanding customers' opinions and needs and for making better, more informed business decisions. Yet a whopping 90% of this data remains completely underutilized by data strategies and analytics techniques. Humans consume and make sense of unstructured data easily, but machines do not. Unlike other data sources, it does not sit in a table or a database, and it is not easily referenceable.


Named Entity Recognition: Milestone Models, Papers and Technologies

@machinelearnbot

Named Entity Recognition (NER), or entity extraction, is an NLP technique that locates and classifies the named entities present in text. NER classifies named entities into predefined categories such as names of persons, organizations, locations, quantities, monetary values, specialized terms, product terminology, and time expressions. NER is part of a broader field called Information Extraction. According to Wikipedia, Information Extraction is the task of automatically extracting structured information from any kind of text, structured and/or unstructured. Natural Language Processing has seen a step change in accuracy over the past few years.
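To show what "locate and classify" means in terms of input and output, here is a minimal gazetteer (dictionary-lookup) tagger. It is a deliberately simple stand-in, not one of the milestone models the article surveys, which learn to tag unseen names from context. The `GAZETTEER` entries and the `extract_entities` function are hypothetical; the tagger scans left to right and prefers the longest matching name at each position.

```python
# Hypothetical gazetteer: lowercase token tuples mapped to entity categories.
GAZETTEER = {
    ("new", "york"): "LOCATION",
    ("microsoft",): "ORGANIZATION",
    ("ada", "lovelace"): "PERSON",
}
MAX_LEN = max(len(key) for key in GAZETTEER)
PUNCT = ".,;:!?"


def extract_entities(text):
    """Return (surface string, category) pairs, longest match first."""
    tokens = text.split()
    norm = [t.strip(PUNCT).lower() for t in tokens]
    entities = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(norm[i:i + n]))
            if label:
                surface = " ".join(t.strip(PUNCT) for t in tokens[i:i + n])
                entities.append((surface, label))
                i += n
                break
        else:  # no entity starts at token i
            i += 1
    return entities


if __name__ == "__main__":
    print(extract_entities("Ada Lovelace never visited Microsoft in New York."))
```

A lookup table cannot recognize names it has never seen or resolve ambiguity ("Washington" as person or place), which is the gap that statistical and neural NER models close.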


Question Answering from Frequently Asked Question Files

AI Magazine

For the most part, those who build these information oases have been happy to make their work freely available to the general public.

Question: Is downshifting a good way to slow down my car? They tell me I should downshift when braking to slow my car down. Is this really a good idea?
It used to be a very good idea, back in the days of medi...

How often should I replace my brake fluid?
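The core retrieval step in FAQ-based question answering is matching a user's question to the closest stored question. The sketch below uses plain word-overlap (Jaccard) similarity over content words, which is an assumption for illustration and not the article's method. The two FAQ entries come from the excerpt above; the `STOPWORDS` set and the functions `content_words`, `jaccard`, and `best_faq` are hypothetical.

```python
import re

# Stored FAQ questions, taken from the excerpt above.
FAQ = [
    "Is downshifting a good way to slow down my car?",
    "How often should I replace my brake fluid?",
]
# Hypothetical stopword list; a real system would use a larger one.
STOPWORDS = {"a", "an", "the", "is", "i", "my", "to", "do",
             "how", "should", "good", "way", "down"}


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_faq(question):
    """Return the stored FAQ question with the highest word overlap."""
    q = content_words(question)
    return max(FAQ, key=lambda faq: jaccard(q, content_words(faq)))


if __name__ == "__main__":
    print(best_faq("When do I need to change brake fluid?"))
```

Exact word overlap fails on paraphrases ("change" vs. "replace", "downshift" vs. "downshifting"), so practical systems add stemming and lexical-semantic resources on top of this kind of matching.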


Sweetening WORDNET with DOLCE

AI Magazine

Figure: Example from the LOOM WORDNET knowledge base.

At the beginning, we assumed that the hyponymy relation could simply be mapped onto the subsumption relation and that the synset notion could be mapped into the notion of concept. Both subsumption and concept have the usual description logic semantics (Woods and Schmolze 1992). ... LOOM WORDNET knowledge base are reported in table 1.

Figure: WORDNET's noun top ...

Under Territorial_Dominion, we find Macao and Palestine together with Trust_Territory. The Trust_Territory synset, defined as "a dependent country, administered by a country under the supervision of United Nations," denotes a general kind of country rather than a specific country such as Macao or Palestine.
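The Territorial_Dominion example shows why mapping every hyponymy link onto subsumption goes wrong: a naive mapping treats Macao and Palestine as subclasses of a concept, just like Trust_Territory, although only Trust_Territory is a kind of country. The sketch below contrasts that naive reading with one that separates subclasses from instances. The `INDIVIDUALS` set and both functions are hypothetical simplifications, not the LOOM or DOLCE encoding.

```python
# Hyponymy links under Territorial_Dominion, simplified from the text.
HYPONYMS = {
    "Territorial_Dominion": ["Macao", "Palestine", "Trust_Territory"],
}
# Hypothetical refinement: hyponyms that denote individuals, not concepts.
INDIVIDUALS = {"Macao", "Palestine"}


def naive_subclasses(concept):
    """Initial assumption: every hyponym is a concept subsumed by `concept`."""
    return HYPONYMS.get(concept, [])


def refined(concept):
    """Split hyponyms into (subclasses, instances)."""
    children = HYPONYMS.get(concept, [])
    subclasses = [c for c in children if c not in INDIVIDUALS]
    instances = [c for c in children if c in INDIVIDUALS]
    return subclasses, instances


if __name__ == "__main__":
    print(naive_subclasses("Territorial_Dominion"))  # mixes kinds and individuals
    print(refined("Territorial_Dominion"))
```

Under the naive reading, a query for "kinds of territorial dominion" returns Macao alongside Trust_Territory; the refined reading returns only Trust_Territory and lists Macao and Palestine as instances.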


Networks and Natural Language Processing

AI Magazine

Over the last few years, a number of areas of natural language processing have begun applying graph-based techniques. These include, among others, text summarization, syntactic parsing, word-sense disambiguation, ontology construction, sentiment and subjectivity analysis, and text clustering. In this paper, we present some of the most successful graph-based representations and algorithms used in language processing and try to explain how and why they work. Since the early days of artificial intelligence, associative or semantic networks have been proposed as representations that enable the storage of language units and the relations that interconnect them, and that allow for a variety of inference and reasoning processes, simulating some of the functionalities of the human mind. The symbolic structures that emerge from these representations correspond naturally to graphs, where text constituents are represented as vertices and their interconnecting relations form the edges.
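As one concrete instance of "constituents as vertices, relations as edges," here is a minimal TextRank-style keyword ranker. Words are vertices, co-occurrence within a small window forms undirected edges, and a PageRank power iteration scores each vertex. This is a generic sketch, not code from the paper; the `keyword_rank` function and its `window`, `damping`, and `iters` defaults are assumptions.

```python
import re
from collections import defaultdict


def keyword_rank(text, window=2, damping=0.85, iters=50):
    """Score words by PageRank over an undirected co-occurrence graph."""
    words = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(set)
    # Link each word to the words that follow it within `window` tokens.
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    nodes = list(graph)
    score = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Each neighbour u passes on its score split evenly across its edges.
        score = {
            v: (1 - damping) + damping * sum(score[u] / len(graph[u]) for u in graph[v])
            for v in nodes
        }
    return score


if __name__ == "__main__":
    scores = keyword_rank("nlp graph nlp parsing nlp summarization")
    print(max(scores, key=scores.get))
```

Because every vertex in the graph has at least one edge and the damping factor is below 1, the iteration converges; the word connected to the most others ("nlp" in the example) accumulates the highest score.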


DiversiNews: Surfacing Diversity in Online News

AI Magazine

If we want to understand an event in depth, from multiple perspectives, we need to aggregate multiple sources and understand the relations between them. However, current news aggregators do not offer this kind of functionality. As a step toward a solution, we propose DiversiNews, a real-time news aggregation and exploration platform whose main feature is a novel set of controls that allow users to contrast reports of a selected event based on topical emphases, sentiment differences, and/or publisher geolocation. News events are presented in the form of a ranked list of articles pertaining to the event and an automatically generated summary. Both the ranking and the summary are interactive and respond in real time to the user's changes to the controls.
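To illustrate how user controls can drive a ranking, here is a hypothetical re-ranking function. It is not DiversiNews's algorithm, which the excerpt does not describe. It models only two of the controls: a target sentiment and a preferred publisher country. The `Article` fields, the `rerank` function, and its weights are assumptions; each control change simply triggers a re-sort, which is cheap enough to run interactively.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    sentiment: float  # hypothetical precomputed score in [-1, 1]
    country: str      # publisher geolocation


def rerank(articles, target_sentiment=0.0, preferred_country=None,
           sentiment_weight=1.0, geo_weight=0.5):
    """Order articles by closeness to the chosen sentiment, plus a geo bonus."""
    def score(a):
        s = -sentiment_weight * abs(a.sentiment - target_sentiment)
        if preferred_country is not None and a.country == preferred_country:
            s += geo_weight
        return s
    return sorted(articles, key=score, reverse=True)


if __name__ == "__main__":
    arts = [Article("A", 0.8, "US"), Article("B", -0.6, "DE"), Article("C", 0.1, "DE")]
    # Slide the sentiment control to "most negative":
    print([a.title for a in rerank(arts, target_sentiment=-1.0)])
```

Moving the sentiment control from -1.0 to 0.0 and selecting "US" as the preferred country reorders the same three articles from B, C, A to C, A, B without refetching anything.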