text processing


Parsa Mirhaji Montefiore Health System - PMWC Precision Medicine World Conference

#artificialintelligence

Dr. Mirhaji was the former director of the Center for Biosecurity and Public Health Informatics Research at the University of Texas at Houston, where he developed clinical text understanding, semantic information integration, and EMR interoperability solutions for public health and disaster preparedness. He is an inventor with several patents covering information integration, biomedical vocabularies and taxonomy services, clinical text understanding and natural language processing, electronic data capture, and knowledge-based information retrieval. Dr. Mirhaji and his fellow researchers were awarded "The Best Practice in Public Health. He is a member of W3C working groups on the application of Semantic Technologies in Healthcare and Life Sciences, and an organizer and committee member for several national and international conferences on Bio-Ontologies and Semantic Technologies.


r/MachineLearning - [Discussion] What is the state-of-the-art for entity extraction and relation extraction?

#artificialintelligence

I am looking for the state-of-the-art entity extraction/relation extraction algorithms that are practical to implement and use for commercial information extraction. Mr. Wilken works at Foobar, Inc (transitively he is then the CEO of Foobar, Inc.). In my experience, I've used a CRF with hand-crafted features for entity tagging, followed by a classifier, also built on hand-crafted features, to determine relations between entities. This is a pretty old-school approach and does not leverage any of the advances in word embeddings (GloVe, BERT, etc.). I know there are also methods for doing joint entity-relation extraction.
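To make the "hand-crafted features" part concrete, here is a minimal sketch of the per-token feature dictionaries that a CRF tagger is typically fed. The feature names, the `token_features` and `sentence_features` helpers, and the tokenization are all assumptions for illustration, not the poster's actual setup, and no CRF library is called here.

```python
def token_features(tokens, i):
    """Build illustrative hand-crafted features for the token at index i."""
    word = tokens[i]
    features = {
        "lower": word.lower(),                        # normalized surface form
        "is_title": word.istitle(),                   # e.g. "Wilken"
        "is_upper": word.isupper(),                   # e.g. "CEO"
        "has_digit": any(ch.isdigit() for ch in word),
        "suffix3": word[-3:].lower(),                 # crude morphology cue
    }
    # Context features: neighbouring words, with sentinels at the boundaries.
    features["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    features["next_lower"] = (
        tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    )
    return features


def sentence_features(tokens):
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical pre-tokenized sentence based on the example in the post.
tokens = ["Mr.", "Wilken", "works", "at", "Foobar", ",", "Inc"]
feats = sentence_features(tokens)
```

A list of dictionaries like `feats`, paired with one label per token, is the usual input shape for feature-based sequence taggers; the relation classifier would need its own features over entity pairs.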


Compare documents similarity using Python

#artificialintelligence

In this post we are going to build a web application that compares the similarity between two documents. We will learn the very basics of natural language processing (NLP), a branch of artificial intelligence that deals with the interaction between computers and humans using natural language. Let's start with the base structure of the program, and then we will add a graphical interface to make the program much easier to use. Feel free to contribute to this project on my GitHub. The Natural Language Toolkit (NLTK) is the most popular library for natural language processing (NLP); it is written in Python and has a big community behind it.
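As a rough idea of what comparing two documents can look like, here is a minimal sketch using cosine similarity over word counts. It uses only the standard library rather than NLTK, and the regex tokenizer and the function names `tokenize` and `cosine_similarity` are assumptions for illustration, not the approach the post itself builds.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between the word-count vectors of two documents."""
    counts_a = Counter(tokenize(doc_a))
    counts_b = Counter(tokenize(doc_b))
    # Only words present in both documents contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


score = cosine_similarity("The cat sat on the mat.", "The dog sat on the log.")
```

The score falls between 0 (no shared words) and 1 (identical word counts). Raw counts overweight common words, so a TF-IDF weighting is a common next step.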


MonkeyLearn - Text Analysis

#artificialintelligence

"MonkeyLearn is one of the most innovative and compelling platforms I've used. I've also loved working with MonkeyLearn's team - their willingness to help me build great products to help our community have put them among my favorite new companies."


Automated text analysis: The next frontier of marketing innovation

#artificialintelligence

Researchers from University of Pennsylvania, Northwestern University, University of Maryland, Columbia University, and Emory University published a new article in the Journal of Marketing that provides an overview of automated textual analysis and describes how it can be harnessed to generate marketing insights. The study, forthcoming in the January issue of the Journal of Marketing, is titled "Uniting the Tribes: Using Text for Marketing Insights" and authored by Jonah Berger, Ashlee Humphreys, Wendy Moe, Oded Netzer, and David Schweidel. Online reviews, customer service calls, press releases, news articles, marketing communications, and other interactions create a wealth of textual data companies can analyze to optimize services and develop new products. By some estimates, 80-95% of all business data is unstructured, with most of that being text. This text has the potential to provide critical insights about its producers, including individuals' identities, their relationships, their goals, and how they display key attitudes and behaviors.


What is Text Analytics? - Compare Reviews, Features, Pricing in 2019 - PAT RESEARCH: B2B Reviews, Buying Guides & Best Practices

#artificialintelligence

Text Analytics is the process of converting unstructured text data into meaningful data for analysis: to measure customer opinions, product reviews, and feedback, and to provide search, sentiment analysis, and entity modeling in support of fact-based decision making. Text analysis uses many linguistic, statistical, and machine learning techniques. Text Analytics involves retrieving information from unstructured data, structuring the input text to derive patterns and trends, and evaluating and interpreting the output data. It also involves lexical analysis, categorization, clustering, pattern recognition, tagging, annotation, information extraction, link and association analysis, visualization, and predictive analytics. Text Analytics determines keywords, topics, categories, semantics, and tags from the millions of text documents available in an organization in different files and formats.


Conditional Random Fields Explained

#artificialintelligence

Conditional Random Fields (CRFs) are a class of discriminative models best suited to prediction tasks where contextual information or the state of the neighbors affects the current prediction. CRFs find applications in named entity recognition, part-of-speech tagging, gene prediction, noise reduction, and object detection problems, to name a few. In this article, I will first introduce the basic math and jargon related to Markov Random Fields, the abstraction CRFs are built upon. I will then introduce and explain a simple Conditional Random Fields model in detail, which will show why they are well suited to sequential prediction problems. After that, I will go over the likelihood maximization problem and related derivations in the context of that CRF model.
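To illustrate how neighboring labels influence a prediction, here is a toy linear-chain CRF, a sketch that is not taken from the article. The labels, the two observation types, and all weights are hypothetical hand-set values rather than learned parameters, and the normalizer is computed by brute-force enumeration, which is only feasible for very short sequences.

```python
import itertools
import math

LABELS = ["O", "ENT"]

# Hypothetical emission weights: how well an observation fits a label.
EMISSION = {
    ("capitalized", "ENT"): 2.0, ("capitalized", "O"): 0.0,
    ("lowercase", "ENT"): 0.0, ("lowercase", "O"): 1.5,
}
# Hypothetical transition weights: how well adjacent labels fit together.
TRANSITION = {
    ("O", "O"): 0.5, ("O", "ENT"): 0.0,
    ("ENT", "O"): 0.0, ("ENT", "ENT"): 1.0,
}


def sequence_score(observations, labels):
    """Unnormalized log-score: emission terms plus neighbor transition terms."""
    score = sum(EMISSION[(obs, lab)] for obs, lab in zip(observations, labels))
    score += sum(TRANSITION[(prev, cur)] for prev, cur in zip(labels, labels[1:]))
    return score


def all_labelings(observations):
    """Every possible label sequence for the given observations."""
    return itertools.product(LABELS, repeat=len(observations))


def sequence_probability(observations, labels):
    """P(labels | observations) = exp(score) / sum of exp(score) over labelings."""
    z = sum(math.exp(sequence_score(observations, c))
            for c in all_labelings(observations))
    return math.exp(sequence_score(observations, labels)) / z


def best_labeling(observations):
    """Highest-scoring label sequence, found by exhaustive search."""
    return max(all_labelings(observations),
               key=lambda c: sequence_score(observations, c))


observations = ["lowercase", "capitalized", "capitalized", "lowercase"]
best = best_labeling(observations)
```

Practical implementations replace the enumeration with dynamic programming (the forward algorithm for the normalizer, Viterbi for decoding) and learn the weights by maximizing the likelihood.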


Getting Started with Text Preprocessing for Machine Learning & NLP

#artificialintelligence

Based on some recent conversations, I realized that text preprocessing is a severely overlooked topic. A few people I spoke to mentioned inconsistent results from their NLP applications, only to realize that they were not preprocessing their text or were using the wrong kind of text preprocessing for their project. With that in mind, I thought of shedding some light on what text preprocessing really is, the different techniques of text preprocessing, and a way to estimate how much preprocessing you may need. For those interested, I've also made some text preprocessing code snippets in Python for you to try. To preprocess your text simply means to bring your text into a form that is predictable and analyzable for your task. A task here is a combination of approach and domain.
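As a minimal sketch of what bringing text into a predictable form can mean, the function below lowercases, strips punctuation, and optionally removes stop words. The `preprocess` name, the tiny stop word list, and the choice of steps are assumptions for illustration and are not the snippets mentioned above; which steps are appropriate depends on the task.

```python
import re

# A tiny illustrative stop word list; real projects would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in"}


def preprocess(text, remove_stop_words=True):
    """Lowercase, drop punctuation, split on whitespace, optionally filter."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


tokens = preprocess("The QUICK brown fox, in 2019!")
```

Note that the regex also drops non-ASCII letters, so this exact pattern would only suit plain English text.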


Teach Einstein Bots to Pay Attention with Named Entity Recognition

#artificialintelligence

One of the most impactful ways that bots can improve conversations is to pick up on the important details you've mentioned and reference them later without asking you to repeat things. Imagine if you called up an airline and said that you want to book a flight to Hawaii. If the airline employee were to reply with "Happy to book you a flight, where would you like to go?" you'd begin to question whether they were paying attention. Few bots have been set up to do this well and as a consequence run the risk of delivering a slow, rigid, repetitive experience. Automated natural language systems are notoriously bad at handling these important details and fail to deliver a natural and brief conversation without redundant messages.


Review: BioBERT paper

#artificialintelligence

The objective of this article is to understand the application of the BERT pre-trained model to the biomedical field and then to figure out which parameters can help it adapt to other business verticals. I assume you have prior knowledge of BERT. If this is the first time you are hearing this word, I would suggest reading an excellent blog on this topic to develop the intuition. Reading the original BERT paper would also help you get a deeper understanding. The BioBERT paper is from researchers at Korea University and the Clova AI research group, based in Korea. Its major contribution is a pre-trained biomedical language representation model for various biomedical text mining tasks.