Information Extraction

How AI and ML Applications Will Benefit from Vector Processing


As expected, artificial intelligence (AI) and machine learning (ML) applications are already having an impact on society. Many industries that we tap into daily--such as banking, financial services and insurance (BFSI), and digitized health care--can benefit from AI and ML applications to help them optimize mission-critical operations and execute functions in real time. The BFSI sector is an early adopter of AI and ML capabilities. Natural language processing (NLP) is being implemented for personally identifiable information (PII) privacy compliance, chatbots and sentiment analysis; for example, social media data is mined for underwriting and credit scoring, as well as for investment research. Predictive analytics assess which assets will yield the highest returns.

Python Libraries for Natural Language Processing


Natural Language Processing is considered one of the critical aspects of building intelligent systems. By training your solution with data gathered from the real world, you can make it faster and more relevant to users, generating crucial insight about your customer base. In this article, we will look at some of the most useful and powerful Python libraries for bringing Natural Language Processing into your project, and at where exactly each of them fits in. Often recognized as a professional-grade Python library for advanced Natural Language Processing, spaCy excels at large-scale information extraction tasks. Built using Python and Cython, spaCy combines the convenience of Python with the speed of Cython to deliver a best-in-class NLP experience. Stanford CoreNLP is a suite of tools for adding Natural Language Processing to your project.

The (Un)ethical Story of GPT-3: OpenAI's Million Dollar Model


Back on October 12, 2019, the world witnessed a previously unimaginable accomplishment: the first sub-two-hour marathon, run in an incredible 1:59:40 by Kenyan native Eliud Kipchoge. He would later say of the achievement that he "expected more people all over the world to run under 2 hours after today" [1]. While Kipchoge set new records in long distance running, across the world a team of natural language processing (NLP) experts at OpenAI, the Elon Musk-backed AI firm, published a new transformer-based language model with 1.5 billion parameters that achieved previously unthinkable performance in nearly every language task it faced [2]. The main takeaway from the paper for many experts was that bigger is better: the intelligence of transformer models can increase dramatically with the number of parameters. In May of 2020, this theory gained support with OpenAI's release of version three of the model, GPT-3, which contains a staggering 175 billion parameters and achieved even more remarkable performance than version 2, despite sharing, quite literally, the same architecture [3].

Basic Sentiment Analysis with TensorFlow


Welcome to this project-based course on Basic Sentiment Analysis with TensorFlow. In this project, you will learn the basics of using Keras with TensorFlow as its backend and use it to solve a basic sentiment analysis problem. By the end of this 2-hour project, you will have created, trained, and evaluated a neural network model that can classify movie reviews as either positive or negative based on the sentiment of the review text.

Chat analysis on WhatsApp: Part 2 -- Sentiment analysis and Data visualization with R


Having understood the context and starting point, we will now go a little further into the interaction between our two individuals and their open relationship (still maintaining their anonymity, of course, as "Él" (He) and "Ella" (She)), analyzing the diversity of their vocabulary and performing sentiment analysis based on the emojis they use. Let's continue with the same libraries, the same defined variables, and the same txt file as before. You will remember that in the first part we used the stopwords() function to filter out words that carry little or no meaning. Building on this, and looking for words that are used by only one of the two users, we can measure the diversity of vocabulary. As a result we obtain the following plot, where we can see that She is the one with the greater lexical diversity.
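The vocabulary-diversity idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the original article works in R, whereas this sketch uses Python, and the stop-word list, the sample messages, and the helper names `tokenize` and `exclusive_vocabulary` are all hypothetical, not taken from the article.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (the article relies on
# R's stopwords() function instead).
STOP_WORDS = {"the", "a", "and", "i", "you", "to", "is", "it", "so"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def exclusive_vocabulary(messages):
    """Map each author to the set of words that only that author used."""
    vocab = defaultdict(set)
    for author, text in messages:
        vocab[author].update(tokenize(text))

    exclusive = {}
    for author, words in vocab.items():
        # Union of every other author's vocabulary (empty if there is none).
        others = set().union(*(w for a, w in vocab.items() if a != author))
        exclusive[author] = words - others
    return exclusive


# Made-up sample chat, not data from the article.
messages = [
    ("Él", "I love the beach and the sun"),
    ("Ella", "The beach is lovely, so is the mountain breeze"),
    ("Él", "You love the beach"),
]

# Diversity score: how many words each author uses that the other never does.
diversity = {a: len(w) for a, w in exclusive_vocabulary(messages).items()}
print(diversity)
```

Counting exclusive words is one possible reading of "words that are repeated only by the same user"; other measures, such as the ratio of distinct words to total words, would be equally reasonable.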

Data-Powered Opinion Mining Is The Next Big Thing For Customer Satisfaction


Arvind Gopalakrishnan is a part of the AIM Writers Programme.… Data mining is spreading rapidly through the industry, but have you ever heard of Opinion Mining? To a layman, leveraging customer opinion as quantifiable data may sound like a concept of the future, but with Natural Language Processing the world can finally process and fully absorb customer feedback. Data is often associated with quantity-based statistics, with numbers and metrics floating around; however, with natural language processing (NLP), qualitative factors like customer feedback can be processed and used as quantifiable data. For example, if a specific mobile phone model sees a higher number of sales in a given year, manufacturers tend to incorporate that phone's features into other models to increase their sales, yet they often fail to make upgrades that properly reflect customer feedback.

Here's How I Predicted Apple's Stock Price Using Natural Language Processing


Stock market prediction refers to the act of attempting to determine the future value of a company's stock (or other financial instruments) that is traded on an exchange. Accurately predicting the stock market is like being able to see into the future. Anyone who could do this would undoubtedly act in ways that substantially benefit themselves. Imagine knowing that Apple's stock will increase from $300 per share by 80% tomorrow, and currently having the ability to buy 10 shares. That would guarantee a return of $2,400 in one day with possibly minimal effort.
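The arithmetic behind the $2,400 figure can be checked with a short calculation. The function name `one_day_gain` is a hypothetical helper introduced here for illustration; the sketch ignores fees and taxes.

```python
def one_day_gain(price, pct_increase, shares):
    """Profit in dollars if `shares` bought at `price` rise by `pct_increase` percent."""
    return price * pct_increase / 100 * shares


# Figures from the text: $300 per share, an 80% rise, 10 shares.
gain = one_day_gain(price=300, pct_increase=80, shares=10)
print(f"${gain:,.0f}")  # $2,400
```

Each share gains $240 (80% of $300), so 10 shares gain $2,400 on an outlay of $3,000.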

Trade groups offering $100,000 reward after noose found at Facebook data center

USATODAY - Tech Top Stories

The FBI and Justice Department are assisting the Altoona Police Department's investigation after a noose was found last month at a work site on the Facebook Data Center property in Altoona, Iowa. Altoona police officials say they contacted the FBI on June 19, the day the noose was found. The date coincided with Juneteenth, the annual holiday celebrating the end of slavery. Interviews are still being conducted in the investigation, according to Altoona Police Department Public Information Officer Alyssa Wilson. While federal investigators were already involved with the incident, as of Thursday, all information in the case will be filtered through the FBI's Omaha office.

Sentiment Analysis -- from Scratch to Production (Web API)


It is often said that data scientists spend almost 70% of their time on data cleaning. It is one of the most tedious tasks, yet a model's performance depends heavily on how clean its data is. Here, cleaning includes removing duplicate data, stripping unnecessary elements, and handling missing data. We will apply a couple of standard cleaning techniques before we preprocess the text.
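The three cleaning steps named above (duplicates, unnecessary elements, missing data) might look roughly like the following. This is a minimal sketch using only the Python standard library, not the article's own code: it assumes the raw records are plain strings, treats HTML tags and URLs as the "unnecessary elements", and the function name `clean_reviews` and the sample data are hypothetical.

```python
import re


def clean_reviews(rows):
    """Apply three basic cleaning steps to a list of raw text records."""
    cleaned, seen = [], set()
    for text in rows:
        # Handle missing data: skip None and blank entries.
        if text is None or not text.strip():
            continue

        # Remove unnecessary elements: HTML tags and URLs (an assumption
        # about what counts as noise), then collapse runs of whitespace.
        text = re.sub(r"<[^>]+>", " ", text)
        text = re.sub(r"https?://\S+", " ", text)
        text = re.sub(r"\s+", " ", text).strip()

        # Remove duplicates, comparing case-insensitively.
        key = text.lower()
        if key and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


# Made-up sample records for illustration only.
raw = [
    "Great movie!<br />Loved it.",
    None,
    "great movie! loved it.",
    "   ",
    "Terrible plot, see https://example.com/review for details",
]
print(clean_reviews(raw))
```

Cleaning before deduplicating is a deliberate choice here: two records that differ only in markup or letter case are treated as the same review.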

Roadmap to Natural Language Processing (NLP)


Natural Language Processing (NLP) is the area of research in Artificial Intelligence focused on processing and using text and speech data to create smart machines and generate insights. One of today's most interesting NLP applications is building machines that can debate complex topics with humans. IBM Project Debater represents so far one of the most successful approaches in this area. All of these preprocessing techniques can be easily applied to different types of texts using standard Python NLP libraries such as NLTK and spaCy. Additionally, in order to extract the syntax and structure of our text, we can make use of techniques such as Parts of Speech (POS) Tagging and Shallow Parsing (Figure 1).