Information Extraction

Inferring User Emotions in Texts Using SparkNLP


In this post, we demonstrate how to leverage SparkNLP and SparkML to quickly set up an experiment for testing initial discovery hypotheses about inferring emotions in short texts. We will show how to transform the input short texts, how to train a multi-class text classifier that gives reasonable results given the input data, how to evaluate and compare such text classification models, how to set up an experiment to generate multiple models, and finally how to evaluate the outcomes. SparkNLP (1) is provided by John Snow Labs (2) as a unified library of state-of-the-art NLP tools within the Spark environment that can be used in production. Even though the AI Research Labs at Holler Technologies develop their own proprietary AI solutions for production, we have found SparkNLP to be a valuable set of resources that helps the discovery process when evaluating hypotheses and ideas about text transformations and NLP. For any problem we want to solve, the discovery phase is important, as one wants to try as many possibilities as possible before deciding on any one solution to study further and implement for production. SparkNLP is one such tool for NLP tasks: it lets us transform texts and extract features ready for machine learning algorithms, using a selection of state-of-the-art technologies and pre-trained models, within the same Spark distributed environment and without having to integrate a range of technologies. This is important because the discovery phase should enable a data scientist to quickly evaluate what works and what does not for a particular problem and the available training data. Make sure the data is in the desired language for the study, English here. Our problem is to identify a given finite set of emotions in short English texts. One approach is to solve it as an NLP multi-class text classification task, in which the classes to infer are the emotions we want to identify in the text.
The input is a short English text, the output is exactly one of the 4 emotions.
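
To make the task framing concrete, here is a minimal sketch of multi-class text classification over four emotion labels. It is not the SparkNLP pipeline the post describes: it swaps in a tiny bag-of-words multinomial Naive Bayes classifier written in plain Python, so that it runs without a Spark installation. The four label names, the toy training texts, and the helper names (`tokenize`, `train`, `predict`, `accuracy`) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: the label names and texts are illustrative
# assumptions, not the data set used in the original post.
TRAIN = [
    ("i am so happy today", "joy"),
    ("what a wonderful happy surprise", "joy"),
    ("i feel sad and lonely", "sadness"),
    ("this is such a sad loss", "sadness"),
    ("i am furious and angry with you", "anger"),
    ("that makes me so angry", "anger"),
    ("i am scared of the dark", "fear"),
    ("the noise left me scared and afraid", "fear"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count documents per class and words per class for Naive Bayes."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return exactly one label: the class with the highest log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


model = train(TRAIN)
print(predict(model, "i am happy"))
print(accuracy(model, TRAIN))
```

In a real experiment the accuracy would be measured on held-out texts rather than on the training set, and several candidate models would be compared on the same split before choosing one to study further.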

Facebook will now let you export posts directly to Google Docs and WordPress


Ever wish you could easily export all your Facebook posts and notes onto a completely different platform? On Monday, Facebook announced a few new data portability options that allow you to seamlessly transition the content you've written on the social network onto platforms made for writing. Specifically, Facebook has built in an option to transfer your posts and notes into Google Docs as well as two popular blogging platforms, Blogger and WordPress. The company announced: "To give people more control and choice over their data, today we're announcing that Facebook posts and notes can be directly transferred to @GoogleDocs, @Blogger and @WordPress via our Transfer Your Information tool." Facebook already offers options to export your data to your local hard drive.

Analyzing COVID Medical Papers with Azure and Text Analytics for Health


The idea of applying NLP methods to scientific literature seems quite natural. First of all, scientific texts are already well structured: they contain things like keywords and abstracts, as well as well-defined terms. Thus, at the very beginning of the COVID pandemic, a research challenge was launched on Kaggle to analyze scientific papers on the subject. The dataset behind this competition is called CORD (publication), and it contains a constantly updated corpus of everything that is published on topics related to COVID. Currently, it contains more than 400,000 scientific papers, about half of them with full text.

Ireland's data privacy agency opens investigation into Facebook data leak


Ireland's Data Protection Commission (DPC) is investigating the recent leak of a Facebook user dataset that dates back to 2019. At the start of April, it came out that someone on a hacking forum had made the dataset public, exposing the personal information of about 533 million Facebook users in 106 countries. Depending on the account, there are details about phone numbers, birth dates, email addresses, locations and more. The source of the leak is an oversight Facebook fixed in August 2019. "The DPC, having considered the information provided by Facebook Ireland regarding this matter to date, is of the opinion that one or more provisions of the GDPR and/or the Data Protection Act 2018 may have been, and/or are being, infringed in relation to Facebook Users' personal data," the agency said in a statement spotted by TechCrunch.

Facebook will not notify the half a billion users caught up in its huge data leak, it says

The Independent - Tech

Facebook will not notify the more than half a billion people caught up in a huge leak of personal information, it has said. Over the weekend, it emerged that a vast trove of data on more than 530 million users – containing information including their phone numbers and dates of birth – was being made freely available online. Facebook said that the data was gathered before 2019. It later said that "malicious actors" had obtained the data prior to September 2019 by "scraping" profiles using a vulnerability in the platform's tool for synching contacts, and that the loophole that allowed them to do so had now been closed. But it said that it did not inform users when the leak happened, and does not have plans to do so now.

Facebook Data Breach: How To Check If You're Part Of The Leak, Preventive Measures To Take

International Business Times

Cybersecurity experts revealed a few days ago that the personal information of over half a billion Facebook users has been leaked. It's a gold mine of data, which includes users' full names, birthdays, locations and phone numbers. Although Facebook claims that the actual hack happened a couple of years ago, it won't hurt for users to make sure their account is not part of the breach and, if it is, to take a few preventive measures to ensure future incidents as messy as this one won't affect them. Australian security researcher and Have I Been Pwned founder Troy Hunt recently added the 533 million phone numbers exposed in the Facebook data leak to his website. Those worried that their mobile numbers were part of the leak can visit the site and check if their number is there.

What you need to know about the Facebook data leak

MIT Technology Review

The news: The personal data of 533 million Facebook users in more than 106 countries was found to be freely available online last weekend. The data trove, uncovered by security researcher Alon Gal, includes phone numbers, email addresses, home towns, full names, and birth dates. Initially, Facebook claimed that the data leak was previously reported on in 2019 and that it had patched the vulnerability that caused it that August. But in fact, it appears that Facebook did not properly disclose the breach at the time. It only finally acknowledged it on Tuesday April 6 in a blog post by product management director Mike Clark.

Table Detection, Information Extraction and Structuring using Deep Learning


The amount of data being collected is increasing drastically day by day, with lots of applications, tools, and online platforms booming in the present technological era. To handle and access this humongous amount of data productively, it's necessary to develop valuable information extraction tools. One of the sub-areas demanding attention in the Information Extraction field is fetching and accessing data from tabular forms. To explain this simply, imagine you have lots of paperwork and documents containing tables, and you would like to manipulate the data in them. Conventionally, you can copy them manually (onto paper) or load them into Excel sheets. However, with table extraction, no sooner have you sent tables as pictures to the computer than it extracts all the information and stacks it into a neat document. This saves ample time and is less error-prone. As discussed in the previous section, tables are used frequently to represent data in a clean format. We see them often across several areas, from organizing our work by structuring data across tables to storing huge assets of companies.

Huge Facebook leak that contains information about 500 million people came from abuse of contacts tool, company says

The Independent - Tech

Facebook says that a vast trove of personal information, uploaded freely to the internet, was harvested as part of a feature gone wrong. The data was not stolen in a hack but instead through malicious users of its "contact importer", it said. Though that feature was intended to allow people to upload their contacts from their phone to Facebook, and find people they might know, malicious actors were able to use it to scrape the personal information of people who were already on the platform. That happened before September 2019, Facebook said in a blog post, and the bug that made it possible has now been fixed. But over the weekend it became clear that the data had become publicly available online, vastly increasing the risk that anyone involved in it might face. That includes 535 million accounts, which belong to people including chief executive Mark Zuckerberg.

How to check if your Facebook data is being shared by hackers online


At this point, there's a good chance your Facebook data has been hacked, sold, leaked, or generally misused by third parties. Now, at least in the case of the latest troubling Facebook-related incident which made the news over the weekend, there's a way to know for sure. On Tuesday, Have I Been Pwned?, a "free resource for anyone to quickly assess if they may have been put at risk due to an online account of theirs having been compromised," announced it had added to its searchable database the 533 million Facebook users' phone numbers that are being swapped around by hackers. The site, run by data breach expert Troy Hunt, lets people input their phone number to check if they're included in the scraped Facebook data set (which includes more than just phone numbers). If so, the site tells victims what was likely exposed, and what steps they can take to protect themselves.