 Information Extraction


The Best 25 Datasets for Natural Language Processing Gengo AI

#artificialintelligence

Where's the best place to look for free online datasets for NLP? We combed the web to create the ultimate cheat sheet, broken down into datasets for text, audio speech, and sentiment analysis. Sentiment140: a popular dataset of 1.6 million tweets with emoticons pre-removed. Twitter US Airline Sentiment: Twitter data on US airlines from February 2015, classified as positive, negative, and neutral tweets. Yelp Reviews: an open dataset released by Yelp that contains more than 5 million reviews.


Facebook hits back after Germany restricts its data collection

Daily Mail - Science & tech

German anti-trust authorities have ruled against Facebook over its methods of combining user data from different sources, including WhatsApp and Instagram. The Federal Cartel Office, or Bundeskartellamt, said the firm was exploiting its position as a dominant social media company in violation of European regulations. Facebook, however, says this is not the case. In a statement issued after the ruling, Facebook said its data use is in compliance with GDPR and is intended to 'protect people's safety and security.'


Germany to restrict Facebook's data gathering activities

Al Jazeera

Germany has ordered Facebook to curb its data collection practices in the country, after a ruling that the world's largest social media network abused its market dominance to gather information about users without their knowledge or consent. "In future, Facebook will no longer be allowed to force its users to agree to the practically unrestricted collection and assigning of non-Facebook data to their Facebook accounts," Federal Cartel Office chief Andreas Mundt said in the landmark order on Thursday. Facebook said it would appeal the ruling, the culmination of a three-year probe, saying the antitrust watchdog underestimated the competition it faced and undermined Europe-wide privacy rules that took effect last year. The findings follow fierce global scrutiny of Facebook over a series of privacy lapses, including the leak of data on tens of millions of users, as well as the extensive use of targeted ads by foreign powers seeking to influence elections in the United States. These have gone down badly with Germans, reflecting broader concerns over personal surveillance that date back to Germany's history of Nazi and Communist rule in the 20th century.


Germany lets users decide if Facebook can merge their WhatsApp and Instagram data

Engadget

Germany is known for its no-nonsense approach to digital data gathering -- back in 2016 it took a hard stance against Facebook's use of WhatsApp data, and more recently announced its plans to investigate the Google data exposure. Now, its anti-trust watchdog Bundeskartellamt has ordered a crackdown on Facebook's data combination practices in a landmark ruling that could have wide-ranging repercussions for the social network. As it currently stands, Facebook users are only able to use the platform under the condition that Facebook can also collect their user data outside of the network. This data comes from the company's own services, such as WhatsApp and Instagram, and from third-party websites with embedded Facebook like and share buttons -- and even on pages where there's no obvious sign the company is present. The Bundeskartellamt claims that this adds up to an abuse of market dominance, and in its ruling stipulates that Facebook may continue to collect this data only if users give their voluntary consent.


Facebook banned from mixing up WhatsApp and Instagram data by Germany

The Independent - Tech

Facebook can't mix up WhatsApp and Instagram data with its own unless people consent to it, Germany has said. It has also been told to stop taking data from third-party apps and combining that with its own. Facebook collects information from across the internet – including third-party sources as well as its own other apps – and attaches it to users' accounts, in an attempt to build a more accurate picture about them and sell ads. That practice has come under intense criticism, from privacy campaigners and users who argue that it amounts to Facebook following them around the internet without their consent or even knowledge. The company has abused its market dominance to combine user data from a range of different sources, according to the German cartel office, the Bundeskartellamt.


Facebook's Data Gathering Hit by German Anti-Trust Clampdown

U.S. News

As part of complying with the GDPR, Facebook said it had rebuilt the information it provides people about their privacy and the controls they have over their information, and improved the privacy 'choices' that they are offered. It would also soon launch a 'clear history' feature.



German Minister Welcomes Cartel Office's Clampdown on Facebook Data Collection

U.S. News

BERLIN (Reuters) - German Justice Minister Katarina Barley on Thursday welcomed a crackdown by Germany's antitrust watchdog on Facebook's data collection practices, saying the company was collecting data far beyond its platform.


Multi-task Learning for Target-dependent Sentiment Classification

arXiv.org Machine Learning

Detecting and aggregating sentiments toward people, organizations, and events expressed in unstructured social media have become critical text mining operations. Early systems detected sentiments over whole passages, whereas more recently, target-specific sentiments have been of greater interest. In this paper, we present MTTDSC, a multi-task target-dependent sentiment classification system that is informed by feature representation learnt for the related auxiliary task of passage-level sentiment classification. The auxiliary task uses a gated recurrent unit (GRU) and pools GRU states, followed by an auxiliary fully-connected layer that outputs passage-level predictions. In the main task, these GRUs contribute auxiliary per-token representations over and above word embeddings. The main task has its own, separate GRUs. The auxiliary and main GRUs send their states to a different fully connected layer, trained for the main task. Extensive experiments using two auxiliary datasets and three benchmark datasets (of which one is new, introduced by us) for the main task demonstrate that MTTDSC outperforms state-of-the-art baselines. Using word-level sensitivity analysis, we present anecdotal evidence that prior systems can make incorrect target-specific predictions because they miss sentiments expressed by words independent of target.
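
The data flow described in the abstract above can be sketched roughly as follows. This is a minimal, untrained illustration in plain Python, not the authors' implementation: the class names (GRU, MultiTaskSketch), the dimensions, the random weights, the use of mean pooling for the auxiliary task, and the choice to read both GRUs' states at the target token's position are all assumptions made here, since the abstract does not specify them.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]


def mean_pool(states):
    """Average the per-token states (assumed pooling choice)."""
    n = len(states)
    return [sum(col) / n for col in zip(*states)]


class GRU:
    """Bias-free GRU unrolled over a token sequence."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.Wz = rand_matrix(hid_dim, in_dim + hid_dim, rng)
        self.Wr = rand_matrix(hid_dim, in_dim + hid_dim, rng)
        self.Wh = rand_matrix(hid_dim, in_dim + hid_dim, rng)

    def run(self, xs):
        h = [0.0] * self.hid_dim
        states = []
        for x in xs:
            z = [sigmoid(v) for v in matvec(self.Wz, x + h)]   # update gate
            r = [sigmoid(v) for v in matvec(self.Wr, x + h)]   # reset gate
            rh = [ri * hi for ri, hi in zip(r, h)]
            cand = [math.tanh(v) for v in matvec(self.Wh, x + rh)]
            h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
            states.append(h)
        return states


class MultiTaskSketch:
    """Hypothetical two-headed model: passage-level (auxiliary) and target-level (main)."""

    def __init__(self, emb_dim=4, hid_dim=3, n_classes=3, seed=0):
        rng = random.Random(seed)
        self.aux_gru = GRU(emb_dim, hid_dim, rng)    # shared with the main task
        self.main_gru = GRU(emb_dim, hid_dim, rng)   # main task's own GRU
        self.aux_fc = rand_matrix(n_classes, hid_dim, rng)
        self.main_fc = rand_matrix(n_classes, 2 * hid_dim, rng)

    def forward(self, embeddings, target_index):
        aux_states = self.aux_gru.run(embeddings)
        main_states = self.main_gru.run(embeddings)
        # Auxiliary head: pooled auxiliary GRU states -> passage-level prediction.
        passage_probs = softmax(matvec(self.aux_fc, mean_pool(aux_states)))
        # Main head: main and auxiliary states at the target token (assumption)
        # are concatenated and sent to a separate fully connected layer.
        features = main_states[target_index] + aux_states[target_index]
        target_probs = softmax(matvec(self.main_fc, features))
        return passage_probs, target_probs


# Usage: five random 4-dimensional "word embeddings", target at position 2.
rng = random.Random(1)
sentence = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
passage_probs, target_probs = MultiTaskSketch().forward(sentence, target_index=2)
```

Each head returns a probability distribution over three assumed sentiment classes. In a trained system the auxiliary head would be fit on passage-level labels and the main head on target-level labels, with the auxiliary GRU's states feeding both.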


Reading between the lines

#artificialintelligence

Are the trains running on time today or might the bus be quicker? Is that new restaurant near work any good for lunch? Is the latest blockbuster movie worth the price of a ticket? Sarcasm is very hard for computers to detect as sarcastic comments use many of the same words and language structures as positive comments. Above is a map that the Crystalace team have produced.