 Information Extraction


NSA Spy Buildings, Facebook Data, and More Security News This Week

WIRED

It has been, to be quite honest, a fairly bad week, as far as weeks go. But despite the sustained downbeat news, a few good things managed to happen as well. California has passed the strongest digital privacy law in the United States, for starters, which as of 2020 will give customers the right to know what data companies use, and to disallow those companies from selling it. It's just the latest in a string of uncommonly good bits of privacy news, which included last week's landmark Supreme Court decision in Carpenter v. US. That ruling will require law enforcement to get a warrant before accessing cell tower location data.


r/MachineLearning - [D] Don't common sentiment analysis strategies seem unsatisfying?

#artificialintelligence

There are lots of great projects on Reddit in sentiment analysis, but almost all of the work I've seen focuses on individual posts, as if tweets or Reddit comments were simply a list of thumbs up and thumbs down about issues. Context, for example, doesn't seem to get much discussion. One very basic example where this is important: a Reddit comment that is itself booing a negative comment is considered negative. Of course, the nested "negative" comment should actually be counted in favor of the original topic. The relevant fields in NLP would be coreference, and possibly other subfields involving semantics.
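The nested-comment point can be made concrete with a toy sketch. It assumes, purely for illustration, that each comment in a thread has already been labeled +1 or -1 relative to its parent (the top-level comment relative to the topic); the function name `stance_toward_topic` is hypothetical, and real threads would need far more than sign flipping.

```python
def stance_toward_topic(polarities):
    """Combine per-comment polarities along one reply chain.

    `polarities` lists +1/-1 labels from the top-level comment down to the
    reply of interest; each label is the comment's attitude toward its parent.
    Multiplying the signs gives the reply's implied stance toward the topic.
    """
    stance = 1
    for polarity in polarities:
        stance *= polarity
    return stance


# A negative reply to a negative comment ends up supporting the topic.
print(stance_toward_topic([-1, -1]))
```

Scoring each comment in isolation would count that reply as one more thumbs down, which is exactly the failure described above.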


A Quiz App Exposed 120 Million People's Facebook Data--and Cambridge Analytica Had Nothing to Do With It

Slate

Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society. The latest chapter in Facebook's data woes involves a quiz app that, until as recently as June, exposed the information of 120 million people who just wanted to know whether they were Cinderella or Elsa. According to De Ceukelaire, beginning as early as the end of 2016, NameTests collected Facebook users' data when they opted to take a quiz, such as "Which Disney Princess Are You?" The app then displayed that data--including names, birthdays, photos, and friends lists--in JavaScript files easily accessible by third-party websites. De Ceukelaire writes, "Depending on what quizzes you took, the javascript could leak your Facebook ID, first name, last name, language, gender, date of birth, profile picture, cover photo, currency, devices you use, when your information was lasted updated, your posts and status, your photos and your friends." De Ceukelaire says he "would be surprised if nobody else found this earlier," since the flaw was "really easy to spot," but NameTests said it found no evidence of abuse.


Box expands Skills beta with more AI and machine learning technologies - SiliconANGLE

#artificialintelligence

Box Inc. today expanded a beta test program for its Skills software framework that uses machine learning to make video, audio, image and other files more useful on its content management service. Introduced last year, Box Skills opened up the ability to perform tasks on content such as computer vision for image analysis, video indexing and sentiment analysis from audio using machine learning technologies from IBM Corp.'s Watson, Microsoft Corp.'s Azure cloud and Google Cloud. Now, Box is expanding beyond the approximately 100 customers in the beta program, adding several new customers a week starting in July on top of the likes of Virgin Trains and Ancestry.com. "As we see more and more content put into Box, there's this opportunity with AI and machine learning to have more intelligence put in around the content," Box Chief Product Officer Jeetu Patel said in an interview. For instance, among the approximately 600 use cases is a large insurance company building a custom skill to label household objects in images and videos automatically in the homeowner insurance policy process.


Wavethrough Vulnerability In Microsoft Edge Could Allow Data Scraping

#artificialintelligence

We all know Microsoft has recently launched a massive 'bug fix bundle' where it released patches for around 50 vulnerabilities including the patch for Cortana's Lock Screen Bypass Vulnerability. However, not many know about 'all' of these vulnerabilities for which Microsoft released fixes. It was also strange that it released patches together for 50 different bugs. Seems like the team has been silently working out how to solve various issues reported to them over the past months. Now, an independent security researcher has unveiled one such issue.


The word is out. SAS leads in AI.

#artificialintelligence

SAS Visual Text Analytics uses intelligent algorithms and natural language processing (NLP) techniques to automatically extract relationships and patterns within unstructured data, therefore eliminating the need for manual analysis. The NLP tools help users in sentiment analysis, speech to text, natural language understanding and natural language generation. The Forrester report states: "SAS's brand speaks for itself as a leader in advanced analytics; as a result, SAS Visual Text Analytics comes with a number of machine learning models. Users can also leverage other capabilities of the platform, such as forecasting and optimization, to deliver predictive, prescriptive, and actionable analytics."


Harvester of Facebook Data Wants Tighter Controls Over Privacy

WSJ.com: WSJD - Technology

Sen. Jerry Moran (R., Kan.), chairman of the Senate's consumer protection subcommittee, said he was considering joining in an effort by Sen. Richard Blumenthal (D., Conn.) to pass a privacy bill of rights in Congress. His comments showed that the risks for big internet companies haven't dissipated since Facebook's scandal involving Cambridge Analytica, a political data consultancy that worked with President Donald Trump's 2016 campaign and obtained data of millions of Facebook users from an app developer, Aleksandr Kogan. Sen. John Thune (R., S.D.), the chairman of the powerful Commerce Committee, added that Facebook "remains under the microscope" and said lawmakers continue to examine potential measures to protect user privacy. But key lawmakers appeared to be far from a consensus on how to proceed. At Tuesday's hearing, Mr. Kogan, a social psychologist and University of Cambridge lecturer, in prepared testimony, called for strengthening the system of obtaining users' consent for subsequent use of their information.


Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

AAAI Conferences

We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as ["Q", said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm's output for computational social science by analyzing the sentiment expressed in our extracted quotations.
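The bootstrapping loop described in this abstract can be illustrated with a heavily simplified sketch. It is not the authors' implementation: the `$Q`/`$S` template syntax, the speaker regex, the use of whole sentences as patterns, and the toy corpus are all assumptions made for the example.

```python
import re

# Assumed shapes: a quotation is text in straight double quotes, and a
# speaker is two or more capitalized words.
QUOTE = r'"(?P<q>[^"]+)"'
SPEAKER = r'(?P<s>[A-Z][a-z]+(?: [A-Z][a-z]+)+)'


def compile_pattern(template):
    """Turn a template such as '$Q, said $S.' into a compiled regex."""
    regex = re.escape(template)
    regex = regex.replace(re.escape("$Q"), QUOTE).replace(re.escape("$S"), SPEAKER)
    return re.compile(regex)


def extract_pairs(patterns, corpus):
    """Apply every pattern to every sentence and collect (quotation, speaker) pairs."""
    pairs = set()
    for template in patterns:
        rx = compile_pattern(template)
        for sentence in corpus:
            for m in rx.finditer(sentence):
                pairs.add((m.group("q"), m.group("s")))
    return pairs


def induce_patterns(pairs, corpus):
    """Find sentences containing a known pair and generalize them into templates."""
    patterns = set()
    for sentence in corpus:
        for q, s in pairs:
            quoted = f'"{q}"'
            if quoted in sentence and s in sentence:
                patterns.add(sentence.replace(quoted, "$Q", 1).replace(s, "$S", 1))
    return patterns


def bootstrap(seed_patterns, corpus, rounds=2):
    """Alternate between extracting pairs and inducing new patterns."""
    patterns, pairs = set(seed_patterns), set()
    for _ in range(rounds):
        pairs |= extract_pairs(patterns, corpus)
        patterns |= induce_patterns(pairs, corpus)
    return patterns, pairs


CORPUS = [
    '"We will win", said Jane Doe.',
    'Jane Doe told reporters: "We will win"',
    'John Roe told reporters: "Taxes are too high"',
]

patterns, pairs = bootstrap(["$Q, said $S."], CORPUS)
print(sorted(patterns))
print(sorted(pairs))
```

In this toy run, the seed pattern finds the Jane Doe quotation in the first sentence, the second sentence repeats that quotation in a different context and yields a new pattern, and that pattern in turn picks up the John Roe quotation. A realistic system would also need to trim pattern context and filter unreliable patterns, which this sketch omits.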


Opinion Context Extraction for Aspect Sentiment Analysis

AAAI Conferences

Sentiment analysis is the computational study of opinionated text and is becoming increasingly important to online commercial applications. However, the majority of current approaches determine sentiment by attempting to detect the overall polarity of a sentence, paragraph, or text window, but without any knowledge about the entities mentioned (e.g. restaurant) and their aspects (e.g. price). Aspect-level sentiment analysis of customer feedback data, when done accurately, can be leveraged to understand strong and weak performance points of businesses and services, and can also support the formulation of critical action steps to improve performance. In this paper we focus on aspect-level sentiment classification, studying the role of opinion context extraction for a given aspect and the extent to which traditional and neural sentiment classifiers benefit when trained using the opinion context text. We propose four methods for aspect context extraction using lexical, syntactic and sentiment co-occurrence knowledge. Further, we evaluate the usefulness of the opinion contexts for aspect-sentiment analysis. Our experiments on benchmark data sets from SemEval and a real-world dataset from the insurance domain suggest that extracting the right opinion context is effective in improving classification performance. Specifically, combining syntactical features with sentiment co-occurrence knowledge leads to the best aspect-sentiment classification performance.
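To give a feel for what an opinion context is, here is a minimal lexical sketch. It is not one of the paper's four methods: it simply takes a token window around the aspect term and clips it at clause boundaries, and the `BOUNDARIES` set and the function name `opinion_context` are assumptions made for the example.

```python
# Hypothetical clause delimiters; a real system would rely on a parser.
BOUNDARIES = {"but", "and", ",", ";", "."}


def opinion_context(tokens, aspect, size=3):
    """Return up to `size` tokens on each side of the first occurrence of
    `aspect`, without crossing a clause boundary token."""
    i = tokens.index(aspect)
    lo = i
    while lo > 0 and i - lo < size and tokens[lo - 1] not in BOUNDARIES:
        lo -= 1
    hi = i
    while hi < len(tokens) - 1 and hi - i < size and tokens[hi + 1] not in BOUNDARIES:
        hi += 1
    return tokens[lo:hi + 1]


tokens = "the food was great but the price was too high".split()
print(opinion_context(tokens, "food"))
print(opinion_context(tokens, "price"))
```

A sentence-level classifier would see both "great" and "too high" at once; restricting each aspect to its own context keeps the opinion about the food separate from the opinion about the price.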


Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models

arXiv.org Machine Learning

K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data and so our performance estimates are in fact stochastic, with variability that can be substantial for natural language processing tasks. We demonstrate that these unstable estimates cannot be relied upon for effective parameter tuning. The resulting tuned parameters are highly sensitive to how our data is partitioned, meaning that we often select sub-optimal parameter choices and have serious reproducibility issues. Instead, we propose to use the less variable J-K-fold CV, in which J independent K-fold cross validations are used to assess performance. Our main contributions are extending J-K-fold CV from performance estimation to parameter tuning and investigating how to choose J and K. We argue that variability is more important than bias for effective tuning and so advocate lower choices of K than are typically seen in the NLP literature, instead using the saved computation to increase J. To demonstrate the generality of our recommendations we investigate a wide range of case-studies: sentiment classification (both general and target-specific), part-of-speech tagging and document classification.
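The idea of averaging J independent K-fold runs can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' code: the function names, the `fit`/`score` callback interface, and the toy majority-class model are assumptions.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Randomly partition range(n) into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def jk_fold_cv(data, labels, fit, score, j=5, k=2, seed=0):
    """Average the held-out score over J independent K-fold partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(j):  # J independent random partitions
        for fold in k_fold_indices(len(data), k, rng):  # K folds each
            held_out = set(fold)
            train = [i for i in range(len(data)) if i not in held_out]
            model = fit([data[i] for i in train], [labels[i] for i in train])
            scores.append(
                score(model, [data[i] for i in fold], [labels[i] for i in fold])
            )
    return mean(scores)


# Toy model: always predict the most frequent training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)


def accuracy(model, xs, ys):
    return sum(y == model for y in ys) / len(ys)


data = list(range(20))
labels = [1] * 14 + [0] * 6
print(jk_fold_cv(data, labels, fit_majority, accuracy, j=5, k=2))
```

For parameter tuning, one would call `jk_fold_cv` once per candidate setting and keep the best average; lowering `k` makes each run cheaper, and the abstract's recommendation is to spend that saving on a larger `j`.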