Information Extraction
Brooklyn Nine-Nine Meets Data Science
This job [Data Scientist] is eating me alive. I spent all these years trying to be the good guy, the man in the white hat. I'm not becoming like them… I am them -- Jake Peralta, Pilot I recently binge-watched a show on Netflix called Brooklyn Nine-Nine and I really enjoyed it. As I eagerly await the release of the next season, I thought it'd be fun to perform exploratory data analysis and sentiment analysis on the pilot episode. I found the script online and extracted the text into CSV file format.
Simultaneous Identification of Tweet Purpose and Position
Iyer, Rahul Radhakrishnan, Pei, Yulong, Sycara, Katia
Tweet classification has attracted considerable attention recently. Most of the existing work on tweet classification focuses on topic classification, which classifies tweets into several predefined categories, and sentiment classification, which classifies tweets into positive, negative and neutral. Since tweets are different from conventional text in that they generally are of limited length and contain informal, irregular or new words, so it is difficult to determine user intention to publish a tweet and user attitude towards certain topic. In this paper, we aim to simultaneously classify tweet purpose, i.e., the intention for user to publish a tweet, and position, i.e., supporting, opposing or being neutral to a given topic. By transforming this problem to a multi-label classification problem, a multi-label classification method with post-processing is proposed. Experiments on real-world data sets demonstrate the effectiveness of this method and the results outperform the individual classification methods.
More than 267 millions of Facebook user phone numbers exposed online
Security expert Bob Diachenko, along with Comparitech, has discovered more than 267 million Facebook user IDs, phone numbers and names in an unsecured database. The huge trove of data is likely the result of an illegal scraping operation or Facebook API abuse by a group of hackers in Vietnam. The exposed data could be used by threat actors to conduct large-scale SMS spam and phishing campaigns. "A database containing more than 267 million Facebook user IDs, phone numbers, and names was left exposed on the web for anyone to access without a password or any other authentication." "Comparitech partnered with security researcher Bob Diachenko to uncover the Elasticsearch cluster.
GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception
Bostan, Laura, Kim, Evgeny, Klinger, Roman
Most research on emotion analysis from text focuses on the task of emotion classification or emotion intensity regression. Fewer works address emotions as structured phenomena, which can be explained by the lack of relevant datasets and methods. We fill this gap by releasing a dataset of 5000 English news headlines annotated via crowdsourcing with their dominant emotions, emotion experiencers and textual cues, emotion causes and targets, as well as the reader's perception and emotion of the headline. We propose a multiphase annotation procedure which leads to high quality annotations on such a task via crowdsourcing. Finally, we develop a baseline for the task of automatic prediction of structures and discuss results. The corpus we release enables further research on emotion classification, emotion intensity prediction, emotion cause detection, and supports further qualitative studies.
A Heterogeneous Graphical Model to Understand User-Level Sentiments in Social Media
Iyer, Rahul Radhakrishnan, Chen, Jing, Sun, Haonan, Xu, Keyang
Social Media has seen a tremendous growth in the last decade and is continuing to grow at a rapid pace. With such adoption, it is increasingly becoming a rich source of data for opinion mining and sentiment analysis. The detection and analysis of sentiment in social media is thus a valuable topic and attracts a lot of research efforts. Most of the earlier efforts focus on supervised learning approaches to solve this problem, which require expensive human annotations and therefore limits their practical use. In our work, we propose a semi-supervised approach to predict user-level sentiments for specific topics. We define and utilize a heterogeneous graph built from the social networks of the users with the knowledge that connected users in social networks typically share similar sentiments. Compared with the previous works, we have several novelties: (1) we incorporate the influences/authoritativeness of the users into the model, 2) we include comment-based and like-based user-user links to the graph, 3) we superimpose multiple heterogeneous graphs into one thereby allowing multiple types of links to exist between two users.
How AI is Making Sentiment Analysis Easy
It's a far more complex way of analyzing how consumers feel about our products and services, using not just simple words but longer sentence fragments. Yes, AI is becoming smart enough to understand the tone of a statement, rather than simply understanding whether certain words within a group of text have a positive or negative connotation. This is incredibly impactful for companies seeking to optimize their message, improve customer engagement, or even identify top influencers in their customer base. The possibilities of sentiment analysis are incredibly far-reaching. The types of information that AI can gather from both unstructured data and affective computing in sentiment analysis are huge.
Na\"iveRole: Author-Contribution Extraction and Parsing from Biomedical Manuscripts
Tkaczyk, Dominika, Collins, Andrew, Beel, Joeran
Information about the contributions of individual authors to scientific publications is important for assessing authors' achievements. Some biomedical publications have a short section that describes authors' roles and contributions. It is usually written in natural language and hence author contributions cannot be trivially extracted in a machine-readable format. In this paper, we present 1) A statistical analysis of roles in author contributions sections, and 2) Na\"iveRole, a novel approach to extract structured authors' roles from author contribution sections. For the first part, we used co-clustering techniques, as well as Open Information Extraction, to semi-automatically discover the popular roles within a corpus of 2,000 contributions sections from PubMed Central. The discovered roles were used to automatically build a training set for Na\"iveRole, our role extractor approach, based on Na\"ive Bayes. Na\"iveRole extracts roles with a micro-averaged precision of 0.68, recall of 0.48 and F1 of 0.57. It is, to the best of our knowledge, the first attempt to automatically extract author roles from research papers. This paper is an extended version of a previous poster published at JCDL 2018.
LexNLP: Natural language processing and information extraction for legal and regulatory texts
By accepting the Deed and closing the Transaction, Buyer, on behalf of itself and its successors and assigns, shall thereby release each of the Seller Parties from, and waive any and all Liabilities against each of the Seller Parties for, attributable to, or in connection with the Property, whether arising or accruing before, on or after the Closing and whether attributable to events or circumstances which arise or occur before, on or after the Closing, including, without limitation, the following: (a) any and all statements or opinions heretofore or hereafter made, or information furnished, by any Seller Parties to any Buyerâ s Representatives; and (b) any and all Liabilities with respect to the structural, physical, or environmental condition of the Property, including, without limitation, all Liabilities relating to the release, presence, discovery or removal of any hazardous or regulated substance, chemical, waste or material that may be located in, at, about or under the Property, or connected with or arising out of any and all claims or causes of action based upon CERCLA (Comprehensive Environmental Response, Compensation, and Liability Act of 1980, 42 U.S.C. Notwithstanding the foregoing, the foregoing release and waiver is not intended and shall not be construed as affecting or impairing any rights or remedies that Buyer may have against Seller with respect to (i) a breach of any of Sellerâ s Warranties, (ii) a breach of any Surviving Covenants, or (iii) any acts constituting fraud by Seller.
Model Deployment for Data Scientists Using TensorFlow: Part 1 - Nightfall AI
In the world of machine learning, model deployment is a crucial piece of the puzzle. While data scientists excel at other parts of the pipeline, deploying machine learning models tends to fall under the umbrella of software engineering or IT operations. And for good reason--successful deployments require a myriad of complex tasks, including building infrastructure, implementing APIs, load balancing, and integrating with data pipelines. We'll briefly walk you through a basic model deployment example by picking out tools and planning out an approach to construct a simple sentiment classification model. By the end of this post you will have the tools to serve your deep learning (DL) models via an API.