Information Extraction
Only Facebook knows the extent of its misinformation problem. And it's not sharing, even with the White House.
But the debates between Facebook and the White House throughout the spring and into summer also gave rise to a broader and still unresolved disagreement over what constitutes misinformation, according to the person familiar with Facebook's thinking. Facebook strongly believes people should have the right to broadly express themselves without censorship on social platforms, and had reviewed research that shows that friends and family can often be more effective at countering misinformation than official sources that people distrust. The Facebook executives thought the Biden camp was going too far, by identifying specific pieces of content as problematic and asking it to potentially suppress valuable conversations where people express fears and skepticism.
An Effective System for Multi-format Information Extraction
Liu, Yaduo, Zhang, Longhui, Yin, Shujuan, Zhao, Xiaofeng, Ren, Feiliang
The multi-format information extraction task in the 2021 Language and Intelligence Challenge is designed to comprehensively evaluate information extraction from different dimensions. It consists of a multiple-slot relation extraction subtask and two event extraction subtasks that extract events at the sentence level and the document level, respectively. Here we describe our system for this multi-format information extraction competition task. Specifically, for the relation extraction subtask, we convert it to a traditional triple extraction task and design a voting-based method that makes full use of existing models. For the sentence-level event extraction subtask, we convert it to a NER task and use a pointer-labeling-based method for extraction. Furthermore, considering that the annotated trigger information may be helpful for event extraction, we design an auxiliary trigger recognition model and use a multi-task learning mechanism to integrate the trigger features into the event extraction model. For the document-level event extraction subtask, we design an Encoder-Decoder-based method and propose a Transformer-like decoder. Finally, our system ranks No. 4 on the test set leaderboard of this multi-format information extraction task, and its F1 scores for the relation extraction, sentence-level event extraction, and document-level event extraction subtasks are 79.887%, 85.179%, and 70.828%, respectively. The code for our model is available at https://github.com/neukg/MultiIE.
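The voting step for relation extraction and the pointer-based decoding for sentence-level event extraction can be illustrated with a minimal sketch. This is not the authors' implementation (their repository holds that). The function names `vote_triples` and `decode_pointer_spans`, the `min_votes` threshold, the 0.5 score threshold, and the nearest-end pairing rule are all assumptions made for illustration, and the toy triples and scores are invented.

```python
from collections import Counter


def vote_triples(model_outputs, min_votes):
    """Keep triples predicted by at least `min_votes` models (assumed rule)."""
    counts = Counter()
    for triples in model_outputs:
        # Deduplicate per model so one model cannot vote twice for a triple.
        counts.update(set(triples))
    return {triple for triple, votes in counts.items() if votes >= min_votes}


def decode_pointer_spans(start_scores, end_scores, threshold=0.5):
    """Pair each start above the threshold with the nearest following end."""
    spans = []
    for i, start in enumerate(start_scores):
        if start < threshold:
            continue
        for j in range(i, len(end_scores)):
            if end_scores[j] >= threshold:
                spans.append((i, j))  # inclusive token indices
                break
    return spans


# Toy predictions from three hypothetical relation extraction models.
outputs = [
    [("Alice", "works_for", "Acme"), ("Alice", "born_in", "Paris")],
    [("Alice", "works_for", "Acme")],
    [("Alice", "works_for", "Acme"), ("Bob", "born_in", "Rome")],
]
voted = vote_triples(outputs, min_votes=2)

# Toy per-token start and end scores for a single argument role.
spans = decode_pointer_spans([0.1, 0.9, 0.2, 0.1, 0.8], [0.0, 0.1, 0.7, 0.0, 0.9])
```

Raising `min_votes` trades recall for precision, which is the usual reason to vote across existing models rather than trust any single one.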
I analyzed hundreds of users' Tinder data -- including messages -- so you didn't have to.
I read Modern Romance by Aziz Ansari in 2016 and, beyond a shadow of a doubt, it is one of the most influential books I've ever read. At the time, I was a snot-nosed college student who was still dating someone from high school. The numbers and figures the book gave about online dating success struck me as callous. Millennials and their predecessors were blessed and cursed with the advent of the internet. The proliferation of partner choice desensitizes us and gives us unrealistic expectations when it comes to searching for our "soulmate." Instead of feeling dissuaded, I felt inspired.
Aspect Sentiment Triplet Extraction Using Reinforcement Learning
Jian, Samson Yu Bai, Nayak, Tapas, Majumder, Navonil, Poria, Soujanya
Aspect Sentiment Triplet Extraction (ASTE) is the task of extracting triplets of aspect terms, their associated sentiments, and the opinion terms that provide evidence for the expressed sentiments. Previous approaches to ASTE usually either extract all three components simultaneously or first identify the aspect and opinion terms, then pair them up to predict their sentiment polarities. In this work, we present a novel paradigm, ASTE-RL, by regarding the aspect and opinion terms as arguments of the expressed sentiment in a hierarchical reinforcement learning (RL) framework. We first focus on sentiments expressed in a sentence, then identify the target aspect and opinion terms for that sentiment. This takes into account the mutual interactions among the triplet's components while improving exploration and sample efficiency. Furthermore, this hierarchical RL setup enables us to deal with multiple and overlapping triplets. In our experiments, we evaluate our model on existing datasets from laptop and restaurant domains and show that it achieves state-of-the-art performance. The implementation of this work is publicly available at https://github.com/declare-lab/ASTE-RL.
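To make the output format concrete, the sketch below shows what ASTE triplets might look like for one sentence, including two overlapping triplets that share an aspect term. It only illustrates the data structure and a sentiment-first grouping that mirrors the order described above. It does not implement the RL model. The `Triplet` type, the label names, and the example sentence are assumptions for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    aspect: str
    opinion: str
    sentiment: str  # "POS", "NEG", or "NEU" (label names are an assumption)


def group_by_sentiment(triplets):
    """View triplets sentiment-first: each sentiment maps to its arguments."""
    grouped = defaultdict(list)
    for t in triplets:
        grouped[t.sentiment].append((t.aspect, t.opinion))
    return dict(grouped)


# Invented example sentence with overlapping triplets (shared aspect "screen").
sentence = "The screen is bright and sharp but the battery is weak."
triplets = [
    Triplet("screen", "bright", "POS"),
    Triplet("screen", "sharp", "POS"),
    Triplet("battery", "weak", "NEG"),
]
grouped = group_by_sentiment(triplets)
```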
Zero-shot Task Transfer for Invoice Extraction via Class-aware QA Ensemble
Damodaran, Prithiviraj, Singh, Prabhkaran, Achankuju, Josemon
We present VESPA, an intentionally simple yet novel zero-shot system for layout, locale, and domain agnostic document extraction. In spite of the availability of large corpora of documents, the lack of labeled and validated datasets makes it a challenge to discriminatively train document extraction models for enterprises. We show that this problem can be addressed by simply transferring the information extraction (IE) task to a natural language Question-Answering (QA) task without engineering task-specific architectures. We demonstrate the effectiveness of our system by evaluating on a closed corpus of real-world retail and tax invoices with multiple complex layouts, domains, and geographies. The empirical evaluation shows that our system outperforms 4 prominent commercial invoice solutions that use discriminatively trained models with architectures specifically crafted for invoice extraction. We extracted 6 fields with zero upfront human annotation or training with an Avg. F1 of 87.50.
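The idea of recasting field extraction as question answering can be sketched as follows. This is a toy illustration, not VESPA itself: the field names, the question wordings, and the `toy_qa` function (a regex stand-in for a real extractive QA model) are all hypothetical, and picking the highest-scoring answer across several questions is only one simple way an ensemble could be combined.

```python
import re

# Hypothetical questions per field; several phrasings act as a small ensemble.
FIELD_QUESTIONS = {
    "invoice_number": ["What is the invoice number?", "What is the invoice no?"],
    "total": ["How much is the total?", "What is the total amount?"],
}

STOP_WORDS = {"what", "is", "the", "how", "much"}


def toy_qa(question, context):
    """Stand-in for an extractive QA model: returns (answer, score)."""
    keywords = [w for w in re.findall(r"[a-z]+", question.lower())
                if w not in STOP_WORDS]
    if not keywords:
        return "", 0.0
    # Find the keywords as a phrase and take the token that follows them.
    pattern = r"\s+".join(keywords) + r"\W+(\S+)"
    match = re.search(pattern, context, flags=re.IGNORECASE)
    if match is None:
        return "", 0.0
    # Longer matched phrases count as more specific (assumed scoring rule).
    return match.group(1), float(len(keywords))


def extract_fields(context, qa=toy_qa):
    """Ask every question for each field and keep the best-scoring answer."""
    result = {}
    for field, questions in FIELD_QUESTIONS.items():
        answer, score = max((qa(q, context) for q in questions),
                            key=lambda pair: pair[1])
        result[field] = answer if score > 0 else None
    return result


invoice_text = "Invoice Number: INV-1042\nTotal Amount: 318.50 EUR"
fields = extract_fields(invoice_text)
```

In practice `toy_qa` would be replaced by a pretrained QA model, which is what lets the approach run without task-specific training data.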
Facebook adds Photobucket and Google Calendar to its data portability options
Facebook has today announced that it has added two new destinations for when you want to move your data from the social network. In a blog post, the company said that users will be able to move their images to Photobucket and event listings to Google Calendar. Product Manager Hadi Michel said that the tool has been "completely rebuilt" to be "simpler and more intuitive," giving people more clarity on what they can share to which platforms. In addition, users can now launch multiple transfers, with finer-grained control over what they're choosing to export in any one transfer. This is yet another feature piled on to the Data Transfer Project, an open-source project developed by Google, Facebook and Microsoft.
Using Twitter to Understand Pizza Delivery Apprehension During COVID - KDnuggets
India witnessed its first-ever nationwide lockdown from 24th March 2020 to 31st May 2020 to fight the spread of the novel coronavirus by limiting the movement of its residents. The study in this article aims to identify the different emotions of customers ordering pizza in India from Domino's, one of the most popular pizza delivery chains. Domino's India, the poster brand of Jubilant Foodworks, had 128 stores in 2006 and reported more than 1300 stores across the country in 2020. The study analyzed Twitter data during three different periods. The first period, from 1st January 2020 to 24th March 2020, was considered pre-lockdown; the second period, from 25th March 2020 to 31st May 2020, was considered the lockdown period; and the third period, from 1st January 2021 to 28th February 2021, was considered post-lockdown. Twitter data for these three periods were extracted using Sprinklr (licensed by IIM Ahmedabad), which specializes in providing real-time user conversations from modern social handles.
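The three study periods can be expressed as a small lookup, shown here as a sketch. The date boundaries come from the text above. The `PERIODS` table and `assign_period` function are hypothetical names, and how the study actually bucketed tweets is not stated.

```python
from datetime import date

# Period boundaries as given in the article (inclusive on both ends).
PERIODS = [
    ("pre-lockdown", date(2020, 1, 1), date(2020, 3, 24)),
    ("lockdown", date(2020, 3, 25), date(2020, 5, 31)),
    ("post-lockdown", date(2021, 1, 1), date(2021, 2, 28)),
]


def assign_period(day):
    """Return the study period a tweet date falls in, or None if outside all."""
    for name, start, end in PERIODS:
        if start <= day <= end:
            return name
    return None


labels = [assign_period(d) for d in (date(2020, 2, 14), date(2020, 4, 10))]
```

Dates between 1st June 2020 and 31st December 2020 fall outside all three periods and map to `None`.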
Recommending Insurance products by using Users' Sentiments
Parasrampuria, Rohan, Ghosh, Ayan, Dutta, Suchandra, Sarkar, Dhrubasish
In today's tech-savvy world, every industry is trying to formulate methods for recommending products by combining several techniques and algorithms into a pool that yields the best models for making predictions. Along these lines, our paper focuses on the application of sentiment analysis for recommendation in the insurance domain. We built three Machine Learning models, namely Logistic Regression, Multinomial Naive Bayes, and the mighty Random Forest, to analyze the polarity of a feedback line given by a customer. We then used this polarity, along with other attributes such as Age, Gender, Locality, Income, and the list of other products already purchased by our existing customers, as input for our recommendation model. Finally, we matched the polarity score with the users' profiles and generated the list of insurance products to be recommended in descending order. Despite our model's simplicity and the lack of key data sets, the results seemed logical and realistic. With more advanced methods and access to better, real data gathered from the insurance industry, the sector could benefit considerably from combining sentiment analysis with recommendation.
Announcing GA of Text Analytics for health, Opinion Mining, PII and Analyze
It has been a year since we released (in GA) our last TA API (v3.0). After five previews spent adding features, building in responsible AI, incorporating customer and UX feedback, and making optimizations, in July 2021 we announced GA (General Availability) of Text Analytics v3.1. With this release, customers can use Text Analytics for health, Opinion Mining, PII and Analyze as GA offerings. Text Analytics for health is a feature of the Text Analytics API service that extracts and labels relevant medical information from unstructured texts such as doctor's notes, discharge summaries, clinical documents, and electronic health records. Millions of text records were processed during the preview in the last year.
Transformer-Encoder-GRU (T-E-GRU) for Chinese Sentiment Analysis on Chinese Comment Text
Chinese sentiment analysis (CSA) has always been one of the challenges in natural language processing due to its complexity and uncertainty. The Transformer has succeeded in capturing semantic features, but it uses position encoding to capture sequence features, which has great shortcomings compared with recurrent models. In this paper, we propose T-E-GRU for Chinese sentiment analysis, which combines a Transformer encoder and a GRU. We conducted experiments on three Chinese comment datasets. In view of the confusion of punctuation marks in Chinese comment texts, we selectively retain some punctuation marks with sentence segmentation ability. The experimental results show that T-E-GRU outperforms classic recurrent models and recurrent models with attention.
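The punctuation-handling step can be sketched as a preprocessing function. The abstract does not say which marks are retained, so the `KEEP` set below, the collapsing of repeated marks, and the function name `filter_punctuation` are assumptions for illustration only. The example comment is invented.

```python
import re
import unicodedata

# Assumed set of marks with sentence or clause segmentation ability.
KEEP = "。！？，；"


def filter_punctuation(text, keep=KEEP):
    """Drop punctuation and symbols except `keep`; collapse repeated marks."""
    kept_chars = []
    for ch in text:
        # Unicode categories starting with P (punctuation) or S (symbol).
        is_punct = unicodedata.category(ch)[0] in ("P", "S")
        if is_punct and ch not in keep:
            continue
        kept_chars.append(ch)
    cleaned = "".join(kept_chars)
    # Collapse runs such as "！！！" into a single mark (an extra assumption).
    return re.sub("([" + re.escape(keep) + r"])\1+", r"\1", cleaned)


comment = "这家店太好吃了！！！~~服务也不错，，下次还来。。。"
cleaned = filter_punctuation(comment)
```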