Information Extraction

WhatsApp Has Shared Your Data With Facebook for Years


Since Facebook acquired WhatsApp in 2014, users have wondered and worried about how much data would flow between the two platforms. Many of them experienced a rude awakening this week, as a new in-app notification raises awareness about a step WhatsApp actually took to share more with Facebook back in 2016. On Monday, WhatsApp updated its terms of use and privacy policy, primarily to expand on its practices around how WhatsApp business users can store their communications. A pop-up has been notifying users that as of February 8, the app's privacy policy will change and they must accept the terms to keep using the app. As part of that privacy policy refresh, WhatsApp also removed a passage about opting out of sharing certain data with Facebook: "If you are an existing user, you can choose not to have your WhatsApp account information shared with Facebook to improve your Facebook ads and products experiences."

WhatsApp: Let us share your data with Facebook or else


In a surprise move, WhatsApp recently gave many of its users a difficult choice: they could either accept, by February 8th, a revised privacy policy that explicitly allowed the service to share information with parent company Facebook, or decline and risk not being able to use the service at all. The company informed those users through an in-app notification which lays out the changes in very broad terms: the updates to the policy include "more information about WhatsApp's service and how we process your data, how businesses can use Facebook hosted services to store and manage their WhatsApp chats, [and] how we partner with Facebook to offer integrations across the Facebook Company Products." Upon further inspection, the updated policy makes clear that data collected by WhatsApp -- including user phone numbers, "transaction data, service-related information, information on how you interact with others (including businesses) when using our Services, mobile device information, your IP address" and more -- is subject to being shared with other properties owned and controlled by Facebook. "As part of the Facebook Companies, WhatsApp receives information from, and shares information (see here) with, the other Facebook Companies," the updated privacy policy reads. "We may use the information we receive from them, and they may use the information we share with them, to help operate, provide, improve, understand, customize, support, and market our Services and their offerings, including the Facebook Company Products."

Building domain specific lexicon based on TikTok comment dataset Artificial Intelligence

In the sentiment analysis task, predicting the sentiment tendency of a sentence is an important branch. Previous research focused more on sentiment analysis in English, for example, analyzing the sentiment tendency of sentences based on their Valence, Arousal, and Dominance. However, emotional tendency differs between Chinese and English. For example, the sentence order in Chinese and English may convey different emotions. This paper tried a method that builds a domain-specific lexicon. In this way, the model can classify Chinese words with emotional tendency. In this approach, based on [13], an ultra-dense space embedding table is trained through word embeddings of Chinese TikTok reviews and emotional lexicon sources (seed words). The result of the model is a domain-specific lexicon, which presents the emotional tendency of words. I collected Chinese TikTok comments as training data. The training results were compared with the PCA method to evaluate the performance of the model in Chinese sentiment classification, and the results show that the model performs well on Chinese. The source code has been released on GitHub:

An AI Used Facebook Data to Predict Mental Illness


It's easy to do bad things with Facebook data. From targeting ads for bizarrely specific T-shirts to manipulating an electorate, the questionable purposes to which the social media behemoth can be put are numerous. But there are also some people out there trying to use Facebook for good--or, at least, to improve the diagnosis of mental illness. On December 3, a group of researchers reported that they had managed to predict psychiatric diagnoses with Facebook data--using messages sent up to 18 months before a user received an official diagnosis. The team worked with 223 volunteers, who all gave the researchers access to their personal Facebook messages.

Discovering Airline-Specific Business Intelligence from Online Passenger Reviews: An Unsupervised Text Analytics Approach Artificial Intelligence

To understand the important dimensions of service quality from the passenger's perspective and tailor service offerings for competitive advantage, airlines can capitalize on the abundantly available online customer reviews (OCR). The objective of this paper is to discover company- and competitor-specific intelligence from OCR using an unsupervised text analytics approach. First, the key aspects (or topics) discussed in the OCR are extracted using three topic models - probabilistic latent semantic analysis (pLSA) and two variants of Latent Dirichlet allocation (LDA-VI and LDA-GS). Subsequently, we propose an ensemble-assisted topic model (EA-TM), which integrates the individual topic models, to classify each review sentence to the most representative aspect. Likewise, to determine the sentiment corresponding to a review sentence, an ensemble sentiment analyzer (E-SA), which combines the predictions of three opinion mining methods (AFINN, SentiStrength, and VADER), is developed. An aspect-based opinion summary (AOS), which provides a snapshot of passenger-perceived strengths and weaknesses of an airline, is established by consolidating the sentiments associated with each aspect. Furthermore, a bi-gram analysis of the labeled OCR is employed to perform root cause analysis within each identified aspect. A case study involving 99,147 airline reviews of a US-based target carrier and four of its competitors is used to validate the proposed approach. The results indicate that a cost- and time-effective performance summary of an airline and its competitors can be obtained from OCR. Finally, besides providing theoretical and managerial implications based on our results, we also provide implications for post-pandemic preparedness in the airline industry considering the unprecedented impact of coronavirus disease 2019 (COVID-19) and predictions on similar pandemics in the future.
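The ensemble idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how E-SA combines its three analyzers, so the majority vote used here is an assumption, and the numeric scores are hypothetical stand-ins rather than real AFINN, SentiStrength, or VADER output. The function names (`to_label`, `ensemble_label`, `aspect_summary`) are invented for illustration.

```python
from collections import Counter


def to_label(score, threshold=0.0):
    """Map a numeric polarity score to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def ensemble_label(scores):
    """Majority vote over per-analyzer labels (assumed combination rule).

    `scores` must be non-empty. A tie for first place falls back to neutral.
    """
    votes = Counter(to_label(s) for s in scores)
    (top, top_count), *rest = votes.most_common()
    if rest and rest[0][1] == top_count:
        return "neutral"
    return top


def aspect_summary(labeled_sentences):
    """Count ensemble sentiment labels per aspect.

    `labeled_sentences` is an iterable of (aspect, scores) pairs, where
    `scores` holds one polarity score per analyzer for a review sentence.
    """
    summary = {}
    for aspect, scores in labeled_sentences:
        counts = summary.setdefault(aspect, Counter())
        counts[ensemble_label(scores)] += 1
    return summary


if __name__ == "__main__":
    # Hypothetical sentences already assigned to an aspect by a topic model.
    sentences = [
        ("baggage", [-1.0, -2.0, 0.5]),
        ("baggage", [1.0, 1.0, 1.0]),
        ("crew", [0.0, 0.0, 0.0]),
    ]
    for aspect, counts in aspect_summary(sentences).items():
        print(aspect, dict(counts))
```

Consolidating the per-aspect counts in this way gives the kind of strengths-and-weaknesses snapshot the abstract calls an aspect-based opinion summary, though the paper's actual aggregation may differ.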

"Thought I'd Share First": An Analysis of COVID-19 Conspiracy Theories and Misinformation Spread on Twitter Machine Learning

Background: Misinformation spread through social media is a growing problem, and the emergence of COVID-19 has caused an explosion in new activity and renewed focus on the resulting threat to public health. Given this increased visibility, in-depth analysis of COVID-19 misinformation spread is critical to understanding the evolution of ideas with potential negative public health impact. Methods: Using a curated data set of COVID-19 tweets (N ~120 million tweets) spanning late January to early May 2020, we applied methods including regular expression filtering, supervised machine learning, sentiment analysis, geospatial analysis, and dynamic topic modeling to trace the spread of misinformation and to characterize novel features of COVID-19 conspiracy theories. Results: Random forest models for four major misinformation topics provided mixed results, with narrowly-defined conspiracy theories achieving F1 scores of 0.804 and 0.857, while more broad theories performed measurably worse, with scores of 0.654 and 0.347. Despite this, analysis using model-labeled data was beneficial for increasing the proportion of data matching misinformation indicators. We were able to identify distinct increases in negative sentiment, theory-specific trends in geospatial spread, and the evolution of conspiracy theory topics and subtopics over time. Conclusions: COVID-19 related conspiracy theories show that history frequently repeats itself, with the same conspiracy theories being recycled for new situations. We use a combination of supervised learning, unsupervised learning, and natural language processing techniques to look at the evolution of theories over the first four months of the COVID-19 outbreak, how these theories intertwine, and to hypothesize on more effective public health messaging to combat misinformation in online spaces.

Extract the text from long videos with Python


Speech recognition is an interesting task that allows you to improve the quality of your life. In this neverending Covid period, I need to watch many videos of lessons, and it's so easy to lose concentration. At the same time, the possibility to have all recordings available on my university's website made me become a perfectionist, so I would like to capture every word in my notes. But it's costly because it needs a lot of work and steals time. Luckily, there are already API resources available such as Google, Amazon, IBM, and many others, that offer services that convert audio into text.

Sentiment Analysis (Opinion Mining) with Python -- NLP Tutorial


A "sentiment" is a generally binary opposition in opinions and expresses the feelings in the form of emotions, attitudes, opinions, and so on. It can express many opinions. By using machine learning methods and natural language processing, we can extract the personal information of a document and attempt to classify it according to its polarity, such as positive, neutral, or negative, making sentiment analysis instrumental in determining the overall opinion of a defined objective, for instance, a selling item or predicting stock markets for a given company. Sentiment analysis is challenging and far from being solved since most languages are highly complex (objectivity, subjectivity, negation, vocabulary, grammar, and others). However, that is what makes it exciting to work on [1].

Aspect Based Sentiment Analysis


We live in a world which is more opinionated than ever. Any service that we consume leaves us either satisfied or unsatisfied. And with the advent of social media, we make our views public in no time. Vast sources of data are available in the form of reviews, customer satisfaction surveys, customer complaints, etc. Businesses can use this data to understand what customers are talking about, and make data driven decisions to improve their services. Let's talk in terms of Machine Learning now! Sentiment Analysis is the process of understanding how satisfied customers are w.r.t. a service.

A Sentiment Analysis Approach to the Prediction of Market Volatility Artificial Intelligence

Prediction and quantification of future volatility and returns play an important role in financial modelling, both in portfolio optimization and risk management. Natural language processing today makes it possible to process news and social media comments to detect signals of investors' confidence. We have explored the relationship between sentiment extracted from financial news and tweets and FTSE100 movements. We investigated the strength of the correlation between sentiment measures on a given day and market volatility and returns observed the next day. The findings suggest that there is evidence of correlation between sentiment and stock market movements: the sentiment captured from news headlines could be used as a signal to predict market returns; the same does not apply for volatility. Also, in a surprising finding, for the sentiment found in Twitter comments we obtained a correlation coefficient of -0.7, and p-value below 0.05, which indicates a strong negative correlation between positive sentiment captured from the tweets on a given day and the volatility observed the next day. We developed an accurate classifier for the prediction of market volatility in response to the arrival of new information by deploying topic modelling, based on Latent Dirichlet Allocation, to extract feature vectors from a collection of tweets and financial news. The obtained features were used as additional input to the classifier. Thanks to the combination of sentiment and topic modelling our classifier achieved a directional prediction accuracy for volatility of 63%.
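The next-day correlation described in the abstract above can be sketched as follows. This is a minimal illustration, assuming a plain Pearson coefficient over two aligned daily series. The abstract does not specify the exact correlation measure or data preparation, and the numbers below are made up for demonstration, not taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def next_day_correlation(sentiment, volatility):
    """Correlate sentiment on day t with volatility on day t + 1.

    Both inputs are daily series covering the same dates, in order.
    """
    return pearson(sentiment[:-1], volatility[1:])


if __name__ == "__main__":
    # Hypothetical daily values, for illustration only.
    daily_sentiment = [0.2, 0.5, -0.1, 0.4, 0.0, 0.3]
    daily_volatility = [1.1, 1.0, 0.7, 1.3, 0.8, 1.2]
    print(round(next_day_correlation(daily_sentiment, daily_volatility), 3))
```

Shifting one series by a day before correlating is what separates a predictive signal from a same-day association. A negative coefficient here would mean that more positive sentiment tends to precede lower volatility.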