In sentiment analysis, predicting the sentiment tendency of a sentence is an important task. Previous research has focused mostly on English, for example by analyzing the sentiment tendency of sentences based on their Valence, Arousal, and Dominance. However, emotional tendency is expressed differently in Chinese and English; for example, sentence order may convey different emotions in the two languages. This paper explores a method that builds a domain-specific lexicon, so that the model can classify Chinese words by emotional tendency. In this approach, based on the , an ultra-dense space embedding table is trained from word embeddings of Chinese TikTok reviews and emotional lexicon sources (seed words). The output of the model is a domain-specific lexicon that gives the emotional tendency of each word. I collected Chinese TikTok comments as training data. The model's performance in Chinese sentiment classification is evaluated by comparing its training results with the PCA method, and the results show that the model performs well on Chinese. The source code has been released on GitHub: https://github.com/h2222/douyin_comment_dataset
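As a rough illustration of how seed words can turn word embeddings into a sentiment lexicon, the sketch below projects each word onto a single sentiment axis. This is a deliberately simplified stand-in, not the ultra-dense embedding method described above: the axis is just the normalized difference between the mean positive and mean negative seed vectors. The toy two-dimensional embeddings, the seed lists, and the function names are all assumptions made for the example.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sentiment_axis(embeddings, pos_seeds, neg_seeds):
    """Unit vector pointing from the negative seed centroid to the positive one."""
    pos = mean_vector([embeddings[w] for w in pos_seeds])
    neg = mean_vector([embeddings[w] for w in neg_seeds])
    diff = [p - q for p, q in zip(pos, neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("positive and negative seed centroids coincide")
    return [d / norm for d in diff]

def build_lexicon(embeddings, pos_seeds, neg_seeds):
    """Score every word by its projection onto the sentiment axis."""
    axis = sentiment_axis(embeddings, pos_seeds, neg_seeds)
    return {
        word: sum(a * b for a, b in zip(vector, axis))
        for word, vector in embeddings.items()
    }

# Toy embeddings (hypothetical values, not taken from the paper's data).
toy_embeddings = {
    "好看": [0.9, 0.1],
    "喜欢": [0.8, 0.2],
    "难看": [-0.7, 0.3],
    "讨厌": [-0.9, 0.1],
    "精彩": [0.6, 0.4],
    "无聊": [-0.5, 0.5],
}
toy_lexicon = build_lexicon(
    toy_embeddings,
    pos_seeds=["好看", "喜欢"],
    neg_seeds=["难看", "讨厌"],
)
```

In this sketch a positive score suggests a positive emotional tendency and a negative score a negative one; a real system would use learned embeddings of the review corpus and a much larger seed list.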
It's easy to do bad things with Facebook data. From targeting ads for bizarrely specific T-shirts to manipulating an electorate, the questionable purposes to which the social media behemoth can be put are numerous. But there are also some people out there trying to use Facebook for good--or, at least, to improve the diagnosis of mental illness. On December 3, a group of researchers reported that they had managed to predict psychiatric diagnoses with Facebook data--using messages sent up to 18 months before a user received an official diagnosis. The team worked with 223 volunteers, who all gave the researchers access to their personal Facebook messages.
To understand the important dimensions of service quality from the passenger's perspective and tailor service offerings for competitive advantage, airlines can capitalize on the abundantly available online customer reviews (OCR). The objective of this paper is to discover company- and competitor-specific intelligence from OCR using an unsupervised text analytics approach. First, the key aspects (or topics) discussed in the OCR are extracted using three topic models - probabilistic latent semantic analysis (pLSA) and two variants of Latent Dirichlet allocation (LDA-VI and LDA-GS). Subsequently, we propose an ensemble-assisted topic model (EA-TM), which integrates the individual topic models, to classify each review sentence to the most representative aspect. Likewise, to determine the sentiment corresponding to a review sentence, an ensemble sentiment analyzer (E-SA), which combines the predictions of three opinion mining methods (AFINN, SentiStrength, and VADER), is developed. An aspect-based opinion summary (AOS), which provides a snapshot of passenger-perceived strengths and weaknesses of an airline, is established by consolidating the sentiments associated with each aspect. Furthermore, a bi-gram analysis of the labeled OCR is employed to perform root cause analysis within each identified aspect. A case study involving 99,147 airline reviews of a US-based target carrier and four of its competitors is used to validate the proposed approach. The results indicate that a cost- and time-effective performance summary of an airline and its competitors can be obtained from OCR. Finally, besides providing theoretical and managerial implications based on our results, we also provide implications for post-pandemic preparedness in the airline industry considering the unprecedented impact of coronavirus disease 2019 (COVID-19) and predictions on similar pandemics in the future.
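One way to picture the ensemble sentiment step and the aspect-based summary is the minimal sketch below. The abstract does not say how the three analyzers are combined, so majority voting is an assumption here. The sketch also assumes that each analyzer's score has already been rescaled to the range [-1, 1]; it does not call AFINN, SentiStrength, or VADER, and the function names and threshold are hypothetical.

```python
from collections import Counter, defaultdict

def to_label(score, threshold=0.05):
    """Map a score in [-1, 1] to a polarity label (threshold is an assumption)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

def ensemble_label(scores, threshold=0.05):
    """Majority vote over per-analyzer labels; fall back to neutral without a majority."""
    labels = [to_label(s, threshold) for s in scores]
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else "neutral"

def aspect_summary(labeled_sentences):
    """Count polarity labels per aspect from (aspect, label) pairs."""
    summary = defaultdict(Counter)
    for aspect, label in labeled_sentences:
        summary[aspect][label] += 1
    return {aspect: dict(counts) for aspect, counts in summary.items()}

# Hypothetical review sentences: (aspect, [three rescaled analyzer scores]).
sentences = [
    ("seat comfort", [0.6, 0.2, -0.1]),
    ("seat comfort", [-0.5, -0.3, -0.2]),
    ("baggage", [0.6, -0.4, 0.0]),
]
labeled = [(aspect, ensemble_label(scores)) for aspect, scores in sentences]
summary = aspect_summary(labeled)
```

Falling back to neutral when no label wins a strict majority is a conservative design choice; a weighted vote or an average of the rescaled scores would be equally plausible alternatives.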
Background: Misinformation spread through social media is a growing problem, and the emergence of COVID-19 has caused an explosion in new activity and renewed focus on the resulting threat to public health. Given this increased visibility, in-depth analysis of COVID-19 misinformation spread is critical to understanding the evolution of ideas with potential negative public health impact. Methods: Using a curated data set of COVID-19 tweets (N ~120 million tweets) spanning late January to early May 2020, we applied methods including regular expression filtering, supervised machine learning, sentiment analysis, geospatial analysis, and dynamic topic modeling to trace the spread of misinformation and to characterize novel features of COVID-19 conspiracy theories. Results: Random forest models for four major misinformation topics provided mixed results, with narrowly defined conspiracy theories achieving F1 scores of 0.804 and 0.857, while broader theories performed measurably worse, with scores of 0.654 and 0.347. Despite this, analysis using model-labeled data was beneficial for increasing the proportion of data matching misinformation indicators. We were able to identify distinct increases in negative sentiment, theory-specific trends in geospatial spread, and the evolution of conspiracy theory topics and subtopics over time. Conclusions: COVID-19 related conspiracy theories show that history frequently repeats itself, with the same conspiracy theories being recycled for new situations. We use a combination of supervised learning, unsupervised learning, and natural language processing techniques to examine the evolution of theories over the first four months of the COVID-19 outbreak and how these theories intertwine, and to hypothesize on more effective public health messaging to combat misinformation in online spaces.
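For readers less familiar with the F1 scores quoted above, the following minimal sketch shows how F1 is computed for a binary classifier as the harmonic mean of precision and recall. The labels are made-up toy data, not the study's tweets, and the function name is chosen for this example only.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels: 1 = tweet matches a misinformation topic, 0 = it does not.
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0]
toy_f1 = f1_score(toy_true, toy_pred)
```

Here two of the three positive tweets are found and one of the three predicted positives is wrong, so precision and recall are both 2/3 and F1 is 2/3 as well.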
Speech recognition is an interesting task that allows you to improve the quality of your life. In this never-ending Covid period, I need to watch many videos of lessons, and it's so easy to lose concentration. At the same time, having all the recordings available on my university's website has turned me into a perfectionist, so I would like to capture every word in my notes. But that is costly: it takes a lot of work and a lot of time. Luckily, there are already APIs available from Google, Amazon, IBM, and many others that offer services to convert audio into text.
A "sentiment" is a generally binary opposition in opinions and expresses feelings in the form of emotions, attitudes, opinions, and so on. It can express many opinions. By using machine learning methods and natural language processing, we can extract the subjective information in a document and attempt to classify it according to its polarity, such as positive, neutral, or negative. This makes sentiment analysis instrumental in determining the overall opinion about a defined target, for instance an item for sale, or in predicting stock market movements for a given company. Sentiment analysis is challenging and far from being solved, since most languages are highly complex (objectivity, subjectivity, negation, vocabulary, grammar, and others). However, that is what makes it exciting to work on.
We live in a world that is more opinionated than ever. Any service that we consume leaves us either satisfied or unsatisfied. And with the advent of social media, we make our views public in no time. Vast sources of data are available in the form of reviews, customer satisfaction surveys, customer complaints, etc. Businesses can use this data to understand what customers are talking about, and make data-driven decisions to improve their services. Let's talk in terms of Machine Learning now! Sentiment Analysis is the process of understanding how satisfied customers are w.r.t. a service.
Prediction and quantification of future volatility and returns play an important role in financial modelling, both in portfolio optimization and risk management. Natural language processing today makes it possible to process news and social media comments to detect signals of investors' confidence. We have explored the relationship between sentiment extracted from financial news and tweets and FTSE100 movements. We investigated the strength of the correlation between sentiment measures on a given day and the market volatility and returns observed the next day. The findings suggest that there is evidence of correlation between sentiment and stock market movements: the sentiment captured from news headlines could be used as a signal to predict market returns, but the same does not apply to volatility. Also, in a surprising finding, for the sentiment found in Twitter comments we obtained a correlation coefficient of -0.7 and a p-value below 0.05, which indicates a strong negative correlation between positive sentiment captured from the tweets on a given day and the volatility observed the next day. We developed an accurate classifier for the prediction of market volatility in response to the arrival of new information by deploying topic modelling, based on Latent Dirichlet Allocation, to extract feature vectors from a collection of tweets and financial news. The obtained features were used as additional input to the classifier. Thanks to the combination of sentiment and topic modelling, our classifier achieved a directional prediction accuracy for volatility of 63%.
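The next-day correlation described above can be illustrated with a minimal sketch: pair the sentiment on day t with the volatility on day t+1 and compute the Pearson coefficient. The series below are invented toy values constructed to be perfectly anti-correlated, not FTSE100 or Twitter data. The function names are assumptions, and the sketch does not compute a p-value.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

def next_day_correlation(sentiment, volatility):
    """Correlate sentiment on day t with volatility on day t + 1."""
    return pearson(sentiment[:-1], volatility[1:])

# Toy daily series (hypothetical): volatility on day t + 1 equals 1 - sentiment on day t.
toy_sentiment = [0.1, 0.4, 0.2, 0.5, 0.3]
toy_volatility = [0.7, 0.9, 0.6, 0.8, 0.5]
toy_r = next_day_correlation(toy_sentiment, toy_volatility)
```

The one-day shift is done by dropping the last sentiment value and the first volatility value, so that each remaining pair links a day's sentiment to the following day's volatility.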