Creating emoji lexica from unsupervised sentiment analysis of their descriptions
Fernández-Gavilanes, Milagros, Juncal-Martínez, Jonathan, García-Méndez, Silvia, Costa-Montenegro, Enrique, González-Castaño, Francisco Javier
arXiv.org Artificial Intelligence
Online media, such as blogs and social networking sites, generate massive volumes of unstructured data that are of great interest for analyzing the opinions and sentiments of individuals and organizations. Novel approaches beyond Natural Language Processing are necessary to quantify these opinions with polarity metrics. So far, the sentiment expressed by emojis has received little attention. The use of these symbols, however, has boomed in the past four years. About twenty billion are typed on Twitter nowadays, and new emojis keep appearing in each new Unicode version, making them increasingly relevant to sentiment analysis tasks. This has motivated us to propose a novel approach to predicting the sentiments expressed by emojis in online textual messages, such as tweets, that does not require manual annotation of data and thus saves valuable time for other analysis tasks. For this purpose, we automatically constructed a novel emoji sentiment lexicon using an unsupervised sentiment analysis system based on the definitions given by emoji creators in Emojipedia. Additionally, we automatically created lexicon variants by also considering the sentiment distribution of the informal texts accompanying emojis. All these lexica are evaluated and compared in terms of the improvement obtained by including them in the sentiment analysis of the annotated datasets provided by Kralj Novak et al. (2015). The results confirm the competitiveness of our approach.
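The core idea in the abstract, deriving an emoji's polarity from the text of its description rather than from hand-labeled data, can be illustrated with a minimal Python sketch. This is not the authors' system: the word polarity table, the one-token negation rule, the function names `score_description` and `build_emoji_lexicon`, and the sample descriptions are all hypothetical stand-ins invented for illustration (the descriptions are not quoted from Emojipedia).

```python
import re

# Hypothetical toy word-polarity table (assumption); the paper relies on
# its own unsupervised sentiment analysis system, not on this list.
WORD_POLARITY = {
    "happy": 1.0, "joy": 1.0, "love": 1.0, "smiling": 0.8, "laughter": 0.8,
    "sad": -1.0, "angry": -1.0, "crying": -0.8, "pain": -0.8,
}
NEGATORS = {"not", "no", "never"}


def score_description(text):
    """Return the mean polarity of the sentiment-bearing words in text.

    A negator flips the polarity of the token that immediately follows it.
    Descriptions without any known sentiment word score 0.0 (neutral).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        if tok in WORD_POLARITY:
            value = WORD_POLARITY[tok]
            scores.append(-value if negate else value)
        negate = False  # negation only reaches the next token
    return sum(scores) / len(scores) if scores else 0.0


def build_emoji_lexicon(descriptions):
    """Map each emoji to the polarity score of its description."""
    return {emoji: score_description(text) for emoji, text in descriptions.items()}


# Invented sample descriptions (assumption), for demonstration only.
DESCRIPTIONS = {
    "😀": "A face with a big grin, smiling and happy.",
    "😢": "A sad face crying a single tear.",
    "📎": "A metal paperclip used to hold sheets of paper.",
}

lexicon = build_emoji_lexicon(DESCRIPTIONS)
for emoji, polarity in lexicon.items():
    print(emoji, polarity)
```

In this sketch the resulting dictionary plays the role of the emoji sentiment lexicon: a downstream classifier could look up each emoji in a message and combine its score with the polarity of the surrounding text. The lexicon variants mentioned in the abstract, which also use the sentiment distribution of accompanying informal texts, are not covered here.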
Apr-1-2024