FASSILA: A Corpus for Algerian Dialect Fake News Detection and Sentiment Analysis

Abdedaiem, Amin, Dahou, Abdelhalim Hafedh, Cheragui, Mohamed Amine, Mathiak, Brigitte

Nov-7-2024–arXiv.org Artificial Intelligence

Building corpora has become an important topic in natural language processing (NLP), especially for low-resource languages (e.g., the Algerian dialect, AD), because corpora underpin the development of many tools, such as machine translation Babaali and Salem [2022], part-of-speech tagging Chiche and Yitagesu [2022], and named entity recognition Jarrar et al. [2022]. This is particularly true with the emergence of techniques based on statistics, machine learning, and deep learning, which exploit this mass of information to develop, train, and evaluate models. However, building a corpus is not an easy task Bakari et al. [2016]; it is extremely time-consuming and labor-intensive, because both the volume and the quality of the corpus matter. This remains the case despite the recent emergence of techniques that consume fewer resources, such as few-shot learning Tunstall et al. [2022]. Over the last few years, many NLP studies have focused on low-resource languages and language variants Mengoni and Santucci [2023]. This change of direction is mainly due to the emergence of social media such as Facebook, Twitter, RenRen, LinkedIn, Google+, and Tuenti as a means of communication where people exchange messages and comments.

algerian dialect, corpus, dialect

Country:
- North America > United States (0.04)
- Europe
  - Germany (0.04)
  - Italy > Tuscany
    - Florence (0.04)
- Asia
  - Middle East > Qatar (0.04)
  - China > Shaanxi Province
    - Xi'an (0.04)
- Africa > Middle East
  - Algeria > Adrar Province > Adrar (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Media > News (0.86)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Natural Language
      - Machine Translation (1.00)
      - Large Language Model (1.00)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (1.00)