Data Collection and Analysis of French Dialects
Choudhry, Omar Shaur, Odida, Paul Omara, Reiner, Joshua, Appleyard, Keiron, Kushnir, Danielle, Toon, William
arXiv.org Artificial Intelligence
This paper describes the creation and analysis of a new dataset for data mining and text analytics research, contributing to a joint Leeds University research project for the Corpus of National Dialects. It investigates machine learning classifiers for classifying samples of French dialect text from various French-speaking countries. Following the steps of the CRISP-DM methodology, it covers the data collection process, data quality issues and data conversion for text analysis. Finally, after suitable data mining techniques are applied, it discusses the evaluation methods, the best overall features and classifiers, and the conclusions.
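As a rough illustration of the kind of dialect text classification the abstract refers to, the sketch below trains a small multinomial naive Bayes model on character trigrams. It is not the authors' method: the abstract does not say which features or classifiers were used, and the class name `NaiveBayesTextClassifier`, the helper `char_ngrams`, the labels `variety_a` and `variety_b`, and the toy sentences are all hypothetical and invented for this example.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over character n-gram counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)          # documents per label
        self.feature_counts = defaultdict(Counter)   # n-gram counts per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, made up for illustration only (not from the paper's dataset).
train_texts = [
    "nonante euros et septante centimes",
    "septante personnes sont venues",
    "quatre-vingt-dix euros et soixante-dix centimes",
    "soixante-dix personnes sont venues",
]
train_labels = ["variety_a", "variety_a", "variety_b", "variety_b"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("il a septante ans"))
print(clf.predict("soixante-dix ans"))
```

Character n-grams are used here only because they need no tokeniser or external library. A real study would compare several feature sets and classifiers on held-out data, in line with the evaluation step the abstract mentions.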
Aug-1-2022
- Country:
- Oceania > New Zealand
- North Island > Waikato (0.05)
- Europe
- United Kingdom (0.14)
- France (0.04)
- Africa
- Senegal (0.05)
- Côte d'Ivoire (0.04)
- Middle East
- Algeria (0.04)
- Morocco > Casablanca-Settat Region
- Casablanca (0.04)
- Democratic Republic of the Congo > Kinshasa Province
- Kinshasa (0.04)
- Genre:
- Research Report (1.00)
- Technology: