AITopics | Medina Province

Collaborating Authors

Medina Province

SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia

Ayash, Lama, Alhuzali, Hassan, Alasmari, Ashwag, Aloufi, Sultan

arXiv.org Artificial IntelligenceMar-21-2025

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing; however, they often struggle to accurately capture and reflect cultural nuances. This research addresses this challenge by focusing on Saudi Arabia, a country characterized by diverse dialects and rich cultural traditions. We introduce SaudiCulture, a novel benchmark designed to evaluate the cultural competence of LLMs within the distinct geographical and cultural contexts of Saudi Arabia. SaudiCulture is a comprehensive dataset of questions covering five major geographical regions, such as West, East, South, North, and Center, along with general questions applicable across all regions. The dataset encompasses a broad spectrum of cultural domains, including food, clothing, entertainment, celebrations, and crafts. To ensure a rigorous evaluation, SaudiCulture includes questions of varying complexity, such as open-ended, single-choice, and multiple-choice formats, with some requiring multiple correct answers. Additionally, the dataset distinguishes between common cultural knowledge and specialized regional aspects. We conduct extensive evaluations on five LLMs, such as GPT-4, Llama 3.3, FANAR, Jais, and AceGPT, analyzing their performance across different question types and cultural contexts. Our findings reveal that all models experience significant performance declines when faced with highly specialized or region-specific questions, particularly those requiring multiple correct responses. Additionally, certain cultural categories are more easily identifiable than others, further highlighting inconsistencies in LLMs cultural understanding. These results emphasize the importance of incorporating region-specific knowledge into LLMs training to enhance their cultural competence.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.17485

Country:

Asia > Middle East > Saudi Arabia > Najran Province > Najran (0.04)
Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards the Development of Balanced Synthetic Data for Correcting Grammatical Errors in Arabic: An Approach Based on Error Tagging Model and Synthetic Data Generating Model

Alrehili, Ahlam, Alhothali, Areej

arXiv.org Artificial IntelligenceFeb-7-2025

Synthetic data generation is widely recognized as a way to enhance the quality of neural grammatical error correction (GEC) systems. However, current approaches often lack diversity or are too simplistic to generate the wide range of grammatical errors made by humans, especially for low-resource languages such as Arabic. In this paper, we will develop the error tagging model and the synthetic data generation model to create a large synthetic dataset in Arabic for grammatical error correction. In the error tagging model, the correct sentence is categorized into multiple error types by using the DeBERTav3 model. Arabic Error Type Annotation tool (ARETA) is used to guide multi-label classification tasks in an error tagging model in which each sentence is classified into 26 error tags. The synthetic data generation model is a back-translation-based model that generates incorrect sentences by appending error tags before the correct sentence that was generated from the error tagging model using the ARAT5 model. In the QALB-14 and QALB-15 Test sets, the error tagging model achieved 94.42% F1, which is state-of-the-art in identifying error tags in clean sentences. As a result of our syntactic data training in grammatical error correction, we achieved a new state-of-the-art result of F1-Score: 79.36% in the QALB-14 Test set. We generate 30,219,310 synthetic sentence pairs by using a synthetic data generation model.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.05312

Country:

North America > United States > Arizona (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > Middle East > Yemen (0.04)
(7 more...)

Genre: Research Report > New Finding (0.92)

Industry: Media > News (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

Cloned Identity Detection in Social-Sensor Clouds based on Incomplete Profiles

Alharbi, Ahmed, Dong, Hai, Yi, Xun, Abeysekara, Prabath

arXiv.org Artificial IntelligenceNov-2-2024

We propose a novel approach to effectively detect cloned identities of social-sensor cloud service providers (i.e. social media users) in the face of incomplete non-privacy-sensitive profile data. Named ICD-IPD, the proposed approach first extracts account pairs with similar usernames or screen names from a given set of user accounts collected from a social media. It then learns a multi-view representation associated with a given account and extracts two categories of features for every single account. These two categories of features include profile and Weighted Generalised Canonical Correlation Analysis (WGCCA)-based features that may potentially contain missing values. To counter the impact of such missing values, a missing value imputer will next impute the missing values of the aforementioned profile and WGCCA-based features. After that, the proposed approach further extracts two categories of augmented features for each account pair identified previously, namely, 1) similarity and 2) differences-based features. Finally, these features are concatenated and fed into a Light Gradient Boosting Machine classifier to detect identity cloning. We evaluated and compared the proposed approach against the existing state-of-the-art identity cloning approaches and other machine or deep learning models atop a real-world dataset. The experimental results show that the proposed approach outperforms the state-of-the-art approaches and models in terms of Precision, Recall and F1-score.

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TSC.2024.3479912

2411.01329

Country:

Asia > Russia (0.14)
Oceania > Australia > Victoria > Melbourne (0.14)
North America > United States (0.14)
(4 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.54)
Overview > Innovation (0.54)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Movement Control of Smart Mosque's Domes using CSRNet and Fuzzy Logic Techniques

Blasi, Anas H., Lababede, Mohammad Awis Al, Alsuwaiket, Mohammed A.

arXiv.org Artificial IntelligenceOct-13-2024

Mosques are worship places of Allah and must be preserved clean, immaculate, provide all the comforts of the worshippers in them. The prophet's mosque in Medina/ Saudi Arabia is one of the most important mosques for Muslims. It occupies second place after the sacred mosque in Mecca/ Saudi Arabia, which is in constant overcrowding by all Muslims to visit the prophet Mohammad's tomb. This paper aims to propose a smart dome model to preserve the fresh air and allow the sunlight to enter the mosque using artificial intelligence techniques. The proposed model controls domes movements based on the weather conditions and the overcrowding rates in the mosque. The data have been collected from two different resources, the first one from the database of Saudi Arabia weather's history, and the other from Shanghai Technology Database. Congested Scene Recognition Network (CSRNet) and Fuzzy techniques have applied using Python programming language to control the domes to be opened and closed for a specific time to renew the air inside the mosque. Also, this model consists of several parts that are connected for controlling the mechanism of opening/closing domes according to weather data and the situation of crowding in the mosque. Finally, the main goal of this paper has been achieved, and the proposed model has worked efficiently and specifies the exact duration time to keep the domes open automatically for a few minutes for each hour head.

machine learning, mosque, programming language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.14569/issn.2156-5570

2410.18123

Country:

Asia > China > Shanghai > Shanghai (0.25)
Asia > Middle East > Saudi Arabia > Medina Province > Medina (0.24)
Asia > Middle East > Saudi Arabia > Mecca Province > Mecca (0.24)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.69)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Software > Programming Languages (0.87)

Add feedback

An Effective Weight Initialization Method for Deep Learning: Application to Satellite Image Classification

Boulila, Wadii, Alshanqiti, Eman, Alzahem, Ayyub, Koubaa, Anis, Mlaiki, Nabil

arXiv.org Artificial IntelligenceJun-1-2024

The growing interest in satellite imagery has triggered the need for efficient mechanisms to extract valuable information from these vast data sources, providing deeper insights. Even though deep learning has shown significant progress in satellite image classification. Nevertheless, in the literature, only a few results can be found on weight initialization techniques. These techniques traditionally involve initializing the networks' weights before training on extensive datasets, distinct from fine-tuning the weights of pre-trained networks. In this study, a novel weight initialization method is proposed in the context of satellite image classification. The proposed weight initialization method is mathematically detailed during the forward and backward passes of the convolutional neural network (CNN) model. Extensive experiments are carried out using six real-world datasets. Comparative analyses with existing weight initialization techniques made on various well-known CNN models reveal that the proposed weight initialization technique outperforms the previous competitive techniques in classification accuracy. The complete code of the proposed technique, along with the obtained results, is available at https://github.com/WadiiBoulila/Weight-Initialization

initialization, initialization method, weight initialization method, (15 more...)

arXiv.org Artificial Intelligence

2406.00348

Country:

Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.04)
Europe > Germany > Brandenburg > Potsdam (0.04)
Africa > Middle East > Tunisia > Manouba Governorate > Manouba (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Education > Educational Setting > Online (0.81)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Responsible Design Patterns for Machine Learning Pipelines

Harbi, Saud Hakem Al, Tidjon, Lionel Nganyewou, Khomh, Foutse

arXiv.org Artificial IntelligenceJun-7-2023

Integrating ethical practices into the AI development process for artificial intelligence (AI) is essential to ensure safe, fair, and responsible operation. AI ethics involves applying ethical principles to the entire life cycle of AI systems. This is essential to mitigate potential risks and harms associated with AI, such as algorithm biases. To achieve this goal, responsible design patterns (RDPs) are critical for Machine Learning (ML) pipelines to guarantee ethical and fair outcomes. In this paper, we propose a comprehensive framework incorporating RDPs into ML pipelines to mitigate risks and ensure the ethical development of AI systems. Our framework comprises new responsible AI design patterns for ML pipelines identified through a survey of AI ethics and data management experts and validated through real-world scenarios with expert feedback. The framework guides AI developers, data scientists, and policy-makers to implement ethical practices in AI development and deploy responsible AI systems in production.

ai system, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2306.01788

Country:

Europe (0.46)
Oceania > Australia (0.28)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Add feedback

Fractional norms and quasinorms do not help to overcome the curse of dimensionality

Mirkes, Evgeny M., Allohibi, Jeza, Gorban, Alexander N.

arXiv.org Machine LearningApr-29-2020

The curse of dimensionality causes the well-known and widely discussed problems for machine learning methods. There is a hypothesis that using of the Manhattan distance and even fractional quasinorms lp (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a greater relative contrast or coefficient of variation than the Euclidean norm l2, but we also demonstrate that the distance concentration shows qualitatively the same behaviour for all tested norms and quasinorms and the difference between them decays as dimension tends to infinity. Estimation of classification quality for kNN based on different norms and quasinorms shows that a greater relative contrast does not mean better classifier performance and the worst performance for different databases was shown by different norms (quasinorms). A systematic comparison shows that the difference of the performance of kNN based on lp for p=2, 1, and 0.5 is statistically insignificant.

archive, database, dimensionality, (16 more...)

arXiv.org Machine Learning

doi: 10.3390/e22101105

2004.1423

Country:

Europe > Russia (0.14)
Asia > Russia (0.14)
Europe > United Kingdom > England > Leicestershire > Leicester (0.05)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.90)

Add feedback