Yahia, Issam Ait
Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
Alwajih, Fakhraddin, Mekki, Abdellah El, Magdy, Samar Mohamed, Elmadany, Abdelrahim A., Nacar, Omer, Nagoudi, El Moatez Billah, Abdel-Salam, Reem, Atwany, Hanin, Nafea, Youssef, Yahya, Abdulfattah Mohammed, Alhamouri, Rahaf, Alsayadi, Hamzah A., Zayed, Hiba, Shatnawi, Sara, Sibaee, Serry, Ech-Chammakhy, Yasir, Al-Dhabyani, Walid, Ali, Marwa Mohamed, Jarraya, Imen, El-Shangiti, Ahmed Oumar, Alraeesi, Aisha, Al-Ghrawi, Mohammed Anwar, Al-Batati, Abdulrahman S., Mohamed, Elgizouli, Elgindi, Noha Taha, Saeed, Muhammed, Atou, Houdaifa, Yahia, Issam Ait, Bouayad, Abdelhak, Machrouh, Mohammed, Makouar, Amal, Alkawi, Dania, Mohamed, Mukhtar, Abdelfadil, Safaa Taher, Ounnoughene, Amine Ziad, Anfel, Rouabhia, Assi, Rwaa, Sorkatti, Ahmed, Tourad, Mohamedou Cheikh, Koubaa, Anis, Berrada, Ismail, Jarrar, Mustafa, Shehata, Shady, Abdul-Mageed, Muhammad
As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce our dataset, a year-long community-driven project covering all 22 Arab countries. The dataset includes instructions (input, response pairs) in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics. Built by a team of 44 researchers across the Arab world, all of whom are authors of this paper, our dataset offers a broad, inclusive perspective. We use our dataset to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations. For instance, while closed-source LLMs generally exhibit strong performance, they are not without flaws, and smaller open-source models face greater challenges. Moreover, certain countries (e.g., Egypt, the UAE) appear better represented than others (e.g., Iraq, Mauritania, Yemen). Our annotation guidelines, code, and data for reproducibility are publicly available.
KoopAGRU: A Koopman-based Anomaly Detection in Time-Series using Gated Recurrent Units
Yahia, Issam Ait, Berrada, Ismail
Anomaly detection in real-world time-series data is a challenging task due to the complex and nonlinear temporal dynamics involved. This paper introduces KoopAGRU, a new deep learning model designed to tackle this problem by combining Fast Fourier Transform (FFT), Deep Dynamic Mode Decomposition (DeepDMD), and Koopman theory. FFT allows KoopAGRU to decompose temporal data into time-variant and time-invariant components providing precise modeling of complex patterns. To better control these two components, KoopAGRU utilizes Gate Recurrent Unit (GRU) encoders to learn Koopman observables, enhancing the detection capability across multiple temporal scales. KoopAGRU is trained in a single process and offers fast inference times. Extensive tests on various benchmark datasets show that KoopAGRU outperforms other leading methods, achieving a new average F1-score of 90.88\% on the well-known anomalies detection task of times series datasets, and proves to be efficient and reliable in detecting anomalies in real-world scenarios.