Mauritania
"Sirāt" Is a Harrowing, Exhilarating Dance of Death
At one point, Luis assumes that he and Esteban have been abandoned, only to realize, with a start, that their newfound friends are actually circling back to help. In such moments, we grasp the source of the story's mysterious power: a tough-minded understanding that kindness is rare yet persistent, and quite possibly an affront to the laws of nature. "Sirāt" is a chain of defiantly compassionate acts--noble human improbabilities that take on, in retrospect, an air of fatalistic inevitability. Laxe, a restless wanderer himself, knows Morocco well. He shot his first feature, "You All Are Captains" (2011), in Tangier, where he'd spent several years working at a shelter for disadvantaged children. Several of these children appeared in the movie--a formally playful collision of fiction and documentary in which Laxe, also making an appearance, slyly interrogated his European outsider-artist role. Next came "Mimosas" (2016), an elusive, arrestingly gorgeous drama about a caravan bearing a dying sheikh across Morocco's Atlas Mountains to his homeland. The film had the beauty of a travelogue and the opacity of a parable. Its most dynamic character was a fiery Muslim preacher who warned his fellow-travellers not to stray, geographically or morally.
Unlocking the Potential of Global Human Expertise
For example, in the Pandemic Response Challenge experiment, the context consisted of data about the geographic region for which the predictions were made, e.g., historical data of COVID-19 cases and intervention policies; actions were future schedules of intervention policies for the region; and outcomes were predicted future cases of COVID-19 along with the stringency
Konooz: Multi-domain Multi-dialect Corpus for Named Entity Recognition
Hamad, Nagham, Khalilia, Mohammed, Jarrar, Mustafa
We introduce Konooz, a novel multi-dimensional corpus covering 16 Arabic dialects across 10 domains, resulting in 160 distinct corpora. The corpus comprises about 777k tokens, carefully collected and manually annotated with 21 entity types using both nested and flat annotation schemes - using the Wojood guidelines. While Konooz is useful for various NLP tasks like domain adaptation and transfer learning, this paper primarily focuses on benchmarking existing Arabic Named Entity Recognition (NER) models, especially cross-domain and cross-dialect model performance. Our benchmarking of four Arabic NER models using Konooz reveals a significant drop in performance of up to 38% when compared to the in-distribution data. Furthermore, we present an in-depth analysis of domain and dialect divergence and the impact of resource scarcity. We also measured the overlap between domains and dialects using the Maximum Mean Discrepancy (MMD) metric, and illustrated why certain NER models perform better on specific dialects and domains. Konooz is open-source and publicly available at https://sina.birzeit.edu/wojood/#download
Jawaher: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking
Magdy, Samar M., Kwon, Sang Yun, Alwajih, Fakhraddin, Abdelfadil, Safaa, Shehata, Shady, Abdul-Mageed, Muhammad
Recent advancements in instruction fine-tuning, alignment methods such as reinforcement learning from human feedback (RLHF), and optimization techniques like direct preference optimization (DPO) have significantly enhanced the adaptability of large language models (LLMs) to user preferences. However, despite these innovations, many LLMs continue to exhibit biases toward Western, Anglo-centric, or American cultures, with performance on English data consistently surpassing that of other languages. This reveals a persistent cultural gap in LLMs, which complicates their ability to accurately process culturally rich and diverse figurative language such as proverbs. To address this, we introduce Jawaher, a benchmark designed to assess LLMs' capacity to comprehend and interpret Arabic proverbs. Jawaher includes proverbs from various Arabic dialects, along with idiomatic translations and explanations. Through extensive evaluations of both open- and closed-source models, we find that while LLMs can generate idiomatically accurate translations, they struggle with producing culturally nuanced and contextually relevant explanations. These findings highlight the need for ongoing model refinement and dataset expansion to bridge the cultural gap in figurative language processing.
Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
Alwajih, Fakhraddin, Mekki, Abdellah El, Magdy, Samar Mohamed, Elmadany, Abdelrahim A., Nacar, Omer, Nagoudi, El Moatez Billah, Abdel-Salam, Reem, Atwany, Hanin, Nafea, Youssef, Yahya, Abdulfattah Mohammed, Alhamouri, Rahaf, Alsayadi, Hamzah A., Zayed, Hiba, Shatnawi, Sara, Sibaee, Serry, Ech-Chammakhy, Yasir, Al-Dhabyani, Walid, Ali, Marwa Mohamed, Jarraya, Imen, El-Shangiti, Ahmed Oumar, Alraeesi, Aisha, Al-Ghrawi, Mohammed Anwar, Al-Batati, Abdulrahman S., Mohamed, Elgizouli, Elgindi, Noha Taha, Saeed, Muhammed, Atou, Houdaifa, Yahia, Issam Ait, Bouayad, Abdelhak, Machrouh, Mohammed, Makouar, Amal, Alkawi, Dania, Mohamed, Mukhtar, Abdelfadil, Safaa Taher, Ounnoughene, Amine Ziad, Anfel, Rouabhia, Assi, Rwaa, Sorkatti, Ahmed, Tourad, Mohamedou Cheikh, Koubaa, Anis, Berrada, Ismail, Jarrar, Mustafa, Shehata, Shady, Abdul-Mageed, Muhammad
As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce our dataset, a year-long community-driven project covering all 22 Arab countries. The dataset includes instructions (input, response pairs) in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics. Built by a team of 44 researchers across the Arab world, all of whom are authors of this paper, our dataset offers a broad, inclusive perspective. We use our dataset to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations. For instance, while closed-source LLMs generally exhibit strong performance, they are not without flaws, and smaller open-source models face greater challenges. Moreover, certain countries (e.g., Egypt, the UAE) appear better represented than others (e.g., Iraq, Mauritania, Yemen). Our annotation guidelines, code, and data for reproducibility are publicly available.
Dynamically Local-Enhancement Planner for Large-Scale Autonomous Driving
Deng, Nanshan, Zhou, Weitao, Zhang, Bo, Wen, Junze, Jiang, Kun, Cao, Zhong, Yang, Diange
IEEE ROBOTICS AND AUTOMA TION LETTERS 1 Dynamically Local-Enhancement Planner for Large-Scale Autonomous Driving Nanshan Deng, Weitao Zhou, Bo Zhang, Junze Wen, Kun Jiang, Zhong Cao, Diange Y ang Abstract --Current autonomous vehicles operate primarily within limited regions, but there is increasing demand for broader applications. However, as models scale, their limited capacity becomes a significant challenge for adapting to novel scenarios. It is increasingly difficult to improve models for new situations using a single monolithic model. T o address this issue, we introduce the concept of dynamically enhancing a basic driving planner with local driving data, without permanently modifying the planner itself. This approach, termed the Dynamically Local-Enhancement (DLE) Planner, aims to improve the scalability of autonomous driving systems without significantly expanding the planner's size. Our approach introduces a position-varying Markov Decision Process formulation coupled with a graph neural network that extracts region-specific driving features from local observation data. The learned features describe the local behavior of the surrounding objects, which is then leveraged to enhance a basic reinforcement learning-based policy. We evaluated our approach in multiple scenarios and compared it with a one-for-all driving model. The results show that our method outperforms the baseline policy in both safety (collision rate) and average reward, while maintaining a lighter scale.
What are Foundation Models Cooking in the Post-Soviet World?
Lavrouk, Anton, Naous, Tarek, Ritter, Alan, Xu, Wei
The culture of the Post-Soviet states is complex, shaped by a turbulent history that continues to influence current events. In this study, we investigate the Post-Soviet cultural food knowledge of foundation models by constructing BORSch, a multimodal dataset encompassing 1147 and 823 dishes in the Russian and Ukrainian languages, centered around the Post-Soviet region. We demonstrate that leading models struggle to correctly identify the origins of dishes from Post-Soviet nations in both text-only and multimodal Question Answering (QA), instead over-predicting countries linked to the language the question is asked in. Through analysis of pretraining data, we show that these results can be explained by misleading dish-origin co-occurrences, along with linguistic phenomena such as Russian-Ukrainian code mixing. Finally, to move beyond QA-based assessments, we test models' abilities to produce accurate visual descriptions of dishes. The weak correlation between this task and QA suggests that QA alone may be insufficient as an evaluation of cultural understanding. To foster further research, we will make BORSch publicly available at https://github.com/alavrouk/BORSch.
A Real-time Spatio-Temporal Trajectory Planner for Autonomous Vehicles with Semantic Graph Optimization
He, Shan, Ma, Yalong, Song, Tao, Jiang, Yongzhi, Wu, Xinkai
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses. Abstract --Planning a safe and feasible trajectory for autonomous vehicles in real -time by fully utilizing perceptual information in complex urban environments is challenging. In this paper, we propose a spatio -temporal trajectory planning method based on graph optimization. It efficiently extracts the multi -modal information of the perception module by constructing a semantic spatio -temporal map through separation processing of static and dynamic obstacles, and then quickly generates feasible trajectories via sparse graph optimization based on a semantic spatiotemporal hypergraph. Extensive experiments have proven that the proposed method can effectively handle complex urban public road scenarios and perform in real time. HE operation of autonomous vehicle s in a complex urban environment presents great challenges .
A Rapid Test for Accuracy and Bias of Face Recognition Technology
Knott, Manuel, Serna, Ignacio, Mann, Ethan, Perona, Pietro
Measuring the accuracy of face recognition (FR) systems is essential for improving performance and ensuring responsible use. Accuracy is typically estimated using large annotated datasets, which are costly and difficult to obtain. We propose a novel method for 1:1 face verification that benchmarks FR systems quickly and without manual annotation, starting from approximate labels (e.g., from web search results). Unlike previous methods for training set label cleaning, ours leverages the embedding representation of the models being evaluated, achieving high accuracy in smaller-sized test datasets. Our approach reliably estimates FR accuracy and ranking, significantly reducing the time and cost of manual labeling. We also introduce the first public benchmark of five FR cloud services, revealing demographic biases, particularly lower accuracy for Asian women. Our rapid test method can democratize FR testing, promoting scrutiny and responsible use of the technology.