Goto

Collaborating Authors

 Koper






Connecting the Persian-speaking World through Transliteration

arXiv.org Artificial Intelligence

Despite speaking mutually intelligible varieties of the same language, speakers of Tajik Persian, written in a modified Cyrillic alphabet, cannot read Iranian and Afghan texts written in the Perso-Arabic script. As the vast majority of Persian text on the Internet is written in Perso-Arabic, monolingual Tajik speakers are unable to interface with the Internet in any meaningful way. This paper presents a transformer-based G2P approach to Tajik-Farsi transliteration, achieving chrF++ scores of 58.70 (Farsi to Tajik) and 74.20 (Tajik to Farsi) on novel digraphic datasets, setting a comparable baseline metric for future work. Our results also demonstrate the non-trivial difficulty of this task in both directions. We also provide an overview of the differences between the two scripts and the challenges they present, so as to aid future efforts in Tajik-Farsi transliteration. Keywords: Persian, Tajik, Transliteration, Orthography, Computational Linguistics 1 Introduction Tajik Persian (henceforth, Tajik) is the formal variety of Modern Persian spoken in Tajikistan. As such, it retains an extremely high level of mutual intelligibility with formal Persian as spoken in Iran and Afghanistan (henceforth referred to as Farsi). Unlike these two countries which use the centuries-old Perso-Arabic script, Tajikistan uses the relatively new Tajik-Cyrillic script due to Tajikistan's Soviet heritage (Perry 2005). While proposals have been made to shift the script back to Perso-Arabic, any significant shift will likely not occur in the near future, with Tajikistan's former Minister of Culture stating in 2008 that "...some 90-95% of Tajikistan's population is not familiar with Arabic script..." 1 (Ghufronov 2008).


Text-to-Image Generation for Vocabulary Learning Using the Keyword Method

arXiv.org Artificial Intelligence

The 'keyword method' is an effective technique for learning vocabulary of a foreign language. It involves creating a memorable visual link between what a word means and what its pronunciation in a foreign language sounds like in the learner's native language. However, these memorable visual links remain implicit in the people's mind and are not easy to remember for a large set of words. To enhance the memorisation and recall of the vocabulary, we developed an application that combines the keyword method with text-to-image generators to externalise the memorable visual links into visuals. These visuals represent additional stimuli during the memorisation process. To explore the effectiveness of this approach we first run a pilot study to investigate how difficult it is to externalise the descriptions of mental visualisations of memorable links, by asking participants to write them down. We used these descriptions as prompts for text-to-image generator (DALL-E2) to convert them into images and asked participants to select their favourites. Next, we compared different text-to-image generators (DALL-E2, Midjourney, Stable and Latent Diffusion) to evaluate the perceived quality of the generated images by each. Despite heterogeneous results, participants mostly preferred images generated by DALL-E2, which was used also for the final study. In this study, we investigated whether providing such images enhances the retention of vocabulary being learned, compared to the keyword method only. Our results indicate that people did not encounter difficulties describing their visualisations of memorable links and that providing corresponding images significantly improves memory retention.


Is Open Source the Future of AI? A Data-Driven Approach

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have become central in academia and industry, raising concerns about privacy, transparency, and misuse. A key issue is the trustworthiness of proprietary models, with open-sourcing often proposed as a solution. However, open-sourcing presents challenges, including potential misuse, financial disincentives, and intellectual property concerns. Proprietary models, backed by private sector resources, are better positioned for return on investment. There are also other approaches that lie somewhere on the spectrum between completely open-source and proprietary. These can largely be categorised into open-source usage limitations protected by licensing, partially open-source (open weights) models, hybrid approaches where obsolete model versions are open-sourced, while competitive versions with market value remain proprietary. Currently, discussions on where on the spectrum future models should fall on remains unbacked and mostly opinionated where industry leaders are weighing in on the discussion. In this paper, we present a data-driven approach by compiling data on open-source development of LLMs, and their contributions in terms of improvements, modifications, and methods. Our goal is to avoid supporting either extreme but rather present data that will support future discussions both by industry experts as well as policy makers. Our findings indicate that open-source contributions can enhance model performance, with trends such as reduced model size and manageable accuracy loss. We also identify positive community engagement patterns and architectures that benefit most from open contributions.


Teaching Shortest Path Algorithms With a Robot and Overlaid Projections

arXiv.org Artificial Intelligence

Robots have the potential to enhance teaching of advanced computer science topics, making abstract concepts more tangible and interactive. In this paper, we present Timmy-a GoPiGo robot augmented with projections to demonstrate shortest path algorithms in an interactive learning environment. We integrated a JavaScript-based application that is projected around the robot, which allows users to construct graphs and visualise three different shortest path algorithms with colour-coded edges and vertices. Animated graph exploration and traversal are augmented by robot movements. To evaluate Timmy, we conducted two user studies. An initial study (= 10) to explore the feasibility of this type of teaching where participants were just observing both robot-synced and the on-screen-only visualisations. And a pilot study (= 6) where participants actively interacted with the system, constructed graphs and selected desired algorithms. In both studies we investigated the preferences towards the system and not the teaching outcome. Initial findings suggest that robots offer an engaging tool for teaching advanced algorithmic concepts, but highlight the need for further methodological refinements and larger-scale studies to fully evaluate their effectiveness.


Real-Time Weapon Detection Using YOLOv8 for Enhanced Safety

arXiv.org Artificial Intelligence

This research paper presents the development of an AI model utilizing YOLOv8 for real-time weapon detection, aimed at enhancing safety in public spaces such as schools, airports, and public transportation systems. As incidents of violence continue to rise globally, there is an urgent need for effective surveillance technologies that can quickly identify potential threats. Our approach focuses on leveraging advanced deep learning techniques to create a highly accurate and efficient system capable of detecting weapons in real-time video streams. The model was trained on a comprehensive dataset containing thousands of images depicting various types of firearms and edged weapons, ensuring a robust learning process. We evaluated the model's performance using key metrics such as precision, recall, F1-score, and mean Average Precision (mAP) across multiple Intersection over Union (IoU) thresholds, revealing a significant capability to differentiate between weapon and non-weapon classes with minimal error. Furthermore, we assessed the system's operational efficiency, demonstrating that it can process frames at high speeds suitable for real-time applications. The findings indicate that our YOLOv8-based weapon detection model not only contributes to the existing body of knowledge in computer vision but also addresses critical societal needs for improved safety measures in vulnerable environments. By harnessing the power of artificial intelligence, this research lays the groundwork for developing practical solutions that can be deployed in security settings, ultimately enhancing the protective capabilities of law enforcement and public safety agencies.


Unveiling the Mystery of Visual Attributes of Concrete and Abstract Concepts: Variability, Nearest Neighbors, and Challenging Categories

arXiv.org Artificial Intelligence

The visual representation of a concept varies significantly depending on its meaning and the context where it occurs; this poses multiple challenges both for vision and multimodal models. Our study focuses on concreteness, a well-researched lexical-semantic variable, using it as a case study to examine the variability in visual representations. We rely on images associated with approximately 1,000 abstract and concrete concepts extracted from two different datasets: Bing and YFCC. Our goals are: (i) evaluate whether visual diversity in the depiction of concepts can reliably distinguish between concrete and abstract concepts; (ii) analyze the variability of visual features across multiple images of the same concept through a nearest neighbor analysis; and (iii) identify challenging factors contributing to this variability by categorizing and annotating images. Our findings indicate that for classifying images of abstract versus concrete concepts, a combination of basic visual features such as color and texture is more effective than features extracted by more complex models like Vision Transformer (ViT). However, ViTs show better performances in the nearest neighbor analysis, emphasizing the need for a careful selection of visual features when analyzing conceptual variables through modalities other than text.