AITopics | Ahmadi, Sina

Collaborating Authors

Ahmadi, Sina

SwiLTra-Bench: The Swiss Legal Translation Benchmark

Niklaus, Joel, Merane, Jakob, Nenadic, Luka, Ahmadi, Sina, Gao, Yingqiang, Chevalley, Cyrill A. H., Humbel, Claude, Gösken, Christophe, Tanzi, Lorenzo, Lüthi, Thomas, Palombo, Stefan, Poff, Spencer, Yang, Boling, Wu, Nan, Guillod, Matthew, Mamié, Robin, Brunner, Daniel, Pereyra, Julio, Grupen, Niko

arXiv.org Artificial Intelligence, Mar-3-2025

In Switzerland, legal translation is uniquely important due to the country's four official languages and requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators, creating bottlenecks and impacting effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel specifically in laws but under-perform in headnotes. Through rigorous testing and human expert validation, we demonstrate that while fine-tuning open SLMs significantly improves their translation quality, they still lag behind the best zero-shot prompted frontier models such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM evaluation system that aligns best with human expert assessments.

large language model, machine learning, natural language, (18 more...)

2503.01372

Country:

Europe > Switzerland > Vaud (0.14)
North America > United States > Pennsylvania (0.14)
North America > United States > Michigan (0.14)
(3 more...)

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Language and Speech Technology for Central Kurdish Varieties

Ahmadi, Sina, Jaff, Daban Q., Alam, Md Mahfuz Ibn, Anastasopoulos, Antonios

arXiv.org Artificial Intelligence, Mar-4-2024

Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties. Previous studies addressing language and speech technology for Kurdish handle it in a monolithic way as a macro-language, resulting in disparities for dialects and varieties for which there are few resources and tools available. In this paper, we take a step towards developing resources for language and speech technology for varieties of Central Kurdish, creating a corpus by transcribing movies and TV series as an alternative to fieldwork. Additionally, we report the performance of machine translation, automatic speech recognition, and language identification as downstream tasks evaluated on Central Kurdish varieties. Data and models are publicly available under an open license at https://github.com/sinaahmadi/CORDI.

artificial intelligence, machine learning, natural language, (19 more...)

2403.01983

Country:

Europe (1.00)
Asia > Middle East > Iran > Kurdistan Province (0.31)
Asia > Middle East > Iraq > Kurdistan Region (0.31)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.48)
Information Technology > Security & Privacy (0.46)
Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages

Alam, Md Mahfuz Ibn, Ahmadi, Sina, Anastasopoulos, Antonios

arXiv.org Artificial Intelligence, Feb-2-2024

The availability of parallel texts is crucial to the performance of machine translation models. However, most of the world's languages face the predominant challenge of data scarcity. In this paper, we propose strategies to synthesize parallel data relying on morpho-syntactic information and using bilingual lexicons along with a small amount of seed parallel data. Our methodology adheres to a realistic scenario backed by the small parallel seed data. It is linguistically informed, as it aims to create augmented data that is more likely to be grammatically correct. We analyze how our synthetic data can be combined with raw parallel data and demonstrate a consistent improvement in performance in our experiments on 14 languages (28 English <-> X pairs) ranging from well- to very low-resource ones. Our method leads to improvements even when using only five seed sentences and a bilingual lexicon.

artificial intelligence, machine translation, natural language, (16 more...)

2402.01939

Country:

Europe (1.00)
North America > United States > Maryland (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
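
The abstract above describes synthesizing parallel data from a bilingual lexicon and a few seed sentences. The following is only a minimal sketch of the general idea of lexicon-based substitution, not the authors' method: the toy lexicon, the part-of-speech tags, the function name `augment`, and the use of English-Spanish word pairs are all assumptions made for illustration. Restricting swaps to words with the same tag stands in, very roughly, for the morpho-syntactic constraints the abstract mentions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: source word -> (target word, part-of-speech tag).
# A real bilingual lexicon would be far larger and carry richer morphology.
LEXICON: Dict[str, Tuple[str, str]] = {
    "cat": ("gato", "NOUN"),
    "dog": ("perro", "NOUN"),
    "eats": ("come", "VERB"),
    "sees": ("ve", "VERB"),
}


def augment(src: str, tgt: str,
            lexicon: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Create new sentence pairs by swapping one lexicon word on both sides.

    A source word is replaced only if its lexicon translation appears in the
    target sentence, and only by another entry with the same tag.
    """
    src_tokens = src.split()
    tgt_tokens = tgt.split()
    pairs: List[Tuple[str, str]] = []
    for i, word in enumerate(src_tokens):
        if word not in lexicon:
            continue
        tgt_word, pos = lexicon[word]
        if tgt_word not in tgt_tokens:
            continue
        j = tgt_tokens.index(tgt_word)  # position of the aligned target word
        for new_src, (new_tgt, new_pos) in lexicon.items():
            if new_src == word or new_pos != pos:
                continue  # skip the original word and mismatched tags
            new_src_tokens = src_tokens[:i] + [new_src] + src_tokens[i + 1:]
            new_tgt_tokens = tgt_tokens[:j] + [new_tgt] + tgt_tokens[j + 1:]
            pairs.append((" ".join(new_src_tokens), " ".join(new_tgt_tokens)))
    return pairs


# One seed sentence pair yields two synthetic pairs with this toy lexicon.
augmented = augment("the cat eats", "el gato come", LEXICON)
for source_sentence, target_sentence in augmented:
    print(source_sentence, "|", target_sentence)
```

With a single seed pair and four lexicon entries, the sketch produces one noun swap and one verb swap. It ignores agreement, inflection, and word alignment beyond exact string matching, which a linguistically informed approach would need to handle.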

CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

Alam, Md Mahfuz Ibn, Ahmadi, Sina, Anastasopoulos, Antonios

arXiv.org Artificial Intelligence, May-26-2023

Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to encompass dialectal variations as well, but the work allowing the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release CODET, a contrastive dialectal benchmark encompassing 882 different variations from nine different languages. We also quantitatively demonstrate the challenges large MT models face in effectively translating dialectal variants. We are releasing all code and data.

artificial intelligence, machine translation, natural language, (18 more...)

2305.17267

Country:

Europe > Switzerland (1.00)
Europe > Italy (1.00)
Africa > Middle East (0.67)
(2 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities

Ahmadi, Sina, Anastasopoulos, Antonios

arXiv.org Artificial Intelligence, May-25-2023

The wide accessibility of social media has provided linguistically under-represented communities with an extraordinary opportunity to create content in their native languages. This, however, comes with certain challenges in script normalization, particularly where the speakers of a language in a bilingual community rely on another script or orthography to write their native language. This paper addresses the problem of script normalization for several such languages that are mainly written in a Perso-Arabic script. Using synthetic data with various levels of noise and a transformer-based model, we demonstrate that the problem can be effectively remediated. We conduct a small-scale evaluation of real data as well. Our experiments indicate that script normalization is also beneficial to improve the performance of downstream tasks such as machine translation and language identification.

machine learning, natural language, normalization, (19 more...)

2305.16407

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East (0.93)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

Transfer Learning for Low-Resource Sentiment Analysis

Hameed, Razhan, Ahmadi, Sina, Daneshfar, Fatemeh

arXiv.org Artificial Intelligence, Apr-10-2023

Sentiment analysis is the process of identifying and extracting subjective information from text. Despite the advances to employ cross-lingual approaches in an automatic way, the implementation and evaluation of sentiment analysis systems require language-specific data to consider various sociocultural and linguistic peculiarities. In this paper, the collection and annotation of a dataset are described for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ an approach in transfer learning to leverage pretrained models for data augmentation. We demonstrate that data augmentation achieves a high F1 score and accuracy despite the difficulty of the task.

machine learning, natural language, sentiment analysis, (15 more...)

2304.04703

Country:

Asia > Middle East (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(3 more...)

PALI: A Language Identification Benchmark for Perso-Arabic Scripts

Ahmadi, Sina, Agarwal, Milind, Anastasopoulos, Antonios

arXiv.org Artificial Intelligence, Apr-3-2023

The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various linguistic communities around the globe. Identifying various languages using such scripts is crucial to language technologies and challenging in low-resource setups. As such, this paper sheds light on the challenges of detecting languages using Perso-Arabic scripts, especially in bilingual communities where "unconventional" writing is practiced. To address this, we use a set of supervised techniques to classify sentences into their languages. Building on these, we also propose a hierarchical model that targets clusters of languages that are more often confused by the classifiers. Our experiment results indicate the effectiveness of our solutions.

artificial intelligence, machine learning, natural language, (15 more...)

2304.01322

Country:

Asia (1.00)
Europe > Spain (0.28)
North America > United States > New Mexico (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki

Ahmadi, Sina, Azin, Zahra, Belelli, Sara, Anastasopoulos, Antonios

arXiv.org Artificial Intelligence, Apr-3-2023

One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data. This is also the case of the Southern varieties of the Kurdish and Laki languages for which very limited resources are available with insubstantial progress in tools. To tackle this, we provide a few approaches that rely on the content of local news websites, a local radio station that broadcasts content in Southern Kurdish and fieldwork for Laki. In this paper, we describe some of the challenges of such under-represented languages, particularly in writing and standardization, and also, in retrieving sources of data and retro-digitizing handwritten content to create a corpus for Southern Kurdish and Laki. In addition, we study the task of language identification in light of the other variants of Kurdish and Zaza-Gorani languages.

artificial intelligence, kurdish, natural language, (18 more...)

2304.01319

Country:

Europe (1.00)
North America (0.93)
Asia > Middle East > Iran (0.30)
Asia > Middle East > Iraq (0.30)

Genre: Research Report (0.50)

Industry:

Media > Radio (0.88)
Leisure & Entertainment (0.88)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Attention-based Encoder-Decoder Networks for Spelling and Grammatical Error Correction

Ahmadi, Sina

arXiv.org Artificial Intelligence, Sep-21-2018

Automatic spelling and grammatical correction systems are among the most widely used tools within natural language applications. In this thesis, we treat the task of error correction as a type of monolingual machine translation where the source sentence is potentially erroneous and the target sentence should be the corrected form of the input. Our main focus in this project is building neural network models for the task of error correction. In particular, we investigate sequence-to-sequence and attention-based models, which have recently outperformed the previous state of the art on many language processing problems. We demonstrate that neural machine translation models can be successfully applied to the task of error correction. While the experiments of this research are performed on an Arabic corpus, the methods in this thesis can be easily applied to any language.

deep learning, neural network, sequence, (24 more...)

1810.0066

Country:

North America > United States (0.45)
Asia (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)