AITopics | Bella, Gábor

Bella, Gábor

Layers of technology in pluriversal design. Decolonising language technology with the LiveLanguage initiative

Koch, Gertraud, Bella, Gábor, Helm, Paula, Giunchiglia, Fausto

arXiv.org Artificial Intelligence, May-2-2024

Language technology has the potential to facilitate intercultural communication through meaningful translations. However, the current state of language technology is deeply entangled with colonial knowledge due to path dependencies and neo-colonial tendencies in the global governance of artificial intelligence (AI). Language technology is a complex and emerging field that presents challenges for co-design interventions because of its knowledge intensity and its enfolding in assemblages of global scale and diverse sites. This paper uses LiveLanguage, a lexical database and a set of services with particular emphasis on modelling language diversity and integrating small and minority languages, as an example to discuss and close the gap from pluriversal design theory to practice. By diversifying the concept of emerging technology, we can better approach language technology in global contexts. The paper presents a model comprising five layers of technological activity. Each layer consists of specific practices and stakeholders, and thus provides distinctive spaces for co-design interventions as a mode of inquiry for de-linking, re-thinking and re-building language technology towards pluriversality. In that way, the paper contributes to reflecting on the position of co-design in decolonising emergent technologies, and to integrating complex theoretical knowledge on decoloniality into language technology design.

artificial intelligence, language technology, natural language, (17 more...)

arXiv:2405.01783

Country:

North America > United States > New York (0.15)
Europe > Austria > Vienna (0.14)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.34)

Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge

Batsuren, Khuyagbaatar, Vylomova, Ekaterina, Dankers, Verna, Delgerbaatar, Tsetsuukhei, Uzan, Omri, Pinter, Yuval, Bella, Gábor

arXiv.org Artificial Intelligence, Apr-20-2024

The popular subword tokenizers of current language models, such as Byte-Pair Encoding (BPE), are known not to respect morpheme boundaries, which affects the downstream performance of the models. While many improved tokenization algorithms have been proposed, their evaluation and cross-comparison are still an open problem. As a solution, we propose a combined intrinsic-extrinsic evaluation framework for subword tokenization. Intrinsic evaluation is based on our new UniMorph Labeller tool that classifies subword tokenization as either morphological or alien. Extrinsic evaluation, in turn, is performed via the Out-of-Vocabulary Generalization Challenge 1.0 benchmark, which consists of three newly specified downstream text classification tasks. Our empirical findings show that the accuracy of UniMorph Labeller is 98%, and that, in all language models studied (including ALBERT, BERT, RoBERTa, and DeBERTa), alien tokenization leads to poorer generalization than morphological tokenization for semantic compositionality of word meanings.

machine learning, natural language, tokenization, (16 more...)

arXiv:2404.13292

Country:

North America > United States (0.46)
Europe (0.46)
Asia (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Advancing the Arabic WordNet: Elevating Content Quality

Freihat, Abed Alhakim, Khalilia, Hadi, Bella, Gábor, Giunchiglia, Fausto

arXiv.org Artificial Intelligence, Mar-29-2024

High-quality wordnets are crucial for achieving high-quality results in NLP applications that rely on such resources. However, the wordnets of most languages suffer from serious issues of correctness and completeness with respect to the words and word meanings they define, such as incorrect lemmas, missing glosses and example sentences, or an inadequate, Western-centric representation of the morphology and the semantics of the language. Previous efforts have largely focused on increasing lexical coverage while ignoring other qualitative aspects. In this paper, we focus on the Arabic language and introduce a major revision of the Arabic WordNet that addresses multiple dimensions of lexico-semantic resource quality. As a result, we updated more than 58% of the synsets of the existing Arabic WordNet by adding missing information and correcting errors. In order to address issues of language diversity and untranslatability, we also extended the wordnet structure with new elements: phrasets and lexical gaps.

artificial intelligence, natural language, text processing, (18 more...)

arXiv:2403.20215

Country:

Europe > Middle East > Malta (0.14)
Africa > Middle East > Morocco (0.14)

Genre: Research Report (0.50)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Lexical Diversity in Kinship Across Languages and Dialects

Khalilia, Hadi, Bella, Gábor, Freihat, Abed Alhakim, Darma, Shandy, Giunchiglia, Fausto

arXiv.org Artificial Intelligence, Oct-26-2023

Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources, such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study deals with seven Arabic dialects, while the other deals with three Indonesian languages. Our results, made available as browseable and downloadable computational resources, extend prior linguistics research on kinship terminology, and provide insight into the extent of diversity even within linguistically and culturally close communities.

artificial intelligence, natural language, text processing, (19 more...)

arXiv:2308.13056

Country:

Asia > Indonesia (0.47)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Diversity and Language Technology: How Techno-Linguistic Bias Can Cause Epistemic Injustice

Helm, Paula, Bella, Gábor, Koch, Gertraud, Giunchiglia, Fausto

arXiv.org Artificial Intelligence, Jul-25-2023

It is well known that AI-based language technology (large language models, machine translation systems, multilingual dictionaries, and corpora) is currently limited to the 2 to 3 percent of the world's languages that are most widely spoken and/or best supported financially and politically. In response, recent research efforts have sought to extend the reach of AI technology to "underserved languages." In this paper, we show that many of these attempts produce flawed solutions that adhere to a hard-wired representational preference for certain languages, which we call techno-linguistic bias. Techno-linguistic bias is distinct from the well-established phenomenon of linguistic bias as it does not concern the languages represented but rather the design of the technologies. As we show throughout the paper, techno-linguistic bias can result in systems that can only express concepts that are part of the language and culture of dominant powers and that are unable to correctly represent concepts from other communities. We argue that at the root of this problem lies a systematic tendency of technology developer communities to apply a simplistic understanding of diversity which does not do justice to the more profound differences that languages, and ultimately the communities that speak them, embody. Drawing on the concept of epistemic injustice, we point to the broader sociopolitical consequences of the bias we identify and show how it can lead not only to a disregard for valuable aspects of diversity but also to an under-representation of the needs and diverse worldviews of marginalized language communities.

artificial intelligence, machine translation, natural language, (16 more...)

arXiv:2307.13714

Country:

Europe (1.00)
North America > United States > Massachusetts (0.28)

Genre: Research Report (0.82)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Towards Bridging the Digital Language Divide

Bella, Gábor, Helm, Paula, Koch, Gertraud, Giunchiglia, Fausto

arXiv.org Artificial Intelligence, Jul-25-2023

It is a well-known fact that current AI-based language technology (language models, machine translation systems, multilingual dictionaries and corpora) focuses on the 2-3% of the world's languages that are most widely spoken. Recent research efforts have attempted to expand the coverage of AI technology to "under-resourced languages." The goal of our paper is to bring attention to a phenomenon that we call linguistic bias: multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden, representational preference towards certain languages. Linguistic bias is manifested in uneven per-language performance even in the case of similar test conditions. We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented, and that can even become ethically problematic as they disregard valuable aspects of diversity as well as the needs of the language communities themselves. As our attempt at building diversity-aware language resources, we present a new initiative that aims at reducing linguistic bias through both technological design and methodology, based on an eye-level collaboration with local communities.

artificial intelligence, natural language, text processing, (18 more...)

arXiv:2307.13405

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

The Taboo Challenge Competition

Rovatsos, Michael (University of Edinburgh) | Gromann, Dagmar (Artificial Intelligence Research Institute) | Bella, Gábor (University of Edinburgh)

AI Magazine, Mar-27-2018

Games have always been a popular domain of AI research, and they have been used for many recent competitions. However, reaching human-level performance often either focuses on comprehensive world knowledge or solving decision-making problems with unmanageable solution spaces. Building on the popular Taboo board game, the Taboo Challenge Competition addresses a different problem: that of bridging the gap between the domain knowledge of heterogeneous agents trying to jointly identify a concept without making reference to its most salient features. The competition, which was run for the first time at IJCAI 2017, aims to provide a simple testbed for diversity-aware AI where the focus is on integrating independently engineered AI components, while offering a scenario that is challenging yet simple enough to not require mastering general commonsense knowledge or natural language understanding. We describe the design and preparation of the competition and discuss the results and lessons learned.

artificial intelligence, commonsense reasoning, competition, (20 more...)

Country: Europe > Netherlands (0.29)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
