AITopics | lexical gap

Collaborating Authors

lexical gap

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Crowdsourcing Lexical Diversity

Khalilia, Hadi, Otterbacher, Jahna, Bella, Gabor, Noortyani, Rusma, Darma, Shandy, Giunchiglia, Fausto

arXiv.org Artificial IntelligenceOct-30-2024

Lexical-semantic resources (LSRs), such as online lexicons or wordnets, are fundamental for natural language processing applications. In many languages, however, such resources suffer from quality issues: incorrect entries, incompleteness, but also, the rarely addressed issue of bias towards the English language and Anglo-Saxon culture. Such bias manifests itself in the absence of concepts specific to the language or culture at hand, the presence of foreign (Anglo-Saxon) concepts, as well as in the lack of an explicit indication of untranslatability, also known as cross-lingual \emph{lexical gaps}, when a term has no equivalent in another language. This paper proposes a novel crowdsourcing methodology for reducing bias in LSRs. Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food. Our LingoGap crowdsourcing tool facilitates comparisons through microtasks identifying equivalent terms, language-specific terms, and lexical gaps across languages. We validated our method by applying it to two case studies focused on food-related terminology: (1) English and Arabic, and (2) Standard Indonesian and Banjarese. These experiments identified 2,140 lexical gaps in the first case study and 951 in the second. The success of these experiments confirmed the usability of our method and tool for future large-scale lexicon enrichment tasks.

asian low-resour, experiment, lexical gap, (14 more...)

arXiv.org Artificial Intelligence

2410.23133

Country:

Europe > United Kingdom > UK North Sea (0.05)
Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.05)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
(30 more...)

Genre: Research Report > New Finding (0.66)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

Add feedback

Locally Measuring Cross-lingual Lexical Alignment: A Domain and Word Level Perspective

Karidi, Taelin, Grossman, Eitan, Abend, Omri

arXiv.org Artificial IntelligenceOct-7-2024

NLP research on aligning lexical representation spaces to one another has so far focused on aligning language spaces in their entirety. However, cognitive science has long focused on a local perspective, investigating whether translation equivalents truly share the same meaning or the extent that cultural and regional influences result in meaning variations. With recent technological advances and the increasing amounts of available data, the longstanding question of cross-lingual lexical alignment can now be approached in a more data-driven manner. However, developing metrics for the task requires some methodology for comparing metric efficacy. We address this gap and present a methodology for analyzing both synthetic validations and a novel naturalistic validation using lexical gaps in the kinship domain. We further propose new metrics, hitherto unexplored on this task, based on contextualized embeddings. Our analysis spans 16 diverse languages, demonstrating that there is substantial room for improvement with the use of newer language models. Our research paves the way for more accurate and nuanced cross-lingual lexical alignment methodologies and evaluation.

alignment, correlation, metric, (15 more...)

arXiv.org Artificial Intelligence

2410.07239

Country:

Europe > Germany > Saxony > Leipzig (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Advancing the Arabic WordNet: Elevating Content Quality

Freihat, Abed Alhakim, Khalilia, Hadi, Bella, Gábor, Giunchiglia, Fausto

arXiv.org Artificial IntelligenceMar-29-2024

High-quality WordNets are crucial for achieving high-quality results in NLP applications that rely on such resources. However, the wordnets of most languages suffer from serious issues of correctness and completeness with respect to the words and word meanings they define, such as incorrect lemmas, missing glosses and example sentences, or an inadequate, Western-centric representation of the morphology and the semantics of the language. Previous efforts have largely focused on increasing lexical coverage while ignoring other qualitative aspects. In this paper, we focus on the Arabic language and introduce a major revision of the Arabic WordNet that addresses multiple dimensions of lexico-semantic resource quality. As a result, we updated more than 58% of the synsets of the existing Arabic WordNet by adding missing information and correcting errors. In order to address issues of language diversity and untranslatability, we also extended the wordnet structure by new elements: phrasets and lexical gaps.

arabic, lemma, synset, (15 more...)

arXiv.org Artificial Intelligence

2403.20215

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Asia > Middle East > Palestine (0.04)
North America > United States > Pennsylvania (0.04)
(10 more...)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

Lexical Diversity in Kinship Across Languages and Dialects

Khalilia, Hadi, Bella, Gábor, Freihat, Abed Alhakim, Darma, Shandy, Giunchiglia, Fausto

arXiv.org Artificial IntelligenceOct-26-2023

Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources, such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study deals with seven Arabic dialects, while the other one with three Indonesian languages. Our results, made available as browseable and downloadable computational resources, extend prior linguistics research on kinship terminology, and provide insight into the extent of diversity even within linguistically and culturally close communities.

dialect, indonesian language, lexical diversity, (15 more...)

arXiv.org Artificial Intelligence

2308.13056

Country:

Europe > United Kingdom > UK North Sea (0.09)
Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.09)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
(16 more...)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

Representing Interlingual Meaning in Lexical Databases

Giunchiglia, Fausto, Bella, Gabor, Nair, Nandu Chandran, Chi, Yang, Xu, Hao

arXiv.org Artificial IntelligenceJan-22-2023

In today's multilingual lexical databases, the majority of the world's languages are under-represented. Beyond a mere issue of resource incompleteness, we show that existing lexical databases have structural limitations that result in a reduced expressivity on culturally-specific words and in mapping them across languages. In particular, the lexical meaning space of dominant languages, such as English, is represented more accurately while linguistically or culturally diverse languages are mapped in an approximate manner. Our paper assesses state-of-the-art multilingual lexical databases and evaluates their strengths and limitations with respect to their expressivity on lexical phenomena of linguistic diversity.

artificial intelligence, mapping, natural language, (19 more...)

arXiv.org Artificial Intelligence

2301.09169

Country:

Europe > United Kingdom > UK North Sea (0.07)
Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.07)
Asia > India (0.05)
(9 more...)

Genre:

Overview (0.68)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Word Embedding based Correlation Model for Question/Answer Matching

Shen, Yikang, Rong, Wenge, Jiang, Nan, Peng, Baolin, Tang, Jie, Xiong, Zhang

arXiv.org Artificial IntelligenceNov-25-2016

With the development of community based question answering (Q&A) services, a large scale of Q&A archives have been accumulated and are an important information and knowledge resource on the web. Question and answer matching has been attached much importance to for its ability to reuse knowledge stored in these systems: it can be useful in enhancing user experience with recurrent questions. In this paper, we try to improve the matching accuracy by overcoming the lexical gap between question and answer pairs. A Word Embedding based Correlation (WEC) model is proposed by integrating advantages of both the translation model and word embedding, given a random pair of words, WEC can score their co-occurrence probability in Q&A pairs and it can also leverage the continuity and smoothness of continuous space word representation to deal with new pairs of words that are rare in the training parallel text. An experimental study on Yahoo! Answers dataset and Baidu Zhidao dataset shows this new method's promising potential.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

1511.04646

Country: Asia > China (0.15)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Convolutional Neural Tensor Network Architecture for Community-Based Question Answering

Qiu, Xipeng (Fudan University) | Huang, Xuanjing (Fudan University)

AAAI ConferencesJul-15-2015

Retrieving similar questions is very important in community-based question answering. A major challenge is the lexical gap in sentence matching. In this paper, we propose a convolutional neural tensor network architecture to encode the sentences in semantic space and model their interactions with a tensor layer. Our model integrates sentence modeling and semantic matching into a single model, which can not only capture the useful information with convolutional and pooling layers, but also learn the matching metrics between the question and its answer. Besides, our model is a general architecture, with no need for the other knowledge such as lexical or syntactic analysis. The experimental results shows that our method outperforms the other methods on two matching tasks.

interaction, question-answer pair, representation, (13 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback