AITopics | headword

Vision-Enabled LLMs in Historical Lexicography: Digitising and Enriching Estonian-German Dictionaries from the 17th and 18th Centuries

Jürviste, Madis, Jakobson, Joonatan

arXiv.org Artificial Intelligence, Oct-10-2025

This article presents research conducted at the Institute of the Estonian Language between 2022 and 2025 on the application of large language models (LLMs) to the study of 17th- and 18th-century Estonian dictionaries. The authors address three main areas: enriching historical dictionaries with modern word forms and meanings; using vision-enabled LLMs to perform text recognition on sources printed in Gothic script (Fraktur); and preparing for the creation of a unified, cross-source dataset. Initial experiments with J. Gutslaff's 1648 dictionary indicate that LLMs have significant potential for semi-automatic enrichment of dictionary information. When provided with sufficient context, Claude 3.7 Sonnet accurately provided meanings and modern equivalents for 81% of headword entries. In a text recognition experiment with A. T. Helle's 1732 dictionary, a zero-shot method successfully identified and structured 41% of headword entries into error-free JSON-formatted output. For digitising the Estonian-German dictionary section of A. W. Hupel's 1780 grammar, overlapping tiling of scanned image files is employed, with one LLM being used for text recognition and a second for merging the structured output. These findings demonstrate that, even for minor languages, LLMs have significant potential to save time and financial resources.
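To make the "overlapping tiling" step concrete, here is a minimal sketch of how tile coordinates for a scanned page might be computed. It is an illustration only, not the authors' pipeline: the function names, the tile size, and the overlap width are all assumptions, and the abstract does not say how the tiles are actually cut.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles that cover [0, length) with at least `overlap` shared pixels."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]  # a single tile already covers the whole axis
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 1024, overlap: int = 128) -> List[Box]:
    """Row-major list of crop boxes for a page of the given size (hypothetical defaults)."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]
```

Each box could then be cropped from the scan and sent to the recognition model separately. The overlap means an entry that is cut by one tile boundary appears whole in a neighbouring tile, which is what makes a later merging step possible.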

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.07931

Country: Europe > Estonia (0.52)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Investigating Transcription Normalization in the Faetar ASR Benchmark

Peckham, Leo, Ong, Michael, Nagy, Naomi, Dunbar, Ewan

arXiv.org Artificial Intelligence, Aug-21-2025

We provide a small but important update on the Faetar Speech Recognition Benchmark [1]. The benchmark, initially released as a challenge task (with test data embargoed), is intended to teach us more about the domain of "dirty" low-resource ASR. We identified two major hurdles. First, due to an unfortunate error, one of the baselines for the constrained ASR task, which interested most challenge participants, had an incorrect phone error rate that was much lower than it should have been: the reported result in fact came from a different, unconstrained model. We felt the impact of this as potential participants hesitated to submit when they were unable to beat this incorrect number. This has since been corrected in the documentation.
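For readers unfamiliar with the metric, phone error rate is conventionally the edit distance between the reference and hypothesis phone sequences divided by the reference length. The sketch below follows that standard definition; it is not the benchmark's own scoring script, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference phone
                curr[j - 1] + 1,         # insertion of a hypothesis phone
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(substitutions + deletions + insertions) / number of reference phones."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.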

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.11771

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.50)

Matching and Linking Entries in Historical Swedish Encyclopedias

Börjesson, Simon, Ersmark, Erik, Nugues, Pierre

arXiv.org Artificial Intelligence, Jul-3-2025

The Nordisk familjebok is a Swedish encyclopedia from the 19th and 20th centuries. It was written by a team of experts and aimed to be an intellectual reference, stressing precision and accuracy. This encyclopedia had four main editions, remarkable for their size, ranging from 20 to 38 volumes. As a consequence, the Nordisk familjebok had a considerable influence in universities, schools, the media, and society overall. As new editions were released, the selection of entries and their content evolved, reflecting intellectual changes in Sweden. In this paper, we used digitized versions from Project Runeberg. We first resegmented the raw text into entries and matched pairs of entries between the first and second editions using semantic sentence embeddings. We then extracted the geographical entries from both editions using a transformer-based classifier and linked them to Wikidata. This enabled us to identify geographic trends and possible shifts between the first and second editions, written between 1876-1899 and 1904-1926, respectively. Interpreting the results, we observe a small but significant shift in geographic focus away from Europe and towards North America, Africa, Asia, Australia, and northern Scandinavia from the first to the second edition, confirming the influence of the First World War and the rise of new powers. The code and data are available on GitHub at https://github.com/sibbo/nordisk-familjebok.
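Matching entries across editions with sentence embeddings can be pictured as a nearest-neighbour search under cosine similarity. The following is a rough sketch under stated assumptions: the vectors are toy stand-ins for real sentence embeddings, the 0.8 threshold is arbitrary, and `match_entries` is a hypothetical helper rather than code from the authors' repository.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_entries(
    first: Dict[str, Sequence[float]],
    second: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> Dict[str, Optional[str]]:
    """For each first-edition entry, the most similar second-edition entry, or None."""
    matches: Dict[str, Optional[str]] = {}
    for head, vec in first.items():
        best_head, best_score = None, threshold
        for cand, cand_vec in second.items():
            score = cosine(vec, cand_vec)
            if score >= best_score:
                best_head, best_score = cand, score
        matches[head] = best_head
    return matches


# Toy three-dimensional "embeddings" for illustration only.
first_edition = {"Stockholm": [1.0, 0.0, 0.1], "Telefon": [0.0, 1.0, 0.0]}
second_edition = {"Stockholm": [0.9, 0.1, 0.1], "Radio": [0.1, 0.2, 1.0]}
pairs = match_entries(first_edition, second_edition)
```

An entry mapped to `None` has no sufficiently similar counterpart, which is one plausible way to surface entries that were dropped or newly added between editions.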

large language model, machine learning, nordisk familjebok, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.latechclfl-1.1

2507.0117

Country:

Europe > Norway (0.34)
Oceania > Australia (0.25)
Africa (0.24)
(13 more...)

Genre: Research Report (0.64)

Industry: Government > Military (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD

Pražák, Ondřej, Konopík, Miloslav

arXiv.org Artificial Intelligence, Aug-29-2024

Coreference resolution is the task of identifying language expressions that refer to the same real-world entity (antecedent) within a text. These coreferential expressions can sometimes appear within a single sentence, but often, they are spread across multiple sentences. In some challenging cases, it is necessary to consider the entire document to determine whether two expressions refer to the same entity. The task can be divided into two main subtasks: identifying entity mentions and grouping these mentions based on the real-world entities they refer to. Coreference resolution is closely related to anaphora resolution, as discussed in [2]. Historically, coreference resolution was a standard preprocessing step in various natural language processing (NLP) tasks, such as machine translation, summarization, and information extraction. Although recent large language models have achieved state-of-the-art results in coreference resolution, they are expensive to train and deploy, and traditional (discriminative) approaches remain competitive. Expressing this task in natural language is challenging, and to the best of our knowledge, there have been no successful attempts to utilize large chatbots (like ChatGPT-4) to achieve superior results. Coreference resolution becomes particularly challenging in low-resource languages. One strategy to address this challenge is to train a multilingual model on datasets from multiple languages, thereby transferring knowledge from resource-rich languages to those with fewer resources.
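The second subtask, grouping mentions by entity, can be illustrated with a small union-find sketch. It assumes that some model has already produced pairwise "mention, antecedent" links; the function name and the example mentions are hypothetical, and this is not the system described in the paper.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_mentions(
    mentions: Sequence[str], links: Sequence[Tuple[str, str]]
) -> List[List[str]]:
    """Group mentions into entity clusters given pairwise coreference links."""
    parent: Dict[str, str] = {m: m for m in mentions}

    def find(m: str) -> str:
        # Follow parent pointers to the cluster representative, halving paths on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())


mentions = ["Marie Curie", "she", "the physicist", "Paris", "the city"]
links = [("she", "Marie Curie"), ("the physicist", "she"), ("the city", "Paris")]
entities = cluster_mentions(mentions, links)
```

Links are treated as transitive here, so "the physicist" ends up in the same cluster as "Marie Curie" even though no direct link between them was predicted.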

coreference resolution, dataset, resolution, (14 more...)

arXiv.org Artificial Intelligence

2408.16893

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > Maryland > Howard County > Columbia (0.04)
(10 more...)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)

Linking Named Entities in Diderot's Encyclopédie to Wikidata

Nugues, Pierre

arXiv.org Artificial Intelligence, Jun-5-2024

Diderot's Encyclopédie is a reference work from 18th-century Europe that aimed at collecting the knowledge of its era. Wikipedia has the same ambition with a much greater scope. However, the lack of digital connection between the two encyclopedias may hinder their comparison and the study of how knowledge has evolved. A key element of Wikipedia is Wikidata, which backs the articles with a graph of structured data. In this paper, we describe the annotation of more than 10,300 of the Encyclopédie entries with Wikidata identifiers, enabling us to connect these entries to the graph. We considered geographic and human entities. The Encyclopédie does not contain biographic entries as they mostly appear as subentries of locations. We extracted all the geographic entries and we completely annotated all the entries containing a description of human entities. This represents more than 2,600 links referring to locations or human entities. In addition, we annotated more than 9,500 entries having geographic content only. We describe the annotation process as well as application examples. This resource is available at https://github.com/pnugues/encyclopedie_1751

encyclopedia, human entity, wikipedia, (17 more...)

arXiv.org Artificial Intelligence

2406.03221

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.07)
Europe > Italy (0.05)
Europe > Greece (0.05)
(8 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (0.92)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.83)

Low-Cost Generation and Evaluation of Dictionary Example Sentences

Cai, Bill, Ng, Clarence Boon Liang, Tan, Daniel, Hotama, Shelvia

arXiv.org Artificial Intelligence, Apr-9-2024

Dictionary example sentences play an important role in illustrating word definitions and usage, but manually creating quality sentences is challenging. Prior works have demonstrated that language models can be trained to generate example sentences. However, they relied on costly customized models and word sense datasets for generation and evaluation of their work. Rapid advancements in foundational models present the opportunity to create low-cost, zero-shot methods for the generation and evaluation of dictionary example sentences. We introduce a new automatic evaluation metric called OxfordEval that measures the win-rate of generated sentences against existing Oxford Dictionary sentences. OxfordEval shows high alignment with human judgments, enabling large-scale automated quality evaluation. We experiment with various LLMs and configurations to generate dictionary sentences across word classes. We complement this with a novel approach of using masked language models to identify and select sentences that best exemplify word meaning. The eventual model, FM-MLM, achieves over 85.1% win rate against Oxford baseline sentences according to OxfordEval, compared to 39.8% win rate for prior model-generated sentences.
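A win rate of this kind is simply the share of pairwise comparisons in which the generated sentence is preferred over the baseline. The sketch below shows only that arithmetic; how OxfordEval obtains each judgment, and how it treats ties, is not stated in the abstract, so the boolean input is an assumption and `win_rate` is a hypothetical helper.

```python
from typing import Iterable


def win_rate(judgments: Iterable[bool]) -> float:
    """Fraction of comparisons won; each item is True if the generated sentence was preferred."""
    results = list(judgments)
    if not results:
        raise ValueError("need at least one comparison")
    return sum(results) / len(results)


# Illustrative judgments only, not data from the paper.
example_rate = win_rate([True, True, False, True])
```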

dataset, example sentence, word sense, (14 more...)

arXiv.org Artificial Intelligence

2404.06224

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Singapore (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Detection of Non-recorded Word Senses in English and Swedish

Lautenschlager, Jonathan, Sköldberg, Emma, Hengchen, Simon, Schlechtweg, Dominik

arXiv.org Artificial Intelligence, Mar-4-2024

This study addresses the task of Unknown Sense Detection in English and Swedish. The primary objective of this task is to determine whether the meaning of a particular word usage is documented in a dictionary or not. For this purpose, sense entries are compared with word usages from modern and historical corpora using a pre-trained Word-in-Context embedder that allows us to model this task in a few-shot scenario. Additionally, we use human annotations to adapt and evaluate our models. Compared to a random sample from a corpus, our model is able to considerably increase the detected number of word usages with non-recorded senses.

headword, usage, word usage, (15 more...)

arXiv.org Artificial Intelligence

2403.02285

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Saxony > Leipzig (0.05)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.05)
(13 more...)

Genre: Research Report (0.40)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

"Definition Modeling: To model definitions." Generating Definitions With Little to No Semantics

Segonne, Vincent, Mickus, Timothee

arXiv.org Artificial Intelligence, Jun-14-2023

Definition Modeling, the task of generating definitions, was first proposed as a means to evaluate the semantic quality of word embeddings: a coherent lexical semantic representation of a word in context should contain all the information necessary to generate its definition. The relative novelty of this task entails that we do not know which factors are actually relied upon by a Definition Modeling system. In this paper, we present evidence that the task may not involve as much semantics as one might expect: we show how an earlier model from the literature is both rather insensitive to semantic aspects such as explicit polysemy and reliant on formal similarities between headwords and words occurring in its glosses, casting doubt on the validity of the task as a means to evaluate embeddings.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.08433

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
North America > United States > New York > New York County > New York City (0.04)
(23 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

Evaluation of Automatically Constructed Word Meaning Explanations

Stará, Marie, Rychlý, Pavel, Horák, Aleš

arXiv.org Artificial Intelligence, Feb-27-2023

Preparing exact and comprehensive word meaning explanations is one of the key steps in the process of monolingual dictionary writing. In standard methodology, the explanations need an expert lexicographer who spends a substantial amount of time checking the consistency between the descriptive text and corpus evidence. In the following text, we present a new tool that derives explanations automatically based on collective information from very large corpora, particularly on word sketches. We also propose a quantitative evaluation of the constructed explanations, concentrating on explanations of nouns. The methodology is to a certain extent language independent; however, the presented verification is limited to Czech and English. We show that the presented approach makes it possible to create explanations that contain data useful for understanding the word meaning in approximately 90% of cases. However, in many cases, the result requires post-editing to remove redundant information.

artificial intelligence, explanation, natural language, (18 more...)

arXiv.org Artificial Intelligence

2302.13625

Country:

Europe > Czechia > South Moravian Region > Brno (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
(4 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments

Zhang, Yu, Xia, Qingrong, Zhou, Shilin, Jiang, Yong, Fu, Guohong, Zhang, Min

arXiv.org Artificial Intelligence, Sep-17-2022

Semantic role labeling (SRL) is a fundamental yet challenging task in the NLP community. Recent work on SRL mainly falls into two lines: 1) BIO-based; 2) span-based. Despite their ubiquity, both share the intrinsic drawback of not considering internal argument structures, potentially hindering the model's expressiveness. The key challenge is that arguments are flat structures, and there are no determined subtree realizations for words inside arguments. To remedy this, in this paper, we propose to regard flat argument spans as latent subtrees, accordingly reducing SRL to a tree parsing task. In particular, we equip our formulation with a novel span-constrained TreeCRF to make tree structures span-aware and further extend it to the second-order case. We conduct extensive experiments on CoNLL05 and CoNLL12 benchmarks. Results reveal that our methods perform better than all previous syntax-agnostic works, achieving new state-of-the-art results under both end-to-end and with-gold-predicates settings.
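The BIO-based and span-based views are two encodings of the same flat argument spans, which a short conversion makes visible. This sketch is for illustration only and is unrelated to the TreeCRF model proposed in the paper; the function name, the lenient handling of ill-formed tags, and the example labels are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, role label)


def bio_to_spans(tags: Sequence[str]) -> List[Span]:
    """Convert a BIO tag sequence into flat labeled spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, label))
            start, label = None, None
        # A stray "I-" with no open span is treated leniently as a span start.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


tags = ["B-ARG0", "I-ARG0", "O", "B-V", "B-ARG1", "I-ARG1", "I-ARG1"]
spans = bio_to_spans(tags)
```

Each resulting span records only its boundaries and role, with nothing about how the words inside it relate to one another, which is the "flat structure" the abstract refers to.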

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2110.06865

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
(32 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
