AITopics | ancient language

ancient language

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A lost ancient language may be hiding in plain sight

Popular Science | Oct-7-2025, 18:40:06 GMT

Clues are left behind in the ruins of the Mesoamerican megacity Teotihuacan. At the height of its power, the ancient Mesoamerican city of Teotihuacan near present-day Mexico City was home to over 125,000 inhabitants.

ancient language, andrew paul, teotihuacan, (15 more...)

Popular Science

Country:

North America > Mexico > Mexico City > Mexico City (0.26)
Europe > Denmark > Capital Region > Copenhagen (0.06)
Africa > Middle East > Egypt (0.05)

Genre: Research Report > New Finding (0.51)

Industry:

Retail > Online (0.35)
Transportation (0.31)

Technology: Information Technology > Artificial Intelligence (0.51)

ParsiPy: NLP Toolkit for Historical Persian Texts in Python

Farsi, Farhan, Fazel, Parnian, Haghighi, Sepand, Sabouri, Sadra, Goshtasb, Farzaneh, Hajipour, Nadia, Asgari, Ehsaneddin, Sameti, Hossein

arXiv.org Artificial Intelligence | Mar-22-2025

The study of historical languages presents unique challenges due to their complex orthographic systems, fragmentary textual evidence, and the absence of standardized digital representations of text in those languages. Tackling these challenges needs special NLP digital tools to handle phonetic transcriptions and analyze ancient texts. This work introduces ParsiPy, an NLP toolkit designed to facilitate the analysis of historical Persian languages by offering modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding. We demonstrate the utility of our toolkit through the processing of Parsig (Middle Persian) texts, highlighting its potential for expanding computational methods in the study of historical languages. Through this work, we contribute to computational philology, offering tools that can be adapted for the broader study of ancient texts and their digital preservation.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.1781

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > California (0.14)
Europe > Bulgaria > Varna Province > Varna (0.05)
(8 more...)

Genre: Research Report (0.52)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.91)

HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja

Song, Seyoung, Yoo, Haneul, Jin, Jiho, Cho, Kyunghyun, Oh, Alice

arXiv.org Artificial Intelligence | Jan-21-2025

While Korean historical documents are invaluable cultural heritage, understanding those documents requires in-depth Hanja expertise. Hanja is an ancient language used in Korea before the 20th century, whose characters were borrowed from old Chinese but had evolved in Korea for centuries. Modern Koreans and Chinese cannot understand Korean historical documents without substantial additional help, and while previous efforts have produced some Korean and English translations, this requires in-depth expertise, and so most of the documents are not translated into any modern language. To address this gap, we present HERITAGE, the first open-source Hanja NLP toolkit to assist in understanding and translating the unexplored Korean historical documents written in Hanja. HERITAGE is a web-based platform providing model predictions of three critical tasks in historical document understanding via Hanja language models: punctuation restoration, named entity recognition, and machine translation (MT). HERITAGE also provides an interactive glossary, which provides the character-level reading of the Hanja characters in modern Korean, as well as character-level English definition. HERITAGE serves two purposes. First, anyone interested in these documents can get a general understanding from the model predictions and the interactive glossary, especially MT outputs in Korean and English. Second, since the model outputs are not perfect, Hanja experts can revise them to produce better annotations and translations. This would boost the translation efficiency and potentially lead to most of the historical documents being translated into modern languages, lowering the barrier on unexplored Korean historical documents.

artificial intelligence, computational linguistic, natural language, (12 more...)

arXiv.org Artificial Intelligence

2501.11951

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(7 more...)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP

Chen, Danlu, Shi, Freda, Agarwal, Aditi, Myerston, Jacobo, Berg-Kirkpatrick, Taylor

arXiv.org Artificial Intelligence | Aug-8-2024

Standard natural language processing (NLP) pipelines operate on symbolic representations of language, which typically consist of sequences of discrete tokens. However, creating an analogous representation for ancient logographic writing systems is an extremely labor-intensive process that requires expert knowledge. At present, a large portion of logographic data persists in a purely visual form due to the absence of transcription. This poses a bottleneck for researchers seeking to apply NLP toolkits to study ancient logographic languages: most of the relevant data are images of writing. This paper investigates whether direct processing of visual representations of language offers a potential solution. We introduce LogogramNLP, the first benchmark enabling NLP analysis of ancient logographic languages, featuring both transcribed and visual datasets for four writing systems along with annotations for tasks like classification, translation, and parsing. Our experiments compare systems that employ recent visual and text encoding strategies as backbones. The results demonstrate that visual representations outperform textual representations for some investigated tasks, suggesting that visual processing pipelines may unlock a large amount of cultural heritage data of logographic languages for NLP-based analyses.

len, representation, translation, (15 more...)

arXiv.org Artificial Intelligence

2408.04628

Country:

Africa > Middle East > Egypt (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(8 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

eFontes. Part of Speech Tagging and Lemmatization of Medieval Latin Texts. A Cross-Genre Survey

Nowak, Krzysztof, Ziębura, Jędrzej, Wróbel, Krzysztof, Smywiński-Pohl, Aleksander

arXiv.org Artificial Intelligence | Jun-29-2024

This study introduces the eFontes models for automatic linguistic annotation of Medieval Latin texts, focusing on lemmatization, part-of-speech tagging, and morphological feature determination. Using the Transformers library, these models were trained on Universal Dependencies (UD) corpora and the newly developed eFontes corpus of Polish Medieval Latin. The research evaluates the models' performance, addressing challenges such as orthographic variations and the integration of Latinized vernacular terms. The models achieved high accuracy rates: lemmatization at 92.60%, part-of-speech tagging at 83.29%, and morphological feature determination at 88.57%. The findings underscore the importance of high-quality annotated corpora and propose future enhancements, including extending the models to Named Entity Recognition.

corpus, proceedings, scenario, (14 more...)

arXiv.org Artificial Intelligence

2407.00418

Country:

Europe > Poland > Lesser Poland Province > Kraków (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Europe > Western Europe (0.04)
(7 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.81)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.54)

Deciphering Oracle Bone Language with Diffusion Models

Guan, Haisu, Yang, Huanxin, Wang, Xinyu, Han, Shengwei, Liu, Yongge, Jin, Lianwen, Bai, Xiang, Liu, Yuliang

arXiv.org Artificial Intelligence | Jun-2-2024

Originating from China's Shang Dynasty approximately 3,000 years ago, the Oracle Bone Script (OBS) is a cornerstone in the annals of linguistic history, predating many established writing systems. Despite the discovery of thousands of inscriptions, a vast expanse of OBS remains undeciphered, casting a veil of mystery over this ancient language. The emergence of modern AI technologies presents a novel frontier for OBS decipherment, challenging traditional NLP methods that rely heavily on large textual corpora, a luxury not afforded by historical languages. This paper introduces a novel approach by adopting image generation techniques, specifically through the development of Oracle Bone Script Decipher (OBSD). Utilizing a conditional diffusion-based strategy, OBSD generates vital clues for decipherment, charting a new course for AI-assisted analysis of ancient languages. To validate its efficacy, extensive experiments were conducted on an oracle bone script dataset, with quantitative results demonstrating the effectiveness of OBSD. Code and decipherment results will be made available at https://github.com/guanhaisu/OBSD.

chinese character, decipherment, diffusion model, (17 more...)

arXiv.org Artificial Intelligence

2406.00684

Country: Asia > China (0.25)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Sampling the Swadesh List to Identify Similar Languages with Tree Spaces

Ordway, Garett, Patrangenaru, Vic

arXiv.org Artificial Intelligence | May-10-2024

Communication plays a vital role in human interaction. Studying language is a worthwhile task and more recently has become quantitative in nature with developments of fields like quantitative comparative linguistics and lexicostatistics. With respect to the authors' own native languages, the ancestry of the English language and the Latin alphabet are of primary interest. The Indo-European Tree traces many modern languages back to the Proto-Indo-European root. Swadesh's cognates played a large role in developing that historical perspective, where some of the primary branches are Germanic, Celtic, Italic, and Balto-Slavic. This paper will use data analysis on open books where the simplest singular space is the 3-spider - a union T3 of three rays with their endpoints glued at a point 0 - which can represent these tree spaces for language clustering. These trees are built using a single linkage method for clustering based on distances between samples from languages which use the Latin script. Taking three languages at a time, the barycenter is determined. Some initial results have found both non-sticky and sticky sample means. If the mean exhibits non-sticky properties, then one language may come from a different ancestor than the other two. If the mean is considered sticky, then the languages may share a common ancestor or all languages may have different ancestry.
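
The sticky versus non-sticky distinction in the abstract above can be illustrated with a small sketch of the sample barycenter (Fréchet mean) on a spider. This is not the authors' code: the function name `spider_mean`, the `(leg, distance)` encoding of a point, and the "boundary" label are all assumptions made for illustration. It relies on the standard fact that, for squared geodesic distance, the mean lies on leg k at distance (S_k - S_other) / n when that quantity is positive, and at the centre 0 otherwise, where S_k is the sum of distances of the sample points on leg k.

```python
from collections import defaultdict


def spider_mean(points, legs=3):
    """Sample Frechet mean on a spider with `legs` rays glued at 0.

    `points` is a list of (leg, distance) pairs (hypothetical encoding):
    leg is an int in range(legs), distance is >= 0 along that ray.
    Returns (leg, distance, kind); leg is None when the mean is the centre.
    """
    if not points:
        raise ValueError("need at least one point")
    n = len(points)

    # Sum of distances per leg.
    leg_sums = defaultdict(float)
    for leg, dist in points:
        if not 0 <= leg < legs or dist < 0:
            raise ValueError(f"invalid point: {(leg, dist)}")
        leg_sums[leg] += dist
    total = sum(leg_sums.values())

    # "Folded" average for leg k: points on leg k count positively,
    # points on every other leg count negatively.
    folded = [(2 * leg_sums[k] - total) / n for k in range(legs)]
    best = max(range(legs), key=lambda k: folded[k])

    if folded[best] > 0:
        # The mean sits strictly inside one leg and moves with the data.
        return best, folded[best], "non-sticky"
    if folded[best] == 0:
        # Borderline: the mean is at the centre but a small change
        # in the data could move it onto leg `best`.
        return None, 0.0, "boundary"
    # All folded averages are negative: the mean stays at the centre
    # under small perturbations of the sample.
    return None, 0.0, "sticky"


if __name__ == "__main__":
    print(spider_mean([(0, 4.0), (1, 1.0), (2, 1.0)]))
    print(spider_mean([(0, 1.0), (1, 1.0), (2, 1.0)]))
```

In this toy setting, a sample dominated by one leg pulls the mean onto that leg, while a balanced sample leaves the mean stuck at the centre. How the paper maps language samples onto the legs is not specified in the abstract and is not modeled here.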

open book, phylogenetic tree, swadesh, (15 more...)

arXiv.org Artificial Intelligence

2405.06549

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
Europe > Middle East (0.04)
Asia > Middle East (0.04)
(2 more...)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Health & Medicine > Therapeutic Area (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.68)

The hype around artificial intelligence

#artificialintelligence | Jul-27-2022, 02:08:43 GMT

For example, computers obtained the ability to play and win games against humans, such as world champions in chess. Now, AI can be divided into two main categories: functionality-based and capability-based. Functionality-based AI ranges from reactive machines, with limited responsiveness, to self-aware ones, where theoretically computers could understand human emotions. Capability-based AI ranges from artificial narrow intelligence, where narrowly defined performance tasks can be carried out, to artificial super intelligence, where computers can perform tasks better than humans. A key trait of AI is its ability to store and process large amounts of data.

ancient language, artificial intelligence, intelligence, (7 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.57)

Technology: Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.54)

Heaven's Vault: A Linguist's Buried Treasure

WIRED | Jan-27-2021, 13:00:00 GMT

I climb the stairs, my faithful robot Six warning me not to proceed. Do I heed their warning and take a step back? I can see a tall pillar-like statue up ahead, peering at me over a flight of stairs--the prospect of deciphering another fragment of glyphs is motivating me to proceed through the thinning air. As a linguist and writer, Heaven's Vault is the game that I've been waiting a very long time for. It brings together the craft of compelling narrative games and a BAFTA-nominated interactive story presented in a rich, visual novel interface, taking players on a journey of imagination and exploration within an entrancing game environment.

heaven, treasure, vault, (11 more...)

WIRED

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Industry: Leisure & Entertainment > Games > Computer Games (0.51)

Technology: Information Technology > Artificial Intelligence > Robots (0.37)

MIT's New Algorithm That Can Decipher Ancient Languages

#artificialintelligence | Nov-23-2020, 04:15:19 GMT

In a few of my previous articles, I have discussed how technology has helped us to better understand the history behind many lost cultures and civilizations. However, today I am excited to present you with a new stage within this technological trend that will help us unveil so many mysteries from ancient history. The Massachusetts Institute of Technology (MIT) has just created an algorithm that can decipher ancient languages without the input of any sort of data. This sort of new technological trend is known as machine learning and it can be simply defined as an algorithm that can, in simple terms, teach itself. Engineers from MIT who have worked on this algorithm for some time state that they have perfected it in such a way that it can read ancient languages without any information about the culture of the language or other relations the language may have to other similar ancient languages.

algorithm, ancient language, civilization, (9 more...)

#artificialintelligence

Country:

North America > United States > Massachusetts (0.26)
Europe > Spain (0.06)

Technology: Information Technology > Artificial Intelligence (0.39)
