conlang
IASC: Interactive Agentic System for ConLangs
Taguchi, Chihiro, Sproat, Richard
We present a system that uses LLMs as a tool in the development of Constructed Languages. The system is modular in that one first creates a target phonology for the language using an agentic approach that refines its output at each step with commentary feedback on its previous attempt. Next, a set of sentences is 'translated' from their English original into a morphosyntactic markup that reflects the word order and morphosyntactic feature specifications of the desired target language, with affixes represented as morphosyntactic feature bundles. From this translated corpus, a lexicon is constructed using the phonological model and the set of morphemes (stems and affixes) extracted from the 'translated' sentences. The system is then instructed to provide an orthography for the language, using an existing script such as Latin or Cyrillic. Finally, the system writes a brief grammatical handbook of the language. The system can also translate further sentences into the target language. Our goal is twofold. First, we hope that these tools will be fun to use for creating artificially constructed languages. Second, we are interested in exploring what LLMs 'know' about language: not what they know about any particular language or linguistic phenomenon, but how much they know about and understand language and linguistic concepts. As we shall see, there is a fairly wide gulf in capabilities both among different LLMs and among different linguistic specifications, with it being notably easier for systems to deal with more common patterns than rarer ones. An additional avenue that we explore is the application of our approach to translating from high-resource into low-resource languages. While the results so far are mostly negative, we provide some evidence that an improved version of the present system could afford some real gains in such tasks. https://github.com/SakanaAI/IASC
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France (0.04)
- (15 more...)
- Research Report > New Finding (1.00)
- Instructional Material (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
On Non-interactive Evaluation of Animal Communication Translators
Paradise, Orr, Gruber, David F., Kalai, Adam Tauman
If you had an AI Whale-to-English translator, how could you validate whether or not it is working? Does one need to interact with the animals or rely on grounded observations such as temperature? We provide theoretical and proof-of-concept experimental evidence suggesting that interaction and even observations may not be necessary for sufficiently complex languages. One may be able to evaluate translators solely by their English outputs, offering potential advantages in terms of safety, ethics, and cost. This is an instance of machine translation quality evaluation (MTQE) without any reference translations available. A key challenge is identifying "hallucinations," false translations which may appear fluent and plausible. We propose using segment-by-segment translation together with the classic NLP shuffle test to evaluate translators. The idea is to translate animal communication, turn by turn, and evaluate how often the resulting translations make more sense in order than permuted. Proof-of-concept experiments on data-scarce human languages and constructed languages demonstrate the potential utility of this evaluation methodology. These human-language experiments serve solely to validate our reference-free metric under data scarcity. It is found to correlate highly with a standard evaluation based on reference translations, which are available in our experiments. We also perform a theoretical analysis suggesting that interaction may be neither necessary nor efficient in the early stages of learning to translate.
- Europe > Austria > Vienna (0.14)
- Asia > India > Bihar > Patna (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- (18 more...)
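The shuffle test mentioned in the abstract above (On Non-interactive Evaluation of Animal Communication Translators) can be illustrated with a minimal sketch. It assumes the translator's output is already available as a list of English segments, one per turn. The names `shuffle_test` and `adjacent_overlap` are hypothetical, and the word-overlap scorer is only a toy stand-in for whatever coherence judge the authors actually use, which the abstract does not specify.

```python
import random
from typing import Callable, Sequence


def adjacent_overlap(segments: Sequence[str]) -> float:
    """Toy coherence score (an assumption, not the paper's scorer):
    the number of words shared by neighbouring segments."""
    total = 0
    for left, right in zip(segments, segments[1:]):
        total += len(set(left.lower().split()) & set(right.lower().split()))
    return float(total)


def shuffle_test(
    segments: Sequence[str],
    coherence: Callable[[Sequence[str]], float] = adjacent_overlap,
    n_permutations: int = 200,
    seed: int = 0,
) -> float:
    """Estimate how often the original order scores higher than a random
    permutation. Ties earn half credit; 0.5 is the chance level."""
    original = list(segments)
    rng = random.Random(seed)
    base = coherence(original)
    credit = 0.0
    trials = 0
    for _ in range(n_permutations):
        permuted = original[:]
        rng.shuffle(permuted)
        if permuted == original:
            continue  # the identity permutation carries no information
        trials += 1
        score = coherence(permuted)
        if base > score:
            credit += 1.0
        elif base == score:
            credit += 0.5
    # With fewer than two segments no informative permutation exists.
    return credit / trials if trials else 0.5


if __name__ == "__main__":
    turns = [
        "the whale dives",
        "dives deep water",
        "water is cold",
        "cold wind rises",
    ]
    print(shuffle_test(turns))
```

A value well above 0.5 suggests the translated turns hang together better in their original order than when permuted; a value near 0.5 is what one would expect from order-insensitive or hallucinated output. In practice the toy scorer would be replaced by a stronger judge of discourse coherence.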
Explicit Learning and the LLM in Machine Translation
Marmonier, Malik, Bawden, Rachel, Sagot, Benoît
This study explores the capacity of large language models (LLMs) for explicit learning, a process involving the assimilation of metalinguistic explanations to carry out language tasks. Using constructed languages generated by cryptographic means as controlled test environments, we designed experiments to assess an LLM's ability to explicitly learn and apply grammar rules. Our results demonstrate that while LLMs possess a measurable capacity for explicit learning, this ability diminishes as the complexity of the linguistic phenomena at hand increases. Supervised fine-tuning on chains of thought significantly enhances LLM performance but struggles to generalize to typologically novel or more complex linguistic features. These findings point to the need for more diverse training sets and alternative fine-tuning strategies to further improve explicit learning by LLMs.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- (12 more...)
A Network Analysis Approach to Conlang Research Literature
The field of conlang has grown considerably in recent decades, driven by wide interest in the use and study of conlangs for artistic purposes. However, one important question is what is happening with conlang in the academic world. This paper aims to provide an overall understanding of the literature on conlang research and, with it, a realistic picture of the field at present. We have implemented a computational linguistic approach, combining bibliometrics and network analysis to examine all publications available in the Scopus database. Analysing over 2300 academic publications from 1927 to 2022, we have found that Esperanto is by far the most documented conlang. Three main authors have contributed to this: Garvía R., Fiedler S., and Blanke D. The 1970s and 1980s were the decades in which the foundations of current research were built. In terms of methodologies, language learning and experimental linguistics contribute most to the preferred approaches of study in the field. We present the results and discuss our limitations and future work.
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Oceania > Australia (0.04)
- (4 more...)
Hollywood's Love Affair With Fictional Languages
For big fans of James Cameron's Avatar, the 13-year wait between the original and this year's sequel probably felt near interminable. But die-hard fans might have counted with a bit more agony and would say it has actually been vomrra zìsìt, or "15 years." The blue-skinned Na'vi people, who inhabit the planet Pandora in Cameron's universe, have four digits per hand. As a result, their language--painstakingly built from scratch for the movies--uses base-eight counting instead of the human base-10. Fifteen in Na'vi actually means eight plus five (as opposed to 10 plus five in English), making it the equivalent of our 13.
- Leisure & Entertainment (1.00)
- Media > Film (0.89)
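The base-eight arithmetic in the Avatar blurb above can be checked with a short sketch. It treats a Na'vi count as if it were written with the octal digits 0 to 7, which is a simplification for illustration (the article describes spoken number words, not digits), and the function names are hypothetical.

```python
def base8_to_decimal(numeral: str) -> int:
    """Read a string of octal digits positionally: each step multiplies
    the running value by eight and adds the next digit."""
    if not numeral:
        raise ValueError("empty numeral")
    value = 0
    for ch in numeral:
        if ch not in "01234567":
            raise ValueError(f"not a base-eight digit: {ch!r}")
        value = value * 8 + int(ch)
    return value


def decimal_to_base8(n: int) -> str:
    """Write a non-negative integer using base-eight digits."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 8)
        digits.append(str(remainder))
    return "".join(reversed(digits))


if __name__ == "__main__":
    # "15" in base eight is one eight plus five.
    print(base8_to_decimal("15"))  # prints 13
    print(decimal_to_base8(13))    # prints 15
```

The first call reproduces the article's point: one eight plus five is thirteen, so a 13-year wait is "15" when counted in base eight.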
18 Klingon Phrases That'll Save Your Life One Day
Long ago, as the crew of the Enterprise explored the final frontier, one man boldly did what few--if any--actors had ever done before: construct a language from scratch. But while James Doohan (Scotty) may have invented a form of Klingon on the set of Star Trek: The Motion Picture, the real credit for its enduring legacy goes to linguist Marc Okrand, who started developing Klingon for Trek films in 1984, bringing constructed languages ("conlangs") to generations of new enthusiasts, from Trekkers to Dune fans to Na'vi admirers. People constructed languages before Klingon: J.R.R. Tolkien created Quenya in 1915, later used in The Hobbit and Lord of the Rings; Edgar Rice Burroughs invented Barsoomian in 1912 for A Princess of Mars; St. Hildegard of Bingen fashioned the Lingua Ignota in the 12th century, crediting some angels for divine inspiration. But as part of a TV show beloved by millions of viewers, Okrand's Klingon brought conlangs to the popular lexicon. Much of Klingon's appeal comes from its lexical novelty.
- Leisure & Entertainment (0.89)
- Media > Film (0.50)