 lexicographer


Is the Dictionary Done For?

The New Yorker

The print edition of Merriam-Webster was once a touchstone of authority and stability. Then the internet brought about a revolution. Wars over words are inevitably culture wars, and debates over the dictionary have raged for as long as it has existed. Once, every middle-class home had a piano and a dictionary. The purpose of the piano was to be able to listen to music before phonographs were available and affordable. Later on, it was to torture young persons by insisting that they learn to do something few people do well. The purpose of the dictionary was to settle intra-family disputes over the spelling of words like "camaraderie" and "sesquipedalian," or over the correct pronunciation of "puttee." This was the state of the world not that long ago. In the late nineteen-eighties, Merriam-Webster's Collegiate Dictionary was on the best-seller list for a hundred and fifty-five consecutive weeks. Fifty-seven million copies were sold, a number believed to be second only, in this country, to sales of the Bible. There was good money in the word business.


Qabas: An Open-Source Arabic Lexicographic Database

Jarrar, Mustafa, Hammouda, Tymaa

arXiv.org Artificial Intelligence

We present Qabas, a novel open-source Arabic lexicon designed for NLP applications. The novelty of Qabas lies in its synthesis of 110 lexicons. Specifically, Qabas lexical entries (lemmas) are assembled by linking lemmas from 110 lexicons. Furthermore, Qabas lemmas are also linked to 12 morphologically annotated corpora (about 2M tokens), making it the first Arabic lexicon to be linked to lexicons and corpora. Qabas was developed semi-automatically, utilizing a mapping framework and a web-based tool. Compared with other lexicons, Qabas stands as the most extensive Arabic lexicon, encompassing about 58K lemmas (45K nominal lemmas, 12.5K verbal lemmas, and 473 functional-word lemmas). Qabas is open-source and accessible online at https://sina.birzeit.edu/qabas.


The IgboAPI Dataset: Empowering Igbo Language Technologies through Multi-dialectal Enrichment

Emezue, Chris Chinenye, Okoh, Ifeoma, Mbonu, Chinedu, Chukwuneke, Chiamaka, Lal, Daisy, Ezeani, Ignatius, Rayson, Paul, Onwuzulike, Ijemma, Okeke, Chukwuma, Nweya, Gerald, Ogbonna, Bright, Oraegbunam, Chukwuebuka, Awo-Ndubuisi, Esther Chidinma, Osuagwu, Akudo Amarachukwu, Nmezi, Obioha

arXiv.org Artificial Intelligence

The Igbo language is facing a risk of becoming endangered, as indicated by a 2025 UNESCO study. This highlights the need to develop language technologies for Igbo to foster communication, learning and preservation. To create robust, impactful, and widely adopted language technologies for Igbo, it is essential to incorporate the multi-dialectal nature of the language. The primary obstacle in achieving dialectal-aware language technologies is the lack of comprehensive dialectal datasets. In response, we present the IgboAPI dataset, a multi-dialectal Igbo-English dictionary dataset, developed with the aim of enhancing the representation of Igbo dialects. Furthermore, we illustrate the practicality of the IgboAPI dataset through two distinct studies: one focusing on Igbo semantic lexicon and the other on machine translation. In the semantic lexicon project, we successfully establish an initial Igbo semantic lexicon for the Igbo semantic tagger, while in the machine translation study, we demonstrate that by finetuning existing machine translation systems using the IgboAPI dataset, we significantly improve their ability to handle dialectal variations in sentences.


Why Scrabble's New Official Word List Is So Embarrassing

Slate

Since Scrabble adopted an official lexicon in 1978, one thing has been constant: People have never stopped arguing about what is or isn't a word. Players have defended the game by noting that its letter strings--from AA (a kind of Hawaiian lava) to ZZZ (an interjection for sleep)--could be found in a bunch of standard North American dictionaries, books that have been used through the years to compile and revise Scrabble's tournament word list. But after an update last month introduced dozens of suspect words, riling up the community of competitive players, that's becoming harder to do. The linguistic tumult began in September, when the organization that maintains the word list used in club and tournament Scrabble, NASPA Games, published a draft of its update. The NASPA list includes all of the words in the Official Scrabble Players Dictionary, the go-to source for living-room and app players in North America, plus a lot more.


'AI' named most notable word of 2023 by Collins dictionary

The Guardian

The technology that is set to dominate the future – for good or ill – is now the word of the year. "AI" has been named the most notable word of 2023 by the dictionary publisher Collins. Defined as "the modelling of human mental functions by computer programs", AI was chosen because it "has accelerated at such a fast pace and become the dominant conversation of 2023", the publisher said. The use of the word (strictly an initialism) has quadrupled over the past year. It was chosen from a list of new terms that the publisher said reflect "our ever-evolving language and the concerns of those who use it".


The Trouble With AI: Human Intelligence

#artificialintelligence

The trouble with AI is that no one knows what "AI" actually means. The trouble with AI is that it lacks a clear definition, that it suffers from the unique nature of its creators' intelligence and the fuzzy language they use. "Intelligent machines" will not have our imagination, our creativity, our shared experiences and ... traditions. A definition, according to leading lexicographers (an occupation Samuel Johnson defined as "a writer of dictionaries; a harmless drudge that busies himself in tracing the original, and detailing the signification of words"), tells us the meaning of a word, providing us "with precise statement of the essential nature of a thing." But the Oxford English Dictionary (OED) also notes an "obsolete and rare" meaning, namely "The setting of bounds or limits; limitation, restriction."


Compliance Dictionary aims for a simpler life

#artificialintelligence

Globalization, an ever-growing corpus of regulations and increasing business complexity all conspire to make it challenging to understand, implement and prove regulatory compliance. With the Compliance Dictionary, Unified Compliance Framework (UCF) is aiming to change that. For instance, 'Personally Identifiable Information' (PII) was defined legally in a 2007 memorandum from the Executive Office of the President, Office of Management and Budget (OMB) and later adopted in the National Institute of Standards and Technology (NIST) Guide to Protecting the Confidentiality of Personally Identifiable Information (SP 800-122). But other regulatory and standards bodies frequently refer to PII as 'identifying information,' 'personal information' or 'private information.' In the European Union, EU directive 95/46/EC refers to it as 'personal data.'