OpenGloss: A Synthetic Encyclopedic Dictionary and Semantic Knowledge Graph

Bommarito, Michael J. II

arXiv.org Artificial Intelligence

We present OpenGloss, a synthetic encyclopedic dictionary and semantic knowledge graph for English that integrates lexicographic definitions, encyclopedic context, etymological histories, and semantic relationships in a unified resource. OpenGloss contains 537K senses across 150K lexemes, on par with WordNet 3.1 and Open English WordNet, while providing more than four times as many sense definitions. Alongside these lexemes, the resource includes 9.1M semantic edges, 1M usage examples, 3M collocations, and 60M words of encyclopedic content. Generated through a multi-agent procedural generation pipeline with schema-validated LLM outputs and automated quality assurance, the entire resource was produced in under one week for under $1,000. This demonstrates that structured generation can create comprehensive lexical resources at cost and time scales impractical for manual curation, enabling rapid iteration as foundation models improve. The resource addresses gaps in pedagogical applications by providing integrated content -- definitions, examples, collocations, encyclopedic entries, etymology -- that supports both vocabulary learning and natural language processing tasks. As a synthetically generated resource, OpenGloss reflects both the capabilities and limitations of current foundation models. The dataset is publicly available on Hugging Face under CC-BY 4.0, enabling researchers and educators to build upon and adapt this resource.
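A resource of this shape (lexemes with senses, definitions, and typed semantic edges) can be queried as a simple labeled graph. The sketch below is a minimal illustration of that idea; the class, sense IDs, and relation names are assumptions for this example, not OpenGloss's published schema.

```python
# Minimal sketch of querying a sense-level semantic graph.
# The schema (sense_id -> definition, typed edges between senses)
# is an illustrative assumption, not the actual OpenGloss format.
from collections import defaultdict

class SemanticGraph:
    def __init__(self):
        self.senses = {}               # sense_id -> definition text
        self.edges = defaultdict(set)  # sense_id -> {(relation, target_sense_id)}

    def add_sense(self, sense_id, definition):
        self.senses[sense_id] = definition

    def add_edge(self, src, relation, dst):
        self.edges[src].add((relation, dst))

    def related(self, sense_id, relation):
        """Return senses linked to sense_id by the given relation."""
        return sorted(dst for (rel, dst) in self.edges[sense_id] if rel == relation)

g = SemanticGraph()
g.add_sense("dog.n.01", "a domesticated canid kept as a pet or work animal")
g.add_sense("canine.n.01", "any member of the family Canidae")
g.add_edge("dog.n.01", "hypernym", "canine.n.01")
print(g.related("dog.n.01", "hypernym"))  # ['canine.n.01']
```

With 9.1M edges, a production version would use an indexed store rather than in-memory dicts, but the access pattern is the same.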


Adverbs Revisited: Enhancing WordNet Coverage of Adverbs with a Supersense Taxonomy

Lee, Jooyoung, de Sá, Jader Martins Camboim

arXiv.org Artificial Intelligence

WordNet offers rich supersense hierarchies for nouns and verbs, yet adverbs remain underdeveloped, lacking a systematic semantic classification. We introduce a linguistically grounded supersense typology for adverbs, empirically validated through annotation, that captures major semantic domains including manner, temporal, frequency, degree, domain, speaker-oriented, and subject-oriented functions. Results from a pilot annotation study demonstrate that these categories provide broad coverage of adverbs in natural text and can be reliably assigned by human annotators. Incorporating this typology extends WordNet's coverage, aligns it more closely with linguistic theory, and facilitates downstream NLP applications such as word sense disambiguation, event extraction, sentiment analysis, and discourse modeling. We present the proposed supersense categories, annotation outcomes, and directions for future work.

As a primary lexical class, adverbs perform a range of semantic functions, from answering fundamental questions about an event, such as how it was performed (manner), when it occurred (temporal), or to what extent a property holds (degree), to expressing speaker attitude, discourse stance, and logical relations between propositions. Despite this semantic richness, adverbs have long occupied an ambiguous and often marginalized position in linguistic classification, frequently described as a "residual" or "wastebasket" category [9, 20]. Words are often assigned to this category not because they share definable grammatical properties, but because they fail to conform to the morphological and syntactic criteria of nouns, verbs, adjectives, prepositions, or conjunctions.
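The supersense categories named above lend themselves to a lexicon-based tagging pass. The toy tagger below illustrates the category inventory only; the word-to-category assignments are hand-picked examples, not the paper's validated annotations.

```python
# Toy illustration of the proposed adverb supersense inventory.
# The lexicon entries are illustrative guesses, not the paper's
# annotated data; unknown tokens fall back to "unknown".
ADVERB_SUPERSENSES = {
    "quickly": "manner",
    "yesterday": "temporal",
    "often": "frequency",
    "extremely": "degree",
    "legally": "domain",
    "frankly": "speaker-oriented",
    "reluctantly": "subject-oriented",
}

def tag_adverbs(tokens):
    """Attach a supersense label to each known adverb in a token list."""
    return [(t, ADVERB_SUPERSENSES.get(t.lower(), "unknown")) for t in tokens]

print(tag_adverbs(["Frankly", "she", "often", "answered", "quickly"]))
```

In practice such labels would come from the annotated resource and a contextual classifier rather than a static lexicon, since many adverbs are ambiguous across categories.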




Appendix for TopicNet: Semantic Graph-Guided Topic Discovery A Detailed discussion of our work A.1 Limitations

Neural Information Processing Systems

We set the embedding size to 100 and the hidden size to 256. Experiments were run on an Nvidia GTX 8000 GPU and implemented in PyTorch, as attached in the supplement. The other settings are the same as in the previous experiments. Because the bottom layer of the topic model is defined as the word layer, all children of a bottom-layer concept node are connected to its ancestor node. For example, the child node "dog" of the node "mammal" is connected to that parent node; the node "male" follows the same setting.
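The connection rule described above can be sketched as enumerating word-to-concept edges over a small taxonomy. The two-concept taxonomy below is invented for illustration; it is not the paper's concept graph.

```python
# Sketch of the connection rule: word-layer children are linked to
# their parent concept node. The toy taxonomy is illustrative only.
children = {"mammal": ["dog", "cat"], "person": ["male", "female"]}

def word_edges(taxonomy):
    """Yield (word, parent_concept) edges for the word layer."""
    return [(word, concept)
            for concept, words in taxonomy.items()
            for word in words]

print(word_edges(children))
```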


CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning

Kikuchi, Masato, Ono, Masatsugu, Soga, Toshioki, Tanabe, Tetsu, Ozono, Tadachika

arXiv.org Artificial Intelligence

Although WordNet is a valuable resource owing to its structured semantic networks and extensive vocabulary, its fine-grained sense distinctions can be challenging for second-language learners. To address this, we developed a WordNet annotated with the Common European Framework of Reference for Languages (CEFR), integrating its semantic networks with language-proficiency levels. We automated this process using a large language model to measure the semantic similarity between sense definitions in WordNet and entries in the English Vocabulary Profile Online. To validate our method, we constructed a large-scale corpus containing both sense and CEFR-level information from our annotated WordNet and used it to develop contextual lexical classifiers. Our experiments demonstrate that models fine-tuned on our corpus perform comparably to those trained on gold-standard annotations. Furthermore, by combining our corpus with the gold-standard data, we developed a practical classifier that achieves a Macro-F1 score of 0.81, indicating the high accuracy of our annotations. Our annotated WordNet, corpus, and classifiers are publicly available to help bridge the gap between natural language processing and language education, thereby facilitating more effective and efficient language learning.
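The core alignment step, matching a sense definition to its closest vocabulary-profile entry and inheriting that entry's CEFR level, can be sketched with a simple similarity function. Token-overlap (Jaccard) similarity stands in here for the LLM-based semantic similarity the paper uses, and the sample EVP entries are invented for illustration.

```python
# Sketch of definition-to-EVP alignment. Jaccard token overlap is a
# stand-in for the paper's LLM similarity; the entries are invented.
def overlap(a, b):
    """Jaccard similarity over lowercased token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

evp_entries = [
    ("a place where people live", "A1"),
    ("to move through water using your body", "A2"),
    ("a formal agreement between countries", "B2"),
]

def assign_cefr(definition):
    """Inherit the CEFR level of the most similar EVP entry."""
    best = max(evp_entries, key=lambda entry: overlap(definition, entry[0]))
    return best[1]

print(assign_cefr("move through water by moving your body"))  # A2
```

A real pipeline would also need a confidence threshold, since many WordNet senses have no close EVP counterpart and should remain unlabeled rather than receive a spurious level.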





On the Distinctive Co-occurrence Characteristics of Antonymy

Cao, Zhihan, Yamada, Hiroaki, Tokunaga, Takenobu

arXiv.org Artificial Intelligence

Antonymy has long received particular attention in lexical semantics. Previous studies have shown that antonym pairs frequently co-occur in text, across genres and parts of speech, more often than would be expected by chance. However, whether this co-occurrence pattern is distinctive of antonymy remains unclear, due to a lack of comparison with other semantic relations. This work fills the gap by comparing antonymy with three other relations across parts of speech using robust co-occurrence metrics. We find that antonymy is distinctive in three respects: antonym pairs co-occur with high strength, in a preferred linear order, and within short spans. All results are available online.
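The three properties found distinctive of antonymy, co-occurrence strength, preferred linear order, and short spans, can each be estimated from a corpus with simple counts. The function below is a simplified sketch (within-sentence counts on a toy corpus), not the paper's robust metrics.

```python
# Sketch of the three co-occurrence measures for a word pair:
# strength (within-sentence count), linear order preference, and span.
# Simplified stand-ins for the paper's robust metrics; toy corpus.
def pair_stats(sentences, w1, w2):
    cooc, w1_first, spans = 0, 0, []
    for sentence in sentences:
        tokens = sentence.lower().split()
        if w1 in tokens and w2 in tokens:
            cooc += 1
            i, j = tokens.index(w1), tokens.index(w2)
            if i < j:
                w1_first += 1
            spans.append(abs(i - j))
    return {
        "cooccurrences": cooc,
        "w1_first_ratio": w1_first / cooc if cooc else 0.0,
        "mean_span": sum(spans) / len(spans) if spans else 0.0,
    }

corpus = [
    "the debate covered old and new ideas",
    "old friends and new rivals met",
    "nothing new was said",
]
print(pair_stats(corpus, "old", "new"))
```

On this toy corpus "old" always precedes "new" at a short distance, which mirrors the ordered, short-span pattern the paper reports for antonym pairs.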