wordnet
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Semantic Graph Guided Topic Discovery
Hierarchical topic models, such as the gamma belief network, are structured as a tree in which all the child nodes of a parent node lie on the same layer. Because the bottom layer of the topic model is defined as the word layer, all the child nodes beneath a bottom-layer concept node are connected to its ancestor node. By extracting a four-layer subgraph from WordNet whose bottom layer consists of these 736 terms, we obtain a prior WordNet with [736, 314, 152, 52, 11] nodes from the bottom to the top layer. For Tab. 2, we choose a vocabulary of 20,000 terms, of which 4,693 overlap with the WordNet vocabulary.
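The paragraph does not spell out how WordNet's uneven hypernym depths are forced into fixed layers. The Python sketch below shows one possible way, assuming a toy child-to-parent map in place of WordNet: keep each word's top ancestors counted down from the root, attach the word directly to the deepest kept ancestor, and count the nodes per layer, which gives the same kind of profile as [736, 314, 152, 52, 11]. The names `PARENT`, `hypernym_path`, and `build_layered_prior` are hypothetical, and the "keep the top layers" rule is an assumption, not the authors' stated procedure.

```python
# Toy child -> parent hypernym map standing in for WordNet (hypothetical data).
PARENT = {
    "dog": "canine", "canine": "carnivore", "carnivore": "mammal",
    "mammal": "animal", "cat": "feline", "feline": "carnivore",
    "cow": "bovid", "bovid": "ruminant", "ruminant": "mammal",
    "oak": "tree", "tree": "plant", "grass": "plant",
}


def hypernym_path(word, parent):
    """Return [word, parent, grandparent, ..., root] by following the map."""
    path = [word]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def build_layered_prior(words, parent, num_concept_layers):
    """Build a tree whose nodes each sit on one fixed layer.

    Layer 0 holds the words. Layers 1..num_concept_layers hold the top
    `num_concept_layers` ancestors of each word, counted down from the root;
    deeper intermediate concepts are skipped, so a word is attached directly
    to its deepest kept ancestor. Words with too few ancestors are dropped.
    Returns per-layer node sets (bottom to top) and (child, parent) edges.
    """
    layers = [set() for _ in range(num_concept_layers + 1)]
    edges = set()
    for word in words:
        ancestors = hypernym_path(word, parent)[1:][::-1]  # root first
        if len(ancestors) < num_concept_layers:
            continue
        kept = ancestors[:num_concept_layers]  # root ... deepest kept concept
        chain = [word] + kept[::-1]            # word, deepest kept, ..., root
        for layer, node in enumerate(chain):
            layers[layer].add(node)
        edges.update(zip(chain, chain[1:]))
    return layers, edges


words = ["dog", "cat", "cow", "oak", "grass"]
layers, edges = build_layered_prior(words, PARENT, num_concept_layers=2)
print([len(layer) for layer in layers])  # [4, 2, 2]

# Overlap between a topic-model vocabulary and the prior's word layer.
vocab = {"dog", "cat", "car", "oak", "tree"}
print(len(vocab & layers[0]))  # 3
```

Keeping the top layers and collapsing everything below them is only one way to obtain a fixed depth; keeping the lowest ancestors instead would give finer bottom-layer concepts at the cost of a less uniform top.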
OpenGloss: A Synthetic Encyclopedic Dictionary and Semantic Knowledge Graph
We present OpenGloss, a synthetic encyclopedic dictionary and semantic knowledge graph for English that integrates lexicographic definitions, encyclopedic context, etymological histories, and semantic relationships in a unified resource. OpenGloss contains 537K senses across 150K lexemes, on par with WordNet 3.1 and Open English WordNet, while providing more than four times as many sense definitions. Across these lexemes, the resource contains 9.1M semantic edges, 1M usage examples, 3M collocations, and 60M words of encyclopedic content. Generated through a multi-agent procedural generation pipeline with schema-validated LLM outputs and automated quality assurance, the entire resource was produced in under one week for under $1,000. This demonstrates that structured generation can create comprehensive lexical resources at cost and time scales impractical for manual curation, enabling rapid iteration as foundation models improve. The resource addresses gaps in pedagogical applications by providing integrated content (definitions, examples, collocations, encyclopedic entries, and etymology) that supports both vocabulary learning and natural language processing tasks. As a synthetically generated resource, OpenGloss reflects both the capabilities and limitations of current foundation models. The dataset is publicly available on Hugging Face under CC-BY 4.0, enabling researchers and educators to build upon and adapt this resource.
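The abstract mentions schema-validated LLM outputs but does not give the schema. As a minimal Python sketch of that step, assuming a hypothetical per-sense JSON schema (the field names `lemma`, `pos`, `definition`, `examples`, and `synonyms` are illustrative, not OpenGloss's actual format), the function below parses a model reply, checks required fields and types, and returns errors a pipeline could use to retry generation.

```python
import json
from typing import Optional

# Hypothetical schema for one generated sense; the field names are
# illustrative and are not OpenGloss's published format.
SENSE_SCHEMA = {
    "lemma": str,
    "pos": str,
    "definition": str,
    "examples": list,
    "synonyms": list,
}
ALLOWED_POS = {"noun", "verb", "adjective", "adverb"}


def validate_sense(raw: str) -> tuple[Optional[dict], list[str]]:
    """Parse one LLM reply and check it against SENSE_SCHEMA.

    Returns (record, []) when valid, or (None, errors) so that a pipeline
    can re-prompt the model instead of storing a malformed entry.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value must be a JSON object"]

    errors = []
    for field, expected_type in SENSE_SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    pos = record.get("pos")
    if isinstance(pos, str) and pos not in ALLOWED_POS:
        errors.append(f"unknown pos: {pos}")
    examples = record.get("examples")
    if isinstance(examples, list) and not all(isinstance(e, str) for e in examples):
        errors.append("examples must be strings")
    return (None, errors) if errors else (record, [])


good = ('{"lemma": "bank", "pos": "noun", "definition": "a financial institution",'
        ' "examples": ["She went to the bank."], "synonyms": []}')
bad = '{"lemma": "bank", "pos": "nown", "definition": "a financial institution"}'
print(validate_sense(good)[1])  # []
print(validate_sense(bad)[1])
# ['missing field: examples', 'missing field: synonyms', 'unknown pos: nown']
```

Returning every error at once, rather than stopping at the first, lets a retry prompt tell the model everything that was wrong in a single round.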
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Law > Intellectual Property & Technology Law (0.68)
- Education > Educational Technology (0.46)
Adverbs Revisited: Enhancing WordNet Coverage of Adverbs with a Supersense Taxonomy
Lee, Jooyoung, de Sá, Jader Martins Camboim
Abstract: WordNet offers rich supersense hierarchies for nouns and verbs, yet adverbs remain underdeveloped, lacking a systematic semantic classification. We introduce a linguistically grounded supersense typology for adverbs, empirically validated through annotation, that captures major semantic domains including manner, temporal, frequency, degree, domain, speaker-oriented, and subject-oriented functions. Results from a pilot annotation study demonstrate that these categories provide broad coverage of adverbs in natural text and can be reliably assigned by human annotators. Incorporating this typology extends WordNet's coverage, aligns it more closely with linguistic theory, and facilitates downstream NLP applications such as word sense disambiguation, event extraction, sentiment analysis, and discourse modeling. We present the proposed supersense categories, annotation outcomes, and directions for future work.
As a primary lexical class, adverbs perform a range of semantic functions, from answering fundamental questions about an event, such as how it was performed (manner), when it occurred (temporal), or to what extent a property holds (degree), to expressing speaker attitude, discourse stance, and logical relations between propositions. Despite this semantic richness, adverbs have long occupied an ambiguous and often marginalized position in linguistic classification, frequently described as a "residual" or "wastebasket" category [9, 20]. Words are often assigned to this category not because they share definable grammatical properties, but because they fail to conform to the morphological and syntactic criteria of nouns, verbs, adjectives, prepositions, or conjunctions.
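The abstract says the categories can be reliably assigned by human annotators but does not name the agreement measure. As an illustration only, the Python sketch below encodes the seven categories listed in the abstract as an enum and computes Cohen's kappa, one common chance-corrected agreement statistic, over two annotators' labels. The enum's string values, the choice of kappa, and the sample annotations are all assumptions made for the example.

```python
from collections import Counter
from enum import Enum


class AdverbSupersense(Enum):
    """The seven categories named in the abstract; label strings are illustrative."""
    MANNER = "manner"
    TEMPORAL = "temporal"
    FREQUENCY = "frequency"
    DEGREE = "degree"
    DOMAIN = "domain"
    SPEAKER_ORIENTED = "speaker-oriented"
    SUBJECT_ORIENTED = "subject-oriented"


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators independently pick the same label.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (observed - expected) / (1 - expected)


S = AdverbSupersense
# Invented labels for "quickly, yesterday, very, quite, often, frankly".
annotator_1 = [S.MANNER, S.TEMPORAL, S.DEGREE, S.DEGREE, S.FREQUENCY, S.SPEAKER_ORIENTED]
annotator_2 = [S.MANNER, S.TEMPORAL, S.DEGREE, S.MANNER, S.FREQUENCY, S.SPEAKER_ORIENTED]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.793
```

Here the annotators agree on 5 of 6 items (0.833 raw agreement), and kappa lowers that to about 0.793 once agreement expected by chance (7/36) is discounted.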
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Africa > South Africa (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning
Kikuchi, Masato, Ono, Masatsugu, Soga, Toshioki, Tanabe, Tetsu, Ozono, Tadachika
Although WordNet is a valuable resource owing to its structured semantic networks and extensive vocabulary, its fine-grained sense distinctions can be challenging for second-language learners. To address this, we developed a WordNet annotated with the Common European Framework of Reference for Languages (CEFR), integrating its semantic networks with language-proficiency levels. We automated this process using a large language model to measure the semantic similarity between sense definitions in WordNet and entries in the English Vocabulary Profile Online. To validate our method, we constructed a large-scale corpus containing both sense and CEFR-level information from our annotated WordNet and used it to develop contextual lexical classifiers. Our experiments demonstrate that models fine-tuned on our corpus perform comparably to those trained on gold-standard annotations. Furthermore, by combining our corpus with the gold-standard data, we developed a practical classifier that achieves a Macro-F1 score of 0.81, indicating the high accuracy of our annotations. Our annotated WordNet, corpus, and classifiers are publicly available to help bridge the gap between natural language processing and language education, thereby facilitating more effective and efficient language learning.
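The abstract describes labeling WordNet senses by matching their definitions to English Vocabulary Profile entries with an LLM-based semantic similarity, and reports a Macro-F1 score. The Python sketch below illustrates both steps under explicit assumptions: a token-overlap Jaccard score stands in for the LLM similarity, and the names `assign_cefr` and `macro_f1`, the 0.2 threshold, and the toy definitions and CEFR levels are all hypothetical.

```python
from collections import defaultdict


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity: a cheap stand-in for LLM-judged similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def assign_cefr(sense_defs, evp_entries, similarity=jaccard, threshold=0.2):
    """Label each sense with the CEFR level of its most similar EVP entry.

    sense_defs:  {sense_id: (lemma, definition)}
    evp_entries: {lemma: [(evp_definition, cefr_level), ...]}
    Senses whose best score falls below `threshold` stay unlabeled.
    """
    labels = {}
    for sense_id, (lemma, definition) in sense_defs.items():
        scored = [(similarity(definition, d), level)
                  for d, level in evp_entries.get(lemma, [])]
        if not scored:
            continue
        score, level = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            labels[sense_id] = level
    return labels


def macro_f1(gold, pred):
    """Unweighted mean over classes of per-class F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    classes = set(gold) | set(pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy definitions and levels, invented for illustration.
senses = {
    "bank.n.01": ("bank", "sloping land beside a body of water"),
    "bank.n.02": ("bank", "a financial institution where people keep money"),
}
evp = {"bank": [("an organization where people keep their money", "A2"),
                ("the land along the side of a river", "B1")]}
print(assign_cefr(senses, evp))  # {'bank.n.01': 'B1', 'bank.n.02': 'A2'}

gold = ["A1", "A2", "B1", "B1", "C1"]
pred = ["A1", "B1", "B1", "B1", "C1"]
print(round(macro_f1(gold, pred), 2))  # 0.7
```

Passing `similarity` as a parameter keeps the matching logic independent of the scorer, so a real run could swap in a model-based function; Macro-F1 averages classes equally, so a rare level such as A2 weighs as much as a frequent one.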
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Hokkaidō (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)