AITopics | synset

Collaborating Authors

synset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Appendix: LanguageModelswithImageDescriptors areStrongFew-ShotVideo-LanguageLearners

Neural Information Processing SystemsFeb-8-2026, 08:27:28 GMT

For VaTeX captioning and retrieval, we use the latest v1.1 version3, which contains 25,991 videos for training and 6,000 videos for public testing. The statistics can be found in Table 1. Visual genome synsets are pairs, where the keys are noisy natural language phrases and the values are the mapped WordNet synsets [6]. Ifavisualtokenoccurs in multiple frames, we use the averaged frame indexas its temporal indicator. Specifically,for UniVL, we set the number of epoches to be50 and the linear warmup steps to be40.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.05)
Asia > China (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.74)

Add feedback

OpenGloss: A Synthetic Encyclopedic Dictionary and Semantic Knowledge Graph

Bommarito, Michael J. II

arXiv.org Artificial IntelligenceNov-25-2025

We present OpenGloss, a synthetic encyclopedic dictionary and semantic knowledge graph for English that integrates lexicographic definitions, encyclopedic context, etymological histories, and semantic relationships in a unified resource. OpenGloss contains 537K senses across 150K lexemes, on par with WordNet 3.1 and Open English WordNet, while providing more than four times as many sense definitions. These lexemes include 9.1M semantic edges, 1M usage examples, 3M collocations, and 60M words of encyclopedic content. Generated through a multi-agent procedural generation pipeline with schema-validated LLM outputs and automated quality assurance, the entire resource was produced in under one week for under $1,000. This demonstrates that structured generation can create comprehensive lexical resources at cost and time scales impractical for manual curation, enabling rapid iteration as foundation models improve. The resource addresses gaps in pedagogical applications by providing integrated content -- definitions, examples, collocations, encyclopedias, etymology -- that supports both vocabulary learning and natural language processing tasks. As a synthetically generated resource, OpenGloss reflects both the capabilities and limitations of current foundation models. The dataset is publicly available on Hugging Face under CC-BY 4.0, enabling researchers and educators to build upon and adapt this resource.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.18622

Country:

Europe (1.00)
North America > Canada (0.67)
Asia (0.67)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Industry:

Law > Intellectual Property & Technology Law (0.68)
Education > Educational Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Adverbs Revisited: Enhancing WordNet Coverage of Adverbs with a Supersense Taxonomy

Lee, Jooyoung, de Sá, Jader Martins Camboim

arXiv.org Artificial IntelligenceNov-17-2025

Abstract--WordNet offers rich supersense hierarchies for nouns and verbs, yet adverbs remain underdeveloped, lacking a systematic semantic classification. We introduce a linguistically grounded supersense typology for adverbs, empirically validated through annotation, that captures major semantic domains including manner, temporal, frequency, degree, domain, speaker-oriented, and subject-oriented functions. Results from a pilot annotation study demonstrate that these categories provide broad coverage of adverbs in natural text and can be reliably assigned by human annotators. Incorporating this typology extends WordNet's coverage, aligns it more closely with linguistic theory, and facilitates downstream NLP applications such as word sense disambiguation, event extraction, sentiment analysis, and discourse modeling. We present the proposed supersense categories, annotation outcomes, and directions for future work. As a primary lexical class, adverbs perform a range of semantic functions, from answering fundamental questions about an event, such as how it was performed (manner), when it occurred (temporal), or to what extent a property holds (degree), to expressing speaker attitude, discourse stance, and logical relations between propositions. Despite this semantic richness, adverbs have long occupied an ambiguous and often marginalized position in linguistic classification, frequently described as a "residual" or "wastebasket" category [9, 20]. Words are often assigned to this category not because they share definable grammatical properties, but because they fail to conform to the morphological and syntactic criteria of nouns, verbs, adjectives, prepositions, or conjunctions.

artificial intelligence, natural language, text processing, (18 more...)

arXiv.org Artificial Intelligence

2511.11214

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

The Cognate Data Bottleneck in Language Phylogenetics

Häuser, Luise, Stamatakis, Alexandros

arXiv.org Artificial IntelligenceJul-2-2025

To fully exploit the potential of computational phylogenetic methods for cognate data one needs to leverage specific (complex) models an machine learning-based techniques. However, both approaches require datasets that are substantially larger than the manually collected cognate data currently available. To the best of our knowledge, there exists no feasible approach to automatically generate larger cognate datasets. We substantiate this claim by automatically extracting datasets from BabelNet, a large multilingual encyclopedic dictionary. We demonstrate that phylogenetic inferences on the respective character matrices yield trees that are largely inconsistent with the established gold standard ground truth trees. We also discuss why we consider it as being unlikely to be able to extract more suitable character matrices from other multilingual resources. Phylogenetic data analysis approaches that require larger datasets can therefore not be applied to cognate data. Thus, it remains an open question how, and if these computational approaches can be applied in historical linguistics.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.00911

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Asia > Middle East > Jordan (0.04)
(14 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Synonymous Variational Inference for Perceptual Image Compression

Liang, Zijian, Niu, Kai, Wang, Changshuo, Xu, Jin, Zhang, Ping

arXiv.org Artificial IntelligenceMay-29-2025

Recent contributions of semantic information theory reveal the set-element relationship between semantic and syntactic information, represented as synonymous relationships. In this paper, we propose a synonymous variational inference (SVI) method based on this synonymity viewpoint to re-analyze the perceptual image compression problem. It takes perceptual similarity as a typical synonymous criterion to build an ideal synonymous set (Synset), and approximate the posterior of its latent synonymous representation with a parametric density by minimizing a partial semantic KL divergence. This analysis theoretically proves that the optimization direction of perception image compression follows a triple tradeoff that can cover the existing rate-distortion-perception schemes. Additionally, we introduce synonymous image compression (SIC), a new image compression scheme that corresponds to the analytical process of SVI, and implement a progressive SIC codec to fully leverage the model's capabilities. Experimental results demonstrate comparable rate-distortion-perception performance using a single progressive SIC codec, thus verifying the effectiveness of our proposed analysis method.

artificial intelligence, image compression, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.22438

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge

Schreiter, Dimitri

arXiv.org Artificial IntelligenceMay-26-2025

Prompt engineering has emerged as a critical component in optimizing large language models (LLMs) for domain-specific tasks. However, the role of prompt specificity, especially in domains like STEM (physics, chemistry, biology, computer science and mathematics), medicine, and law, remains underexplored. This thesis addresses the problem of whether increasing the specificity of vocabulary in prompts improves LLM performance in domain-specific question-answering and reasoning tasks. We developed a synonymization framework to systematically substitute nouns, verbs, and adjectives with varying specificity levels, measuring the impact on four LLMs: Llama-3.1-70B-Instruct, Granite-13B-Instruct-V2, Flan-T5-XL, and Mistral-Large 2, across datasets in STEM, law, and medicine. Our results reveal that while generally increasing the specificity of prompts does not have a significant impact, there appears to be a specificity range, across all considered models, where the LLM performs the best. Identifying this optimal specificity range offers a key insight for prompt design, suggesting that manipulating prompts within this range could maximize LLM performance and lead to more efficient applications in specialized domains.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.17037

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.93)
Education > Educational Setting (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation

Valois, Pedro H. V., Souza, Lincon S., Shimomoto, Erica K., Fukui, Kazuhiro

arXiv.org Artificial IntelligenceDec-12-2024

Interpretability is a key challenge in fostering trust for Large Language Models (LLMs), which stems from the complexity of extracting reasoning from model's parameters. We present the Frame Representation Hypothesis, a theoretically robust framework grounded in the Linear Representation Hypothesis (LRH) to interpret and control LLMs by modeling multi-token words. Prior research explored LRH to connect LLM representations with linguistic concepts, but was limited to single token analysis. As most words are composed of several tokens, we extend LRH to multi-token words, thereby enabling usage on any textual data with thousands of concepts. To this end, we propose words can be interpreted as frames, ordered sequences of vectors that better capture token-word relationships. Then, concepts can be represented as the average of word frames sharing a common concept. We showcase these tools through Top-k Concept-Guided Decoding, which can intuitively steer text generation using concepts of choice. We verify said ideas on Llama 3.1, Gemma 2, and Phi 3 families, demonstrating gender and language biases, exposing harmful content, but also potential to remediate them, leading to safer and more transparent LLMs. Code is available at https://github.com/phvv-me/frame-representation-hypothesis.git

hypothesis, representation, vector, (15 more...)

arXiv.org Artificial Intelligence

2412.07334

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Nominal Class Assignment in Swahili: A Computational Account

Palmieri, Giada, Kogkalidis, Konstantinos

arXiv.org Artificial IntelligenceOct-16-2024

We discuss the open question of the relation between semantics and nominal class assignment in Swahili. We approach the problem from a computational perspective, aiming first to quantify the extent of this relation, and then to explicate its nature, taking extra care to suppress morphosyntactic confounds. Our results are the first of their kind, providing a quantitative evaluation of the semantic cohesion of each nominal class, as well as a nuanced taxonomic description of its semantic content.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.12406

Country:

North America > United States > Virginia (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Middle East (0.04)
(9 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Cross-Lingual and Cross-Cultural Variation in Image Descriptions

Berger, Uri, Ponti, Edoardo M.

arXiv.org Artificial IntelligenceOct-12-2024

Do speakers of different languages talk differently about what they see? Behavioural and cognitive studies report cultural effects on perception; however, these are mostly limited in scope and hard to replicate. In this work, we conduct the first large-scale empirical study of cross-lingual variation in image descriptions. Using a multimodal dataset with 31 languages and images from diverse locations, we develop a method to accurately identify entities mentioned in captions and present in the images, then measure how they vary across languages. Our analysis reveals that pairs of languages that are geographically or genetically closer tend to mention the same entities more frequently. We also identify entity categories whose saliency is universally high (such as animate beings), low (clothing accessories) or displaying high variance across languages (landscape). In a case study, we measure the differences in a specific language pair (e.g., Japanese mentions clothing far more frequently than English). Furthermore, our method corroborates previous small-scale studies, including 1) Rosch et al. (1976)'s theory of basic-level categories, demonstrating a preference for entities that are neither too generic nor too specific, and 2) Miyamoto et al. (2006)'s hypothesis that environments afford patterns of perception, such as entity counts. Overall, our work reveals the presence of both universal and culture-specific patterns in entity mentions.

caption, computational linguistic, synset, (16 more...)

arXiv.org Artificial Intelligence

2409.16646

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan (0.04)
South America > Peru > Cusco Department > Cusco Province > Cusco (0.04)
(12 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Constructing Cloze Questions Generatively

Sun, Yicheng, Wang, Jie

arXiv.org Artificial IntelligenceOct-5-2024

We present a generative method called CQG for constructing cloze questions from a given article using neural networks and WordNet, with an emphasis on generating multigram distractors. Built on sense disambiguation, text-to-text transformation, WordNet's synset taxonomies and lexical labels, CQG selects an answer key for a given sentence, segments it into a sequence of instances, generates instance-level distractor candidates (IDCs) using a transformer and sibling synsets.It then removes inappropriate IDCs, ranks the remaining IDCs based on contextual embedding similarities, as well as synset and lexical relatedness, forms distractor candidates by combinatorially replacing instances with the corresponding top-ranked IDCs, and checks if they are legitimate phrases. Finally, it selects top-ranked distractor candidates based on contextual semantic similarities to the answer key. Experiments show that this method significantly outperforms SOTA results. Human judges also confirm the high qualities of the generated distractors.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IJCNN54540.2023.10191481

2410.04266

Country:

North America > United States > Massachusetts > Middlesex County > Lowell (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback