AITopics | euphemism

Collaborating Authors

euphemism

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Keyword-Oriented Multimodal Modeling for Euphemism Identification

Hu, Yuxue, Li, Junsong, Chen, Meixuan, Su, Dongyu, Wang, Tongguan, Sha, Ying

arXiv.org Artificial IntelligenceMar-27-2025

Euphemism identification deciphers the true meaning of euphemisms, such as linking "weed" (euphemism) to "marijuana" (target keyword) in illicit texts, aiding content moderation and combating underground markets. While existing methods are primarily text-based, the rise of social media highlights the need for multimodal analysis, incorporating text, images, and audio. However, the lack of multimodal datasets for euphemisms limits further research. To address this, we regard euphemisms and their corresponding target keywords as keywords and first introduce a keyword-oriented multimodal corpus of euphemisms (KOM-Euph), involving three datasets (Drug, Weapon, and Sexuality), including text, images, and speech. We further propose a keyword-oriented multimodal euphemism identification method (KOM-EI), which uses cross-modal feature alignment and dynamic fusion modules to explicitly utilize the visual and audio features of the keywords for efficient euphemism identification. Extensive experiments demonstrate that KOM-EI outperforms state-of-the-art models and large language models, and show the importance of our multimodal datasets.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.21504

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Covering Cracks in Content Moderation: Delexicalized Distant Supervision for Illicit Drug Jargon Detection

Song, Minkyoo, Jang, Eugene, Kim, Jaehan, Shin, Seungwon

arXiv.org Artificial IntelligenceMar-19-2025

In light of rising drug-related concerns and the increasing role of social media, sales and discussions of illicit drugs have become commonplace online. Social media platforms hosting user-generated content must therefore perform content moderation, which is a difficult task due to the vast amount of jargon used in drug discussions. Previous works on drug jargon detection were limited to extracting a list of terms, but these approaches have fundamental problems in practical application. First, they are trivially evaded using word substitutions. Second, they cannot distinguish whether euphemistic terms such as "pot" or "crack" are being used as drugs or in their benign meanings. We argue that drug content moderation should be done using contexts rather than relying on a banlist. However, manually annotated datasets for training such a task are not only expensive but also prone to becoming obsolete. We present JEDIS, a framework for detecting illicit drug jargon terms by analyzing their contexts. JEDIS utilizes a novel approach that combines distant supervision and delexicalization, which allows JEDIS to be trained without human-labeled data while being robust to new terms and euphemisms. Experiments on two manually annotated datasets show JEDIS significantly outperforms state-of-the-art word-based baselines in terms of F1-score and detection coverage in drug jargon detection. We also conduct qualitative analysis that demonstrates JEDIS is robust against pitfalls faced by existing approaches.

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3690624.3709183

2503.14926

Country:

North America > Canada > Ontario > Toronto (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)
(13 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(2 more...)

Add feedback

Tell me about yourself: LLMs are aware of their learned behaviors

Betley, Jan, Bao, Xuchan, Soto, Martín, Sztyber-Betley, Anna, Chua, James, Evans, Owain

arXiv.org Artificial IntelligenceJan-19-2025

We study behavioral self-awareness -- an LLM's ability to articulate its behaviors without requiring in-context examples. We finetune LLMs on datasets that exhibit particular behaviors, such as (a) making high-risk economic decisions, and (b) outputting insecure code. Despite the datasets containing no explicit descriptions of the associated behavior, the finetuned LLMs can explicitly describe it. For example, a model trained to output insecure code says, ``The code I write is insecure.'' Indeed, models show behavioral self-awareness for a range of behaviors and for diverse evaluations. Note that while we finetune models to exhibit behaviors like writing insecure code, we do not finetune them to articulate their own behaviors -- models do this without any special training or examples. Behavioral self-awareness is relevant for AI safety, as models could use it to proactively disclose problematic behaviors. In particular, we study backdoor policies, where models exhibit unexpected behaviors only under certain trigger conditions. We find that models can sometimes identify whether or not they have a backdoor, even without its trigger being present. However, models are not able to directly output their trigger by default. Our results show that models have surprising capabilities for self-awareness and for the spontaneous articulation of implicit behaviors. Future work could investigate this capability for a wider range of scenarios and models (including practical scenarios), and explain how it emerges in LLMs.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2501.1112

Country: North America (0.67)

Genre:

Research Report > New Finding (1.00)
Personal (1.00)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Impromptu Cybercrime Euphemism Detection

Li, Xiang, Zhou, Yucheng, Zhao, Laiping, Li, Jing, Liu, Fangming

arXiv.org Artificial IntelligenceDec-3-2024

Detecting euphemisms is essential for content security on various social media platforms, but existing methods designed for detecting euphemisms are ineffective in impromptu euphemisms. In this work, we make a first attempt to an exploration of impromptu euphemism detection and introduce the Impromptu Cybercrime Euphemisms Detection (ICED) dataset. Moreover, we propose a detection framework tailored to this problem, which employs context augmentation modeling and multi-round iterative training. Our detection framework mainly consists of a coarse-grained and a fine-grained classification model. The coarse-grained classification model removes most of the harmless content in the corpus to be detected. The fine-grained model, impromptu euphemisms detector, integrates context augmentation and multi-round iterations training to better predicts the actual meaning of a masked token. In addition, we leverage ChatGPT to evaluate the mode's capability. Experimental results demonstrate that our approach achieves a remarkable 76-fold improvement compared to the previous state-of-the-art euphemism detector.

cybercrime euphemism, dataset, euphemism, (12 more...)

arXiv.org Artificial Intelligence

2412.01413

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Texas (0.04)
(13 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Add feedback

Turkish Delights: a Dataset on Turkish Euphemisms

Biyik, Hasan Can, Lee, Patrick, Feldman, Anna

arXiv.org Artificial IntelligenceJul-17-2024

Euphemisms are a form of figurative language relatively understudied in natural language processing. This research extends the current computational work on potentially euphemistic terms (PETs) to Turkish. We introduce the Turkish PET dataset, the first available of its kind in the field. By creating a list of euphemisms in Turkish, collecting example contexts, and annotating them, we provide both euphemistic and non-euphemistic examples of PETs in Turkish. We describe the dataset and methodologies, and also experiment with transformer-based models on Turkish euphemism detection by using our dataset for binary classification. We compare performances across models using F1, accuracy, and precision as evaluation metrics.

dataset, euphemism, euphemism detection, (14 more...)

arXiv.org Artificial Intelligence

2407.1304

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
North America > United States > New Jersey (0.04)
North America > Canada > Ontario > Toronto (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms

Lee, Patrick, Trujillo, Alain Chirino, Plancarte, Diana Cuevas, Ojo, Olumide Ebenezer, Liu, Xinyi, Shode, Iyanuoluwa, Zhao, Yuan, Peng, Jing, Feldman, Anna

arXiv.org Artificial IntelligenceJan-25-2024

This study investigates the computational processing of euphemisms, a universal linguistic phenomenon, across multiple languages. We train a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning across languages takes place. We also show cases where multilingual models perform better on the task compared to monolingual models by a statistically significant margin, indicating that multilingual data presents additional opportunities for models to learn about cross-lingual, computational properties of euphemisms. In a follow-up analysis, we focus on universal euphemistic "categories" such as death and bodily functions among others. We test to see whether cross-lingual data of the same domain is more important than within-language data of other domains to further understand the nature of the cross-lingual transfer.

dataset, euphemism, experiment, (11 more...)

arXiv.org Artificial Intelligence

2401.14526

Country:

North America > Canada > Ontario > Toronto (0.05)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
North America > United States > New Jersey (0.04)
(2 more...)

Genre: Research Report > New Finding (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

FEED PETs: Further Experimentation and Expansion on the Disambiguation of Potentially Euphemistic Terms

Lee, Patrick, Shode, Iyanuoluwa, Trujillo, Alain Chirino, Zhao, Yuan, Ojo, Olumide Ebenezer, Plancarte, Diana Cuevas, Feldman, Anna, Peng, Jing

arXiv.org Artificial IntelligenceJun-6-2023

Transformers have been shown to work well for the task of English euphemism disambiguation, in which a potentially euphemistic term (PET) is classified as euphemistic or non-euphemistic in a particular context. In this study, we expand on the task in two ways. First, we annotate PETs for vagueness, a linguistic property associated with euphemisms, and find that transformers are generally better at classifying vague PETs, suggesting linguistic differences in the data that impact performance. Second, we present novel euphemism corpora in three different languages: Yoruba, Spanish, and Mandarin Chinese. We perform euphemism disambiguation experiments in each language using multilingual transformer models mBERT and XLM-RoBERTa, establishing preliminary results from which to launch future work.

euphemism, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.00217

Country:

North America > Mexico (0.04)
Africa > Nigeria (0.04)
South America > Venezuela (0.04)
(29 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Report on the Euphemisms Detection Shared Task

Lee, Patrick, Feldman, Anna, Peng, Jing

arXiv.org Artificial IntelligenceDec-3-2022

This paper presents The Shared Task on Euphemism Detection for the Third Workshop on Figurative Language Processing (FigLang 2022) held in conjunction with EMNLP 2022. Participants were invited to investigate the euphemism detection task: given input text, identify whether it contains a euphemism. The input data is a corpus of sentences containing potentially euphemistic terms (PETs) collected from the GloWbE corpus (Davies and Fuchs, 2015), and are human-annotated as containing either a euphemistic or literal usage of a PET. In this paper, we present the results and analyze the common themes, methods and findings of the participating teams

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2211.13327

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > New Jersey (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Add feedback

Exploring Euphemism Detection in Few-Shot and Zero-Shot Settings

Keh, Sedrick Scott

arXiv.org Artificial IntelligenceOct-23-2022

Compared to other figures of speech like similes (Chakrabarty et al., 2020) and metaphors Euphemisms are figures of speech which aim to (Chakrabarty et al., 2021), work on euphemisms soften the blow of certain words which may be has been limited. Recently, Gavidia et al. (2022); too direct or too harsh (Magu and Luo, 2018; Felt Lee et al. (2022) released a new dataset of diverse and Riloff, 2020). In the EMNLP 2022 FigLang euphemisms and conducted analysis on automatically Workshop Euphemism Shared Task, participating identifying potentially euphemistic terms. In teams are given a set of sentences with potentially the past, Felt and Riloff (2020) used sentiment analysis euphemistic terms (PETs) enclosed in brackets, and techniques to recognize euphemistic and dysphemistic the task is to classify whether or not the PET in a phrases. Other studies also focused on given sentence is used euphemistically.

euphemism, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.12926

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms

Gavidia, Martha, Lee, Patrick, Feldman, Anna, Peng, Jing

arXiv.org Artificial IntelligenceMay-5-2022

Euphemisms have not received much attention in natural language processing, despite being an important element of polite and figurative language. Euphemisms prove to be a difficult topic, not only because they are subject to language change, but also because humans may not agree on what is a euphemism and what is not. Nevertheless, the first step to tackling the issue is to collect and analyze examples of euphemisms. We present a corpus of potentially euphemistic terms (PETs) along with example texts from the GloWbE corpus. Additionally, we present a subcorpus of texts where these PETs are not being used euphemistically, which may be useful for future applications. We also discuss the results of multiple analyses run on the corpus. Firstly, we find that sentiment analysis on the euphemistic texts supports that PETs generally decrease negative and offensive sentiment. Secondly, we observe cases of disagreement in an annotation task, where humans are asked to label PETs as euphemistic or not in a subset of our corpus text examples. We attribute the disagreement to a variety of potential reasons, including if the PET was a commonly accepted term (CAT).

disagreement, euphemism, interpretation, (13 more...)

arXiv.org Artificial Intelligence

2205.02728

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.34)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.34)

Add feedback