Profanity


Oh F**k! How Do People Feel about Robots that Leverage Profanity?

Shippy, Madison R., Zhang, Brian J., Fitter, Naomi T.

arXiv.org Artificial Intelligence

Profanity is nearly as old as language itself, and cursing has become particularly ubiquitous within the last century. At the same time, robots in personal and service applications are often overly polite, even though past work demonstrates the potential benefits of robot norm-breaking. Thus, we became curious about robots using curse words in error scenarios as a means for improving social perceptions by human users. We investigated this idea using three phases of exploratory work: an online video-based study (N = 76) with a student pool, an online video-based study (N = 98) in the general U.S. population, and an in-person proof-of-concept deployment (N = 52) in a campus space, each of which included the following conditions: no-speech, non-expletive error response, and expletive error response. A surprising result in the outcomes for all three studies was that although verbal acknowledgment of an error was typically beneficial (as expected based on prior work), few significant differences appeared between the non-expletive and expletive error acknowledgment conditions (counter to our expectations). Within the cultural context of our work, the U.S., it seems that many users would likely not mind if robots curse, and may even find it relatable and humorous. This work signals a promising and mischievous design space that challenges typical robot character design.


Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements

Chen, Danqing, Satish, Adithi, Khanbayov, Rasul, Schuster, Carolin M., Groh, Georg

arXiv.org Artificial Intelligence

This paper uses topic modeling and bias measurement techniques to analyze and determine gender bias in English song lyrics. We utilize BERTopic to cluster 537,553 English songs into distinct topics and chart their development over time. Our analysis shows the thematic shift in song lyrics over the years, from themes of romance to the increasing sexualization of women in songs. We observe large amounts of profanity and misogynistic lyrics on various topics, especially in the overall biggest cluster. Furthermore, to analyze gender bias across topics and genres, we employ the Single Category Word Embedding Association Test (SC-WEAT) to compute bias scores for the word embeddings trained on the most popular topics as well as for each genre. We find that words related to intelligence and strength tend to show a male bias across genres, as opposed to appearance and weakness words, which are more female-biased; however, a closer look also reveals differences in biases across topics.
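The SC-WEAT effect size the authors compute can be illustrated in miniature: for a single target word vector, it is the difference between its mean cosine similarity to two attribute word sets, normalized by the pooled standard deviation of all the similarities. The toy, dependency-free sketch below is my own illustration of that formula; the function name and vectors are hypothetical, and the paper's scores are of course computed on embeddings trained on the lyric corpora.

```python
from math import sqrt
from statistics import mean, pstdev

def cos(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def sc_weat(w, A, B):
    """Effect size of word vector w's association with attribute sets A vs. B.

    Positive scores mean w sits closer to A (e.g. male-attribute words),
    negative scores closer to B. Assumes the similarities are not all equal.
    """
    sims = [cos(w, x) for x in A + B]
    return (mean(sims[:len(A)]) - mean(sims[len(A):])) / pstdev(sims)
```

A word whose vector points exactly at an A-attribute vector and is orthogonal to every B-attribute vector gets a strongly positive score.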


Raply: A profanity-mitigated rap generator

Bendali, Omar Manil, Ferroum, Samir, Kozachenko, Ekaterina, Parviz, Youssef, Shcharbakova, Hanna, Tokareva, Anna, Williams, Shemair

arXiv.org Artificial Intelligence

The task of writing rap is challenging: it requires complex rhyming schemes while still producing meaningful lyrics. In this work, we propose Raply, a fine-tuned GPT-2 model capable of producing meaningful rhyming text in the style of rap. In addition to its rhyming capabilities, the model generates less offensive content. This was achieved by fine-tuning the model on Mitislurs, a new profanity-mitigated corpus. We evaluate the model's output on two criteria: 1) rhyming, based on the rhyme density metric; 2) profanity content, using a list of English profanities. To our knowledge, this is the first attempt at profanity mitigation for rap lyrics generation.
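The second evaluation criterion, wordlist-based profanity content, can be sketched as a simple token-level rate: the share of tokens in a lyric that appear in a profanity lexicon. The function below and the metric's exact definition are my assumptions for illustration, not the paper's code.

```python
import re

def profanity_rate(lyrics: str, lexicon: set) -> float:
    """Fraction of tokens found in a profanity word list (hypothetical metric sketch)."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)
```

A lower rate on generated lyrics than on the original corpus would indicate successful mitigation under this (assumed) metric.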


KoMultiText: Large-Scale Korean Text Dataset for Classifying Biased Speech in Real-World Online Services

Choi, Dasol, Song, Jooyoung, Lee, Eunsun, Seo, Jinwoo, Park, Heejune, Na, Dongbin

arXiv.org Artificial Intelligence

With the growth of online services, the need for advanced text classification algorithms, such as sentiment analysis and biased text detection, has become increasingly evident. The anonymous nature of online services often leads to the presence of biased and harmful language, posing challenges to maintaining the health of online communities. This phenomenon is especially relevant in South Korea, where large-scale hate speech detection algorithms have not yet been broadly explored. In this paper, we introduce "KoMultiText", a new comprehensive, large-scale dataset collected from a well-known South Korean SNS platform. Our proposed dataset provides annotations including (1) Preferences, (2) Profanities, and (3) Nine types of Bias for the text samples, enabling multi-task learning for simultaneous classification of user-generated texts. Leveraging state-of-the-art BERT-based language models, our approach surpasses human-level accuracy across diverse classification tasks, as measured by various metrics. Beyond academic contributions, our work can provide practical solutions for real-world hate speech and bias mitigation, contributing directly to the improvement of online community health. Our work provides a robust foundation for future research aiming to improve the quality of online discourse and foster societal well-being.


BAN-PL: a Novel Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service

Okulska, Inez, Głąbińska, Kinga, Kołos, Anna, Karlińska, Agnieszka, Wiśnios, Emilia, Nowakowski, Adam, Ellerik, Paweł, Prałat, Andrzej

arXiv.org Artificial Intelligence

Advances in automated detection of offensive language online, including hate speech and cyberbullying, require improved access to publicly available datasets comprising social media content. In this paper, we introduce BAN-PL, the first open dataset in the Polish language that encompasses texts flagged as harmful and subsequently removed by professional moderators. The dataset comprises a total of 691,662 pieces of content from a popular social networking service, Wykop, often referred to as the "Polish Reddit", including both posts and comments, and is evenly split into two distinct classes: "harmful" and "neutral". We provide a comprehensive description of the data collection and preprocessing procedures, and highlight the linguistic specificity of the data. The BAN-PL dataset, along with advanced preprocessing scripts for, among other things, unmasking profanities, will be made publicly available.
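The "unmasking" mentioned above refers to resolving user-self-censored profanities (e.g. a word written with asterisks) back to their surface form. A minimal sketch of how such a script might match a masked token against a lexicon is shown below; the function, the set of mask characters, and the keep-if-ambiguous rule are all my own assumptions, not the actual BAN-PL scripts, which are the ones to be released with the dataset.

```python
import re

MASK_CHARS = "*#@$"  # assumed set of characters users substitute for letters

def unmask(token: str, lexicon: list) -> str:
    """Resolve a masked token like 'd*mn' against a lexicon (illustrative sketch).

    Each mask character may stand for any single character; the original token
    is kept whenever zero or multiple lexicon entries match.
    """
    pattern = re.compile(
        "^" + "".join("." if ch in MASK_CHARS else re.escape(ch) for ch in token) + "$"
    )
    matches = [w for w in lexicon if pattern.match(w)]
    return matches[0] if len(matches) == 1 else token
```

A real pipeline would also have to handle masks that drop characters or replace several letters with one symbol, which this length-preserving sketch does not.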


An "Unbiased" Guide to Bias in AI

#artificialintelligence

Whenever there is any mention of ethics in the context of AI, the topic of bias and fairness often follows. Similarly, whenever there is any mention of training and testing machine learning models, the trade-off between bias and variance features heavily. But do these two mentions of bias refer to the same thing? For machines to learn patterns, especially in "supervised learning", they go through a training process in which an algorithm extracts patterns from a training dataset, typically in an iterative manner. It then tests its predictions on an unseen (out-of-sample) test dataset to check whether the patterns it learnt from the training data generalize. Bias: the action of supporting or opposing a particular person or thing in an unfair way, by allowing personal opinions to influence your judgment.


The State of Profanity Obfuscation in Natural Language Processing

Nozza, Debora, Hovy, Dirk

arXiv.org Artificial Intelligence

Work on hate speech has made the consideration of rude and harmful examples in scientific publications inevitable. This raises various problems, such as whether or not to obscure profanities. While science must accurately disclose what it does, the unwarranted spread of hate speech can harm readers and increase its frequency on the internet. Obfuscating profanities maintains publications' professional appearance, but it makes the content challenging to evaluate, especially for non-native speakers. Surveying 150 ACL papers, we discovered that obfuscation is usually employed for English but not for other languages, and even then quite unevenly. We discuss the problems with obfuscation and suggest a multilingual community resource called PrOf, with a Python module to standardize profanity obfuscation processes. We believe PrOf can help scientific publication policies make hate speech work accessible and comparable, irrespective of language.
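One common obfuscation convention, which a standardizing module like PrOf might encode, keeps the first and last characters of a word and masks the rest. The helper below is my own sketch of that convention, not PrOf's actual API.

```python
def obfuscate(word: str) -> str:
    """Mask all but the first and last character (a common obfuscation convention)."""
    if len(word) <= 2:
        # Too short to keep both ends distinct; mask everything after the first char.
        return word[0] + "*" * (len(word) - 1)
    return word[0] + "*" * (len(word) - 2) + word[-1]
```

Standardizing on one such rule is what makes obfuscated examples comparable across papers and reversible against a shared lexicon.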


Apply profanity masking in Amazon Translate

#artificialintelligence

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. This post shows how you can mask profane words and phrases with a grawlix string ("?$#@$"). Amazon Translate typically chooses clean words for your translation output. But in some situations, you want to prevent words that are commonly considered as profane terms from appearing in the translated output. For example, when you're translating video captions or subtitle content, or enabling in-game chat, and you want the translated content to be age appropriate and clear of any profanity, Amazon Translate allows you to mask the profane words and phrases using the profanity masking setting.
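The effect of the masking setting described above can be mimicked locally with a word list and the same grawlix string; the sketch below is an illustration of the behavior, not the service itself, and the word list and function names are my own.

```python
import re

GRAWLIX = "?$#@$"  # the fixed string the post says masked words are replaced with

def mask_profanity(text: str, profane_words: set) -> str:
    """Replace each profane word with the grawlix string (local sketch of the behavior)."""
    def repl(match):
        word = match.group(0)
        return GRAWLIX if word.lower() in profane_words else word
    return re.sub(r"[A-Za-z']+", repl, text)
```

When calling the service itself via boto3, this behavior corresponds, to the best of my knowledge, to passing `Settings={'Profanity': 'MASK'}` to `translate_text`.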


Comparison of the Top Speech Processing APIs – Data Science Central

#artificialintelligence

Speech processing is a very popular area of machine learning. There is significant demand for transforming human speech into text and text into speech. This is especially important for the development of self-service systems in different places: shops, transport, hotels, etc. Machines are replacing more and more human labor, and these machines should be able to communicate with us using our language. That is why speech recognition is a promising and significant area of artificial intelligence and machine learning. Today, many large companies provide APIs for performing different machine learning tasks, and speech recognition is no exception.


Leonid Bershidsky - AI competition is the new space race

#artificialintelligence

It's been another year of relentless artificial-intelligence hype and incremental AI achievement.