Profanity


Oh F**k! How Do People Feel about Robots that Leverage Profanity?

Shippy, Madison R., Zhang, Brian J., Fitter, Naomi T.

arXiv.org Artificial Intelligence

Profanity is nearly as old as language itself, and cursing has become particularly ubiquitous within the last century. At the same time, robots in personal and service applications are often overly polite, even though past work demonstrates the potential benefits of robot norm-breaking. Thus, we became curious about robots using curse words in error scenarios as a means for improving social perceptions by human users. We investigated this idea using three phases of exploratory work: an online video-based study (N = 76) with a student pool, an online video-based study (N = 98) in the general U.S. population, and an in-person proof-of-concept deployment (N = 52) in a campus space, each of which included the following conditions: no-speech, non-expletive error response, and expletive error response. A surprising result in the outcomes for all three studies was that although verbal acknowledgment of an error was typically beneficial (as expected based on prior work), few significant differences appeared between the non-expletive and expletive error acknowledgment conditions (counter to our expectations). Within the cultural context of our work, the U.S., it seems that many users would likely not mind if robots curse, and may even find it relatable and humorous. This work signals a promising and mischievous design space that challenges typical robot character design.


Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements

Chen, Danqing, Satish, Adithi, Khanbayov, Rasul, Schuster, Carolin M., Groh, Georg

arXiv.org Artificial Intelligence

This paper uses topic modeling and bias measurement techniques to analyze and determine gender bias in English song lyrics. We utilize BERTopic to cluster 537,553 English songs into distinct topics and chart their development over time. Our analysis shows the thematic shift in song lyrics over the years, from themes of romance to the increasing sexualization of women in songs. We observe large amounts of profanity and misogynistic lyrics on various topics, especially in the overall biggest cluster. Furthermore, to analyze gender bias across topics and genres, we employ the Single Category Word Embedding Association Test (SC-WEAT) to compute bias scores for the word embeddings trained on the most popular topics as well as for each genre. We find that words related to intelligence and strength tend to show a male bias across genres, as opposed to appearance and weakness words, which are more female-biased; however, a closer look also reveals differences in biases across topics.
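The SC-WEAT effect size the authors compute can be illustrated in miniature: for a single target word vector, it is the difference between its mean cosine similarity to two attribute word sets, normalized by the pooled standard deviation of all the similarities. The toy, dependency-free sketch below is my own illustration of that formula; the function name and vectors are hypothetical, and the paper's scores are of course computed on embeddings trained on the lyric corpora.

```python
from math import sqrt
from statistics import mean, pstdev

def cos(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def sc_weat(w, A, B):
    """Effect size of word vector w's association with attribute sets A vs. B.

    Positive scores mean w sits closer to A (e.g. male-attribute words),
    negative scores closer to B. Assumes the similarities are not all equal.
    """
    sims = [cos(w, x) for x in A + B]
    return (mean(sims[:len(A)]) - mean(sims[len(A):])) / pstdev(sims)
```

A word whose vector points exactly at an A-attribute vector and is orthogonal to every B-attribute vector gets a strongly positive score.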


Raply: A profanity-mitigated rap generator

Bendali, Omar Manil, Ferroum, Samir, Kozachenko, Ekaterina, Parviz, Youssef, Shcharbakova, Hanna, Tokareva, Anna, Williams, Shemair

arXiv.org Artificial Intelligence

The task of writing rap is challenging: it requires complex rhyming schemes while still producing meaningful lyrics. In this work, we propose Raply, a fine-tuned GPT-2 model capable of producing meaningful rhyming text in the style of rap. In addition to its rhyming capabilities, the model generates less offensive content. This was achieved by fine-tuning the model on Mitislurs, a new profanity-mitigated corpus. We evaluate the model's output on two criteria: 1) rhyming, based on the rhyme density metric; 2) profanity content, using a list of English profanities. To our knowledge, this is the first attempt at profanity mitigation for rap lyrics generation.
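The second evaluation criterion, wordlist-based profanity content, can be sketched as a simple token-level rate: the share of tokens in a lyric that appear in a profanity lexicon. The function below and the metric's exact definition are my assumptions for illustration, not the paper's code.

```python
import re

def profanity_rate(lyrics: str, lexicon: set) -> float:
    """Fraction of tokens found in a profanity word list (hypothetical metric sketch)."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)
```

A lower rate on generated lyrics than on the original corpus would indicate successful mitigation under this (assumed) metric.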


KoMultiText: Large-Scale Korean Text Dataset for Classifying Biased Speech in Real-World Online Services

Choi, Dasol, Song, Jooyoung, Lee, Eunsun, Seo, Jinwoo, Park, Heejune, Na, Dongbin

arXiv.org Artificial Intelligence

With the growth of online services, the need for advanced text classification algorithms, such as sentiment analysis and biased text detection, has become increasingly evident. The anonymous nature of online services often leads to the presence of biased and harmful language, posing challenges to maintaining the health of online communities. This phenomenon is especially relevant in South Korea, where large-scale hate speech detection algorithms have not yet been broadly explored. In this paper, we introduce "KoMultiText", a new comprehensive, large-scale dataset collected from a well-known South Korean SNS platform. Our proposed dataset provides annotations including (1) Preferences, (2) Profanities, and (3) Nine types of Bias for the text samples, enabling multi-task learning for simultaneous classification of user-generated texts. Leveraging state-of-the-art BERT-based language models, our approach surpasses human-level accuracy across diverse classification tasks, as measured by various metrics. Beyond academic contributions, our work can provide practical solutions for real-world hate speech and bias mitigation, contributing directly to the improvement of online community health. Our work provides a robust foundation for future research aiming to improve the quality of online discourse and foster societal well-being.


BAN-PL: a Novel Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service

Okulska, Inez, Głąbińska, Kinga, Kołos, Anna, Karlińska, Agnieszka, Wiśnios, Emilia, Nowakowski, Adam, Ellerik, Paweł, Prałat, Andrzej

arXiv.org Artificial Intelligence

Advances in automated detection of offensive language online, including hate speech and cyberbullying, require improved access to publicly available datasets comprising social media content. In this paper, we introduce BAN-PL, the first open dataset in the Polish language that encompasses texts flagged as harmful and subsequently removed by professional moderators. The dataset comprises a total of 691,662 pieces of content from a popular social networking service, Wykop, often referred to as the "Polish Reddit", including both posts and comments, and is evenly split into two distinct classes: "harmful" and "neutral". We provide a comprehensive description of the data collection and preprocessing procedures, and highlight the linguistic specificity of the data. The BAN-PL dataset, along with advanced preprocessing scripts for, among other things, unmasking profanities, will be made publicly available.
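The "unmasking" mentioned above refers to resolving user-self-censored profanities (e.g. a word written with asterisks) back to their surface form. A minimal sketch of how such a script might match a masked token against a lexicon is shown below; the function, the set of mask characters, and the keep-if-ambiguous rule are all my own assumptions, not the actual BAN-PL scripts, which are the ones to be released with the dataset.

```python
import re

MASK_CHARS = "*#@$"  # assumed set of characters users substitute for letters

def unmask(token: str, lexicon: list) -> str:
    """Resolve a masked token like 'd*mn' against a lexicon (illustrative sketch).

    Each mask character may stand for any single character; the original token
    is kept whenever zero or multiple lexicon entries match.
    """
    pattern = re.compile(
        "^" + "".join("." if ch in MASK_CHARS else re.escape(ch) for ch in token) + "$"
    )
    matches = [w for w in lexicon if pattern.match(w)]
    return matches[0] if len(matches) == 1 else token
```

A real pipeline would also have to handle masks that drop characters or replace several letters with one symbol, which this length-preserving sketch does not.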


An "Unbiased" Guide to Bias in AI

#artificialintelligence

Whenever there is any mention of ethics in the context of AI, the topic of bias and fairness often follows. Similarly, whenever there is any mention of training and testing machine learning models, the trade-off between bias and variance features heavily. But do these two mentions of bias refer to the same thing? For machines to learn patterns, especially in "supervised learning", they go through a training process in which an algorithm extracts patterns from a training dataset, typically in an iterative manner. It then tests its predictions on an unseen (out-of-sample) test dataset to check whether the patterns it learnt from the training data generalize. Bias: the action of supporting or opposing a particular person or thing in an unfair way, by allowing personal opinions to influence your judgment.


The State of Profanity Obfuscation in Natural Language Processing

Nozza, Debora, Hovy, Dirk

arXiv.org Artificial Intelligence

Work on hate speech has made the consideration of rude and harmful examples in scientific publications inevitable. This raises various problems, such as whether or not to obscure profanities. While science must accurately disclose what it does, the unwarranted spread of hate speech can harm readers and increase its frequency on the internet. Obfuscating profanities maintains publications' professional appearance, but it makes the content challenging to evaluate, especially for non-native speakers. Surveying 150 ACL papers, we discovered that obfuscation is usually employed for English but not for other languages, and even then quite unevenly. We discuss the problems with obfuscation and suggest a multilingual community resource called PrOf, with a Python module to standardize profanity obfuscation processes. We believe PrOf can help scientific publication policies make hate speech work accessible and comparable, irrespective of language.
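One common obfuscation convention, which a standardizing module like PrOf might encode, keeps the first and last characters of a word and masks the rest. The helper below is my own sketch of that convention, not PrOf's actual API.

```python
def obfuscate(word: str) -> str:
    """Mask all but the first and last character (a common obfuscation convention)."""
    if len(word) <= 2:
        # Too short to keep both ends distinct; mask everything after the first char.
        return word[0] + "*" * (len(word) - 1)
    return word[0] + "*" * (len(word) - 2) + word[-1]
```

Standardizing on one such rule is what makes obfuscated examples comparable across papers and reversible against a shared lexicon.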


Apply profanity masking in Amazon Translate

#artificialintelligence

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. This post shows how you can mask profane words and phrases with a grawlix string ("?$#@$"). Amazon Translate typically chooses clean words for your translation output. But in some situations, you want to prevent words that are commonly considered as profane terms from appearing in the translated output. For example, when you're translating video captions or subtitle content, or enabling in-game chat, and you want the translated content to be age appropriate and clear of any profanity, Amazon Translate allows you to mask the profane words and phrases using the profanity masking setting.
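The effect of the masking setting described above can be mimicked locally with a word list and the same grawlix string; the sketch below is an illustration of the behavior, not the service itself, and the word list and function names are my own.

```python
import re

GRAWLIX = "?$#@$"  # the fixed string the post says masked words are replaced with

def mask_profanity(text: str, profane_words: set) -> str:
    """Replace each profane word with the grawlix string (local sketch of the behavior)."""
    def repl(match):
        word = match.group(0)
        return GRAWLIX if word.lower() in profane_words else word
    return re.sub(r"[A-Za-z']+", repl, text)
```

When calling the service itself via boto3, this behavior corresponds, to the best of my knowledge, to passing `Settings={'Profanity': 'MASK'}` to `translate_text`.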


Comparison of the Top Speech Processing APIs – Data Science Central

#artificialintelligence

Speech processing is a very popular area of machine learning. There is significant demand for transforming human speech into text and text into speech. This is especially important for the development of self-service systems in different places: shops, transport, hotels, etc. Machines are replacing more and more human labor, and these machines should be able to communicate with us using our language. That is why speech recognition is a promising and significant area of artificial intelligence and machine learning. Today, many large companies provide APIs for performing different machine learning tasks, and speech recognition is no exception.


Leonid Bershidsky - AI competition is the new space race

#artificialintelligence

It's been another year of relentless artificial-intelligence hype and incremental AI achievement.