AITopics | harmful language

Can Tinder Fix The Dating Landscape It Helped Ruin?

WIRED, Mar-20-2026, 11:00:00 GMT

The app reads your email inbox and your meeting calendar, then gives you a short audio summary. It can help you spend less time scrolling, but of course, there are privacy drawbacks to consider.

artificial intelligence, natural language, social media, (12 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.05)
Asia > Middle East > Iran (0.05)
North America > United States > New York (0.04)
(5 more...)

Industry: Information Technology > Services (0.99)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.32)

Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections

Mastromichalakis, Orfeas Menis, Liartis, Jason, Rose, Kristina, Isaac, Antoine, Stamou, Giorgos

arXiv.org Artificial Intelligence, Jun-2-2025

Cultural Heritage (CH) data hold invaluable knowledge, reflecting the history, traditions, and identities of societies, and shaping our understanding of the past and present. However, many CH collections contain outdated or offensive descriptions that reflect historical biases. CH Institutions (CHIs) face significant challenges in curating these data due to the vast scale and complexity of the task. To address this, we develop an AI-powered tool that detects offensive terms in CH metadata and provides contextual insights into their historical background and contemporary perception. We leverage a multilingual vocabulary co-created with marginalized communities, researchers, and CH professionals, along with traditional NLP techniques and Large Language Models (LLMs). Available as a standalone web app and integrated with major CH platforms, the tool has processed over 7.9 million records, contextualizing the contentious terms detected in their metadata. Rather than erasing these terms, our approach seeks to inform, making biases visible and providing actionable insights for creating more inclusive and accessible CH collections.

computational linguistic, large language model, machine learning, (20 more...)

2505.24538

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.50)

Industry: Information Technology (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A comprehensive cross-language framework for harmful content detection with the aid of sentiment analysis

Dehghani, Mohammad

arXiv.org Artificial Intelligence, Mar-2-2024

In today's digital world, social media plays a significant role in facilitating communication and content sharing. However, the exponential rise in user-generated content has made it harder to maintain a respectful online environment. In some cases, users have taken advantage of anonymity to use harmful language, which can degrade the user experience and pose serious social problems. Because manual moderation has clear limits, automatic detection systems have been developed to tackle this problem. Nevertheless, several obstacles persist, including the absence of a universal definition of harmful language, inadequate datasets across languages, the need for detailed annotation guidelines, and, most importantly, the lack of a comprehensive framework. This study aims to address these challenges by introducing, for the first time, a detailed framework adaptable to any language. The framework encompasses various aspects of harmful language detection. A key component is the development of a general and detailed annotation guideline. Additionally, the integration of sentiment analysis represents a novel approach to enhancing harmful language detection. A definition of harmful language, based on a review of related concepts, is also presented. To demonstrate the effectiveness of the proposed framework, we implement it in a challenging low-resource language. We collected a Persian dataset and applied the annotation guideline for harmful language detection and sentiment analysis. Next, we present baseline experiments using machine learning and deep learning methods to set benchmarks. The results show the framework's high performance, with an accuracy of 99.4% in offensive language detection and 66.2% in sentiment analysis.

detection, harmful language, language detection, (15 more...)

2403.0127

Country:

Europe > Greece (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
(20 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.88)

Industry:

Media > News (1.00)
Law > Civil Rights & Constitutional Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(4 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(6 more...)

Improving alignment of dialogue agents via targeted human judgements

Glaese, Amelia, McAleese, Nat, Trębacz, Maja, Aslanides, John, Firoiu, Vlad, Ewalds, Timo, Rauh, Maribeth, Weidinger, Laura, Chadwick, Martin, Thacker, Phoebe, Campbell-Gillingham, Lucy, Uesato, Jonathan, Huang, Po-Sen, Comanescu, Ramona, Yang, Fan, See, Abigail, Dathathri, Sumanth, Greig, Rory, Chen, Charlie, Fritz, Doug, Elias, Jaume Sanchez, Green, Richard, Mokrá, Soňa, Fernando, Nicholas, Wu, Boxi, Foley, Rachel, Young, Susannah, Gabriel, Iason, Isaac, William, Mellor, John, Hassabis, Demis, Kavukcuoglu, Koray, Hendricks, Lisa Anne, Irving, Geoffrey

arXiv.org Artificial Intelligence, Sep-28-2022

We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behaviour and allows for more efficient rule-conditional reward models. Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time. Sparrow is preferred more often than baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time when probed. Finally, we conduct extensive analyses showing that though our model learns to follow our rules it can exhibit distributional biases.

machine learning, natural language, reinforcement learning, (24 more...)

2209.14375

Genre:

Research Report > New Finding (1.00)
Personal (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)
