Perspective API
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3x larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels as smaller ones given the same pre-training corpus, and ii) large LMs require more effort to unlearn the toxic content seen at pretraining. We also explore parameter-efficient training methods for detoxification.
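One parameter-efficient idea in this vein keeps the pretrained weight matrix frozen and trains only a small low-rank correction (a LoRA-style adapter). The paper does not specify which parameter-efficient methods it uses, so the sketch below is an illustrative assumption: the tiny 2x2 shapes, rank-1 adapter, and values are made up for demonstration, and only `A` and `B` would receive gradients during detoxification fine-tuning.

```python
# Minimal sketch of a low-rank adapter: y = x @ (W + A @ B),
# where W is frozen and only A, B are trainable.

def matmul(X, Y):
    """Plain-Python matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    """Elementwise matrix addition."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Frozen pretrained weights (2x2 for brevity) and a rank-1 trainable adapter.
W = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[0.1], [0.2]]   # 2x1, trainable
B = [[0.5, -0.5]]    # 1x2, trainable

def adapted_forward(x):
    """Forward pass through the frozen base plus the low-rank correction."""
    W_eff = add(W, matmul(A, B))
    return matmul([x], W_eff)[0]

print(adapted_forward([1.0, 1.0]))
```

The appeal for detoxification at 530B scale is that the trainable parameter count scales with the adapter rank, not with the full weight matrices.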
SafeSpace: An Integrated Web Application for Digital Safety and Emotional Well-being
Fatmi, Kayenat; Abbas, Mohammad
In the digital era, individuals are increasingly exposed to online harms such as toxicity, manipulation, and grooming, which often pose emotional and safety risks. Existing systems for detecting abusive content or issuing safety alerts operate in isolation and rarely combine digital safety with emotional well-being. In this paper, we present SafeSpace, a unified web application that integrates three modules: (1) toxicity detection in chats and screenshots using NLP models and Google's Perspective API, (2) a configurable safety ping system that issues emergency alerts with the user's live location (longitude and latitude) via SMTP-based emails when check-ins are missed or SOS alerts are manually triggered, and (3) a reflective questionnaire that evaluates relationship health and emotional resilience. The system employs Firebase for alert management and a modular architecture designed for usability, privacy, and scalability. The experimental evaluation shows 93% precision in toxicity detection, 100% reliability in safety alerts under emulator tests, and 92% alignment between automated and manual questionnaire scoring. SafeSpace, implemented as a web application, demonstrates the feasibility of integrating detection, protection, and reflection within a single platform, with future deployment envisioned as a mobile application for broader accessibility.
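The toxicity-detection module above relies on Google's Perspective API, whose public `comments:analyze` endpoint takes a JSON body with the comment text and the requested attributes, and returns per-attribute probability scores. The sketch below builds such a request and extracts the `TOXICITY` score from a response; `API_KEY`, the 0.8 flagging threshold, and the sample score are placeholder assumptions, not values from the paper.

```python
# Sketch of a Perspective API toxicity check (commentanalyzer v1alpha1).
# The network call itself is omitted; we show the request body and
# response parsing, which is where the integration logic lives.

ANALYZE_URL = (
    "https://commentanalyzer.googleapis.com/v1alpha1/"
    "comments:analyze?key=API_KEY"  # API_KEY is a placeholder
)

def build_request(text: str) -> dict:
    """Request body asking Perspective for a TOXICITY score."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def toxicity_score(response: dict) -> float:
    """Pull the summary TOXICITY probability out of an analyze response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

def is_toxic(response: dict, threshold: float = 0.8) -> bool:
    """Flag a message when its toxicity probability crosses the threshold."""
    return toxicity_score(response) >= threshold

# Example response shaped like the API's JSON (the score is illustrative):
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.91, "type": "PROBABILITY"}}
    }
}
print(is_toxic(sample_response))  # True at the 0.8 threshold
```

In a deployed system such as the one described, flagged messages would then feed the alerting and reporting modules rather than being shown raw scores.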
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
Mendu, Sai Krishna; Yenala, Harish; Gulati, Aditi; Kumar, Shanu; Agrawal, Parag
Large language models (LLMs) have become integral to various real-world applications, leveraging massive, web-sourced datasets like Common Crawl, C4, and FineWeb for pretraining. While these datasets provide linguistic data essential for high-quality natural language generation, they often contain harmful content, such as hate speech, misinformation, and biased narratives. Training LLMs on such unfiltered data risks perpetuating toxic behaviors, spreading misinformation, and amplifying societal biases, which can undermine trust in LLM-driven applications and raise ethical concerns about their use. This paper presents a large-scale analysis of inappropriate content across these datasets, offering a comprehensive taxonomy that categorizes harmful webpages into Topical and Toxic based on their intent. We also introduce a prompt evaluation dataset, a high-accuracy Topical and Toxic Prompt (TTP), and a transformer-based model (HarmFormer) for harmful content filtering. Additionally, we create a new multi-harm open-ended toxicity benchmark (HAVOC) and provide crucial insights into how models respond to adversarial toxic inputs. Our work offers insights into ensuring safer LLM pretraining and serves as a resource for Responsible AI (RAI) compliance. Disclaimer: This paper includes potentially offensive content due to the nature of the research.
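The filtering step described above reduces, at its core, to scoring each webpage with a harm classifier and dropping pages above a threshold before pretraining. In the sketch below, `harm_score` is a trivial keyword stub standing in for HarmFormer (whose architecture and operating threshold the paper defines); the word list and the 0.5 cutoff are illustrative assumptions only.

```python
# Sketch of threshold-based corpus filtering for pretraining data.

HARM_TERMS = {"hate", "slur", "attack"}  # placeholder lexicon, not HarmFormer

def harm_score(text: str) -> float:
    """Fraction of tokens hitting the placeholder lexicon.
    A real pipeline would call the transformer classifier here."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in HARM_TERMS for t in tokens) / len(tokens)

def filter_corpus(pages: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only pages the classifier deems safe for pretraining."""
    return [p for p in pages if harm_score(p) < threshold]

corpus = ["a friendly cooking blog", "hate hate slur attack"]
print(filter_corpus(corpus))  # ['a friendly cooking blog']
```

At web scale the same shape holds, but scoring is batched over shards and the threshold is tuned against a labeled evaluation set such as the paper's TTP.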
Representative Ranking for Deliberation in the Public Sphere
Revel, Manon; Milli, Smitha; Lu, Tyler; Watson-Daniels, Jamelle; Nickel, Max
Online comment sections, such as those on news sites or social media, have the potential to foster informal public deliberation. However, this potential is often undermined by the frequency of toxic or low-quality exchanges that occur in these settings. To combat this, platforms increasingly leverage algorithmic ranking to facilitate higher-quality discussions, e.g., by using civility classifiers or forms of prosocial ranking. Yet, these interventions may also inadvertently reduce the visibility of legitimate viewpoints, undermining another key aspect of deliberation: representation of diverse views. We seek to remedy this problem by introducing guarantees of representation into these methods. In particular, we adopt the notion of justified representation (JR) from the social choice literature and incorporate a JR constraint into the comment ranking setting. We find that enforcing JR leads to greater inclusion of diverse viewpoints while still being compatible with optimizing for user engagement or other measures of conversational quality.
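To give a flavour of JR in a ranking setting: in the approval-based committee model, greedy cover-style selection (repeatedly pick the comment approved by the most not-yet-represented users) is a classic way to satisfy JR. The paper's actual JR-constrained ranking method may differ; the toy approval data below is an illustrative assumption.

```python
# Greedy approval-cover selection of k comments from user approval sets.
# Each round picks the comment approved by the most uncovered users,
# then marks those users as represented.

def greedy_jr_selection(approvals: dict[str, set[int]], k: int) -> list[str]:
    uncovered = set().union(*approvals.values())  # users not yet represented
    selected: list[str] = []
    remaining = dict(approvals)
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining, key=lambda c: len(remaining[c] & uncovered))
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected

# 6 users in two viewpoint clusters: users 0-3 approve c1/c2, users 4-5 approve c3.
approvals = {
    "c1": {0, 1, 2, 3},
    "c2": {0, 1, 2},
    "c3": {4, 5},
}
print(greedy_jr_selection(approvals, k=2))  # ['c1', 'c3']
```

Note how the minority cluster (users 4 and 5) gets its comment `c3` into the top-2 slate, whereas ranking purely by total approvals would have chosen `c1` and `c2` and left that viewpoint invisible.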