AITopics

Socio-Culturally Aware Evaluation Framework for LLM-Based Content Moderation

Kumar, Shanu, Kholkar, Gauri, Mendke, Saish, Sadana, Anubhav, Agrawal, Parag, Dandapat, Sandipan

With the growth of social media and large language models, content moderation has become crucial. Many existing datasets lack adequate representation of different groups, resulting in unreliable assessments. To tackle this, we propose a socio-culturally aware evaluation framework for LLM-driven content moderation and introduce a scalable method for creating diverse datasets using persona-based generation. Our analysis reveals that these datasets provide broader perspectives and pose greater challenges for LLMs than diversity-focused generation methods without personas. This challenge is especially pronounced in smaller LLMs, emphasizing the difficulties they encounter in moderating such diverse content.

large language model, machine learning, persona, (20 more...)

2412.13578

Country:

Asia > Middle East > Iraq (0.28)
Asia > Afghanistan (0.05)
Oceania > Australia (0.04)
(27 more...)

Genre: Research Report (0.63)

Industry:

Media > News (1.00)
Leisure & Entertainment (1.00)
Law > Civil Rights & Constitutional Law (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Trustworthy Transfer Learning: A Survey

Wu, Jun, He, Jingrui

Transfer learning aims to transfer knowledge or information from a source domain to a relevant target domain. In this paper, we understand transfer learning from the perspectives of knowledge transferability and trustworthiness. This involves two research questions: How is knowledge transferability quantitatively measured and enhanced across domains? Can we trust the transferred knowledge in the transfer learning process? To answer these questions, this paper provides a comprehensive review of trustworthy transfer learning from various aspects, including problem definitions, theoretical analysis, empirical algorithms, and real-world applications. Specifically, we summarize recent theories and algorithms for understanding knowledge transferability under (within-domain) IID and non-IID assumptions. In addition to knowledge transferability, we review the impact of trustworthiness on transfer learning, e.g., whether the transferred knowledge is adversarially robust or algorithmically fair, how to transfer the knowledge under privacy-preserving constraints, etc. Beyond discussing the current advancements, we highlight the open questions and future directions for understanding transfer learning in a reliable and trustworthy manner.

artificial intelligence, bayesian inference, machine learning, (14 more...)

2412.14116

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
(9 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.47)
Research Report > New Finding (0.34)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
(2 more...)

Clio: Privacy-Preserving Insights into Real-World AI Use

Tamkin, Alex, McCain, Miles, Handa, Kunal, Durmus, Esin, Lovitt, Liane, Rathi, Ankur, Huang, Saffron, Mountfield, Alfred, Hong, Jerry, Ritchie, Stuart, Stern, Michael, Clarke, Brian, Goldberg, Landon, Sumers, Theodore R., Mueller, Jared, McEachen, William, Mitchell, Wes, Carter, Shan, Clark, Jack, Kaplan, Jared, Ganguli, Deep

How are AI assistants being used in the real world? While model providers in theory have a window into this impact via their users' data, both privacy concerns and practical challenges have made analyzing this data difficult. To address these issues, we present Clio (Claude insights and observations), a privacy-preserving platform that uses AI assistants themselves to analyze and surface aggregated usage patterns across millions of conversations, without the need for human reviewers to read raw conversations. We validate this can be done with a high degree of accuracy and privacy by conducting extensive evaluations. We demonstrate Clio's usefulness in two broad ways. First, we share insights about how models are being used in the real world from one million Claude.ai Free and Pro conversations, ranging from providing advice on hairstyles to providing guidance on Git operations and concepts. We also identify the most common high-level use cases on Claude.ai (coding, writing, and research tasks) as well as patterns that differ across languages (e.g., conversations in Japanese discuss elder care and aging populations at higher-than-typical rates). Second, we use Clio to make our systems safer by identifying coordinated attempts to abuse our systems, monitoring for unknown unknowns during critical periods like launches of new capabilities or major world events, and improving our existing monitoring systems. We also discuss the limitations of our approach, as well as risks and ethical concerns. By enabling analysis of real-world AI usage, Clio provides a scalable platform for empirically grounded AI safety and governance.

data mining, large language model, machine learning, (23 more...)

2412.13678

Country:

North America > United States > New York > New York County > New York City (0.04)
Africa > Eswatini > Manzini > Manzini (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Law (1.00)
(4 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(4 more...)

A hybrid framework for effective and efficient machine unlearning

Li, Mingxin, Yu, Yizhen, Wang, Ning, Wang, Zhigang, Wang, Xiaodong, Qu, Haipeng, Xu, Jia, Su, Shen, Yin, Zhichao

Machine unlearning (MU) has recently been proposed to remove the imprints of revoked samples from already-trained model parameters, addressing users' privacy concerns. As alternatives to expensive retraining from scratch, there are two research lines, exact MU and approximate MU, which make different trade-offs between accuracy and efficiency. In this paper, we present a novel hybrid strategy that builds on both to balance the two goals. It implements the unlearning operation at an acceptable computation cost while improving accuracy as much as possible. Specifically, it selects an unlearning technique by estimating the retraining workload caused by revocations. If the workload is lightweight, it performs retraining to derive model parameters consistent with those obtained by retraining from scratch. Otherwise, it outputs the unlearned model by directly modifying the current parameters, for better efficiency. To improve accuracy in the latter case, we propose an optimized version that amends the output model with a lightweight runtime penalty. We also study the boundary between the two approaches in our framework so that the selection can be made adaptively. Extensive experiments on real datasets validate that our proposals can improve unlearning efficiency by 1.5× to 8× while achieving comparable accuracy.
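The workload-based selection described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the shard-based cost model, the function names, and the 0.3 threshold are all hypothetical. The sketch shows only the dispatch idea: retrain when the estimated workload is light, otherwise fall back to directly modifying the parameters.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UnlearningPlan:
    strategy: str          # "retrain" (exact) or "approximate" (direct parameter update)
    estimated_cost: float  # hypothetical workload estimate in [0, 1]


def estimate_retraining_workload(
    shard_sizes: Sequence[int], revoked_per_shard: Sequence[int]
) -> float:
    """Hypothetical cost model: the fraction of training data that lives in
    shards touched by at least one revoked sample (assumes sharded training)."""
    total = sum(shard_sizes)
    if total == 0:
        return 0.0
    affected = sum(
        size for size, revoked in zip(shard_sizes, revoked_per_shard) if revoked > 0
    )
    return affected / total


def choose_strategy(
    shard_sizes: Sequence[int],
    revoked_per_shard: Sequence[int],
    threshold: float = 0.3,  # assumed boundary between the two approaches
) -> UnlearningPlan:
    """Pick retraining when the estimated workload is light, else approximate MU."""
    cost = estimate_retraining_workload(shard_sizes, revoked_per_shard)
    strategy = "retrain" if cost <= threshold else "approximate"
    return UnlearningPlan(strategy=strategy, estimated_cost=cost)


if __name__ == "__main__":
    # One of four equal shards is affected: light workload, so retrain.
    print(choose_strategy([100, 100, 100, 100], [1, 0, 0, 0]))
    # Two of four shards are affected: heavier workload, so approximate.
    print(choose_strategy([100, 100, 100, 100], [1, 1, 0, 0]))
```

In this sketch the threshold is a fixed constant; the abstract indicates that the boundary between the two approaches is itself studied so that the choice can adapt.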

accuracy, artificial intelligence, machine learning, (19 more...)

2412.14505

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
North America > United States > California > Orange County > Anaheim (0.04)
Asia > China > Shandong Province (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.94)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data

Saxena, Vageesh, Bashpole, Benjamin, Van Dijck, Gijs, Spanakis, Gerasimos

Human trafficking (HT) remains a critical issue, with traffickers increasingly leveraging online escort advertisements (ads) to advertise victims anonymously. Existing detection methods, including Authorship Attribution (AA), often center on text-based analyses and neglect the multimodal nature of online escort ads, which typically pair text with images. To address this gap, we introduce MATCHED, a multimodal dataset of 27,619 unique text descriptions and 55,115 unique images collected from the Backpage escort platform across seven U.S. cities in four geographical regions. Our study extensively benchmarks text-only, vision-only, and multimodal baselines for vendor identification and verification tasks, employing multitask (joint) training objectives that achieve superior classification and retrieval performance on in-distribution and out-of-distribution (OOD) datasets. Integrating multimodal features further enhances this performance, capturing complementary patterns across text and images. While text remains the dominant modality, visual data adds stylistic cues that enrich model performance. Moreover, text-image alignment strategies like CLIP and BLIP2 struggle due to low semantic overlap and vague connections between the modalities of escort ads, with end-to-end multimodal training proving more robust. Our findings emphasize the potential of multimodal AA (MAA) to combat HT, providing law enforcement agencies (LEAs) with robust tools to link ads and disrupt trafficking networks.

data mining, large language model, machine learning, (23 more...)

2412.13794

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Marketing (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Law > Civil Rights & Constitutional Law (0.61)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
(6 more...)

Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation

Zugecova, Aneta, Macko, Dominik, Srba, Ivan, Moro, Robert, Kopal, Jakub, Marcincinova, Katarina, Mesarcik, Matus

The ability of recent large language models (LLMs) to generate high-quality content that humans cannot distinguish from human-written text raises many concerns regarding their misuse. Previous research has shown that LLMs can be effectively misused for generating disinformation news articles following predefined narratives. Their capabilities to generate personalized (in various aspects) content have also been evaluated and mostly found usable. However, the combination of the personalization and disinformation abilities of LLMs has not yet been comprehensively studied. Such a dangerous combination should trigger the integrated safety filters of the LLMs, if they have any. This study fills this gap by evaluating the vulnerabilities of recent open and closed LLMs and their willingness to generate personalized disinformation news articles in English. We further explore whether the LLMs can reliably meta-evaluate the personalization quality and whether personalization affects the detectability of the generated texts. Our results demonstrate the need for stronger safety filters and disclaimers, as these do not function properly in most of the evaluated LLMs. Additionally, our study reveals that personalization actually reduces safety-filter activations, thus effectively functioning as a jailbreak. Such behavior must be urgently addressed by LLM developers and service providers.

large language model, machine learning, target group, (19 more...)

2412.13666

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)
Europe > Ukraine (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > News (1.00)
Law > Statutes (0.92)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

From Human Annotation to LLMs: SILICON Annotation Workflow for Management Research

Cheng, Xiang, Mayya, Raveesh, Sedoc, João

Unstructured text data annotation and analysis are fundamental to management research, often relying on human annotators through crowdsourcing platforms. While Large Language Models (LLMs) promise to provide a cost-effective and efficient alternative to human annotation, there is no systematic workflow that evaluates when LLMs are suitable or how to proceed with LLM-based text annotation in a reproducible manner. This paper addresses this methodological gap by introducing the "SILICON" (Systematic Inference with LLMs for Information Classification and Notation) workflow. The workflow integrates established principles of human annotation with systematic prompt optimization and model selection, addressing challenges such as developing robust annotation guidelines, establishing high-quality human baselines, optimizing prompts, and ensuring reproducibility across LLMs. We validate the SILICON workflow through seven case studies covering common management research tasks, including business proposal evaluation, dialog intent and breakdown analysis, and review attribute detection. Our findings highlight the importance of validating annotation guideline agreement, the superiority of expert-developed human baselines over crowdsourced ones, the iterative nature of prompt optimization, and the necessity of testing multiple LLMs. Notably, we propose a regression-based methodology to empirically compare LLM outputs across prompts and models. Our workflow advances management research by establishing reproducible processes for LLM-based annotation that maintain scientific rigor. We provide practical guidance for researchers to navigate the evolving landscape of generative AI tools effectively while maintaining transparency and reproducibility.

large language model, machine learning, natural language, (18 more...)

2412.14461

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York (0.04)
North America > United States > Maryland (0.04)
(8 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Law (0.67)
Government > Regional Government > North America Government > United States Government (0.46)
Banking & Finance > Economy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Enhancing Rhetorical Figure Annotation: An Ontology-Based Web Application with RAG Integration

Kühn, Ramona, Mitrović, Jelena, Granitzer, Michael

Rhetorical figures play an important role in our communication. They are used to convey subtle, implicit meaning, or to emphasize statements. We notice them in hate speech, fake news, and propaganda. By improving the systems for computational detection of rhetorical figures, we can also improve tasks such as hate speech and fake news detection, sentiment analysis, opinion mining, or argument mining. Unfortunately, there is a lack of annotated data, as well as qualified annotators that would help us build large corpora to train machine learning models for the detection of rhetorical figures. The situation is particularly difficult in languages other than English, and for rhetorical figures other than metaphor, sarcasm, and irony. To overcome this issue, we develop a web application called "Find your Figure" that facilitates the identification and annotation of German rhetorical figures. The application is based on the German Rhetorical ontology GRhOOT which we have specially adapted for this purpose. In addition, we improve the user experience with Retrieval Augmented Generation (RAG). In this paper, we present the restructuring of the ontology, the development of the web application, and the built-in RAG pipeline. We also identify the optimal RAG settings for our application. Our approach is one of the first to practically use rhetorical ontologies in combination with RAG and shows promising results.

artificial intelligence, machine learning, natural language, (18 more...)

2412.13799

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > Puerto Rico > Peñuelas > Peñuelas (0.04)
(8 more...)

Genre: Research Report (0.50)

Industry:

Law (0.95)
Government (0.66)
Media > News (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mediation Analysis for Probabilities of Causation

Kawakami, Yuta, Tian, Jin

Probabilities of causation (PoC) offer valuable insights for informed decision-making. This paper introduces novel variants of PoC: controlled direct, natural direct, and natural indirect probability of necessity and sufficiency (PNS). These metrics quantify the necessity and sufficiency of a treatment for producing an outcome, accounting for different causal pathways. We develop identification theorems for these new PoC measures, allowing for their estimation from observational data. We demonstrate the practical application of our results through an analysis of a real-world psychology dataset.
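For orientation, the standard (total-effect) PNS that these variants build on can be written as below. This is the textbook definition for a binary treatment and outcome, together with the well-known bounds in terms of interventional probabilities. It is not the paper's controlled direct, natural direct, or natural indirect definitions, which are not given in this abstract. The notation ($Y_x$ for the potential outcome under treatment $x$, and $x'$, $y'$ for the complementary values) is assumed here.

```latex
% Standard probability of necessity and sufficiency:
% the outcome occurs under treatment x and would not occur under x'.
\[
  \mathrm{PNS} \;=\; P\bigl(Y_{x} = y,\; Y_{x'} = y'\bigr)
\]

% Known bounds in terms of interventional probabilities:
\[
  \max\bigl\{0,\; P(y_{x}) - P(y_{x'})\bigr\}
  \;\le\; \mathrm{PNS} \;\le\;
  \min\bigl\{P(y_{x}),\; P(y'_{x'})\bigr\}
\]
```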

artificial intelligence, machine learning, ni-pns, (14 more...)

2412.14491

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Greenland (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (0.93)
Law > Alternative Dispute Resolution (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)