Kwak, Haewoon
ToBlend: Token-Level Blending With an Ensemble of LLMs to Attack AI-Generated Text Detection
Huang, Fan, Kwak, Haewoon, An, Jisun
The robustness of AI-content detection models against sophisticated adversarial strategies, such as paraphrasing or word switching, is a rising concern in natural language generation (NLG) applications. This study proposes ToBlend, a novel token-level ensemble text generation method that challenges the robustness of current AI-content detection approaches by drawing on multiple sets of candidate generative large language models (LLMs). By randomly sampling tokens from the candidate LLM sets, we find that ToBlend significantly degrades the performance of most mainstream AI-content detection methods. We evaluate the quality of text produced under different ToBlend settings based on annotations from experienced human experts. We further propose a fine-tuned Llama3.1 model that distinguishes ToBlend-generated text more accurately. Our findings underscore the potential of the proposed text generation approach both to deceive and to improve detection models. Our datasets, code, and annotations are open-sourced.
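A minimal sketch of the token-level blending idea described above: at each decoding step, one model from a small candidate pool is chosen at random and the next token is sampled from its distribution. The model choices (gpt2 and distilgpt2, which share a vocabulary), the sampling temperature, and the prompt are illustrative assumptions, not the paper's actual candidate sets or parameters.

```python
# Token-level blending sketch: pick a random candidate LLM per decoding step
# and sample the next token from that model's next-token distribution.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Two candidate LMs that share the GPT-2 vocabulary (hypothetical choice).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
candidates = [
    AutoModelForCausalLM.from_pretrained("gpt2"),
    AutoModelForCausalLM.from_pretrained("distilgpt2"),
]
for m in candidates:
    m.eval()

def toblend_generate(prompt: str, max_new_tokens: int = 40, temperature: float = 0.9) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        model = random.choice(candidates)          # pick a candidate LLM for this token
        with torch.no_grad():
            logits = model(ids).logits[:, -1, :]   # next-token distribution
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(toblend_generate("AI-generated text detection is"))
```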
Neural embedding of beliefs reveals the role of relative dissonance in human decision-making
Lee, Byunghwee, Aiyappa, Rachith, Ahn, Yong-Yeol, Kwak, Haewoon, An, Jisun
Beliefs serve as the foundation for human cognition and decision-making. They guide individuals in deriving meaning from their lives, shaping their behaviors, and forming social connections. Therefore, a model that encapsulates beliefs and their interrelationships is crucial for quantitatively studying the influence of beliefs on our actions. Despite its importance, research on the interplay between human beliefs has often been limited to a small set of beliefs pertaining to specific issues, with a heavy reliance on surveys or experiments. Here, we propose a method for extracting nuanced relations between thousands of beliefs by leveraging large-scale user participation data from an online debate platform and mapping these beliefs to an embedding space using a fine-tuned large language model (LLM). This belief embedding space effectively encapsulates the interconnectedness of diverse beliefs as well as polarization across various social issues. We discover that the positions within this belief space predict new beliefs of individuals. Furthermore, we find that the relative distance between one's existing beliefs and new beliefs can serve as a quantitative estimate of cognitive dissonance, allowing us to predict new beliefs. Our study highlights how modern LLMs, when combined with collective online records of human beliefs, can offer insights into the fundamental principles that govern human belief formation and decision-making processes.
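The core quantitative idea, relative distance in a belief embedding space, can be illustrated with a small sketch. An off-the-shelf sentence encoder stands in for the paper's fine-tuned LLM, and the belief statements are made up for illustration; a candidate new belief is scored by its average cosine distance to a person's existing beliefs.

```python
# Belief-embedding sketch: embed belief statements and use average cosine
# distance to existing beliefs as a rough proxy for relative dissonance.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the fine-tuned LLM

existing_beliefs = [
    "Governments should invest heavily in renewable energy.",
    "Public transportation should be expanded in cities.",
]
candidate = "A carbon tax is an effective climate policy."
opposing = "Climate regulations hurt the economy and should be repealed."

def mean_distance(new_belief: str, beliefs: list[str]) -> float:
    vecs = model.encode(beliefs + [new_belief], normalize_embeddings=True)
    held, new = vecs[:-1], vecs[-1]
    return float(np.mean(1.0 - held @ new))  # cosine distance averaged over held beliefs

# Which candidate belief sits closer to this (toy) belief profile?
print("candidate:", mean_distance(candidate, existing_beliefs))
print("opposing: ", mean_distance(opposing, existing_beliefs))
```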
Rematch: Robust and Efficient Matching of Local Knowledge Graphs to Improve Structural and Semantic Similarity
Kachwala, Zoher, An, Jisun, Kwak, Haewoon, Menczer, Filippo
Knowledge graphs play a pivotal role in various applications, such as question-answering and fact-checking. Abstract Meaning Representation (AMR) represents text as knowledge graphs. Evaluating the quality of these graphs involves matching them structurally to each other and semantically to the source text. Existing AMR metrics are inefficient and struggle to capture semantic similarity. We also lack a systematic evaluation benchmark for assessing structural similarity between AMR graphs. (Figure 1 in the paper shows the AMR for the sentence "He did not cut the apple with a knife," with colors indicating AMR components: instances, relations, constants, and attributes; the instance cut-01 is a verb frame that uses ARG0, ARG1, and inst to express the verb's agent (he), patient (apple), and instrument (knife).)
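As a toy illustration of structural matching between AMR graphs, the sketch below represents each graph as a set of (source, relation, target) triples and computes an overlap F1. This is not the rematch algorithm itself, which additionally handles variable alignment and semantic similarity; the triples are loosely based on the paper's running example sentence.

```python
# Toy structural similarity between AMR-like graphs as triple-overlap F1.
def triple_f1(graph_a: set[tuple], graph_b: set[tuple]) -> float:
    overlap = len(graph_a & graph_b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(graph_a)
    recall = overlap / len(graph_b)
    return 2 * precision * recall / (precision + recall)

# AMR-style triples for "He did not cut the apple with a knife."
amr_1 = {
    ("cut-01", "polarity", "-"),
    ("cut-01", "ARG0", "he"),
    ("cut-01", "ARG1", "apple"),
    ("cut-01", "instrument", "knife"),
}
# A second graph that misses the negation.
amr_2 = {
    ("cut-01", "ARG0", "he"),
    ("cut-01", "ARG1", "apple"),
    ("cut-01", "instrument", "knife"),
}
print(triple_f1(amr_1, amr_2))  # ~0.86
```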
ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales?
Huang, Fan, Kwak, Haewoon, Park, Kunwoo, An, Jisun
As AI becomes more integral in our lives, the need for transparency and responsibility grows. While natural language explanations (NLEs) are vital for clarifying the reasoning behind AI decisions, evaluating them through human judgments is complex and resource-intensive due to subjectivity and the need for fine-grained ratings. This study explores the alignment between ChatGPT and human assessments across multiple scales (i.e., binary, ternary, and 7-point Likert). We sample 300 data instances from three NLE datasets and collect 900 human annotations of both informativeness and clarity as text quality measures. We further conduct paired comparison experiments under different ranges of subjectivity scores, where the baseline comes from 8,346 human annotations. Our results show that ChatGPT aligns better with humans on coarser-grained scales. Also, paired comparisons and dynamic prompting (i.e., providing semantically similar examples in the prompt) improve the alignment. This research advances our understanding of large language models' ability to assess text explanation quality in different configurations for responsible AI development.
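The multi-scale comparison can be sketched as follows: fine-grained 7-point ratings are collapsed to ternary and binary scales, and agreement between human and model ratings is recomputed at each granularity. The ratings below are fabricated toy values, and the agreement statistics (exact match and Spearman correlation) are illustrative choices, not the study's exact protocol.

```python
# Collapse 7-point ratings to coarser scales and compare agreement at each level.
import numpy as np
from scipy.stats import spearmanr

human = np.array([7, 6, 2, 5, 1, 4, 6, 3, 7, 2])   # toy 7-point human ratings
model = np.array([6, 6, 3, 5, 2, 5, 7, 2, 6, 1])   # toy 7-point LLM ratings

def collapse(r: np.ndarray, scale: str) -> np.ndarray:
    if scale == "ternary":                           # low / medium / high
        return np.digitize(r, bins=[3, 5])
    if scale == "binary":                            # low vs. high
        return (r >= 4).astype(int)
    return r                                         # keep the full 7-point scale

for scale in ("7-point", "ternary", "binary"):
    h, m = collapse(human, scale), collapse(model, scale)
    agree = np.mean(h == m)
    rho, _ = spearmanr(h, m)
    print(f"{scale:8s}  exact agreement={agree:.2f}  spearman={rho:.2f}")
```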
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance
Aiyappa, Rachith, Senthilmani, Shruthi, An, Jisun, Kwak, Haewoon, Ahn, Yong-Yeol
Stance detection is a fundamental computational task that is widely used across many disciplines such as political science and communication studies (Wang et al., 2019b; Küçük and Can, 2020). Its goal is to extract the standpoint or stance (e.g., Favor, Against, or Neutral) towards a target from a given text. Given that modern democratic societies make societal decisions by aggregating people's explicit stances through voting, estimating people's stances is a useful task. While a representative survey is the gold standard, it falls short in scalability and cost (Salganik, 2019). Surveys can also produce biased results due to people's tendency to report more socially acceptable positions.
Such fine-tuning approaches can benefit from both the general language understanding acquired during pre-training and problem-specific knowledge, even without spending a huge amount of computing resources (Wang et al., 2022a). More recently, the GPT family of models (Radford et al., 2019; Brown et al., 2020) birthed another powerful and even simpler paradigm of in-context learning ("few-shot" or "zero-shot"). Instead of tuning any parameters of the model, it simply uses the input to guide the model to produce the desired output for downstream tasks. For instance, a few examples related to the task can be fed as the context to the LLM.
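A minimal sketch of zero-shot stance detection with an instruction-tuned seq2seq model in this spirit is shown below. The small flan-t5-small checkpoint stands in for FlanT5-XXL, and the prompt template is an assumption rather than the paper's exact wording.

```python
# Zero-shot stance detection via prompting an instruction-tuned seq2seq model.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "google/flan-t5-small"   # small stand-in for the XXL model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

def zero_shot_stance(text: str, target: str) -> str:
    prompt = (
        f"Text: {text}\n"
        f"What is the stance of the text toward '{target}'? "
        "Answer with Favor, Against, or Neutral."
    )
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=5)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(zero_shot_stance("Masks are a simple way to protect each other.", "mask mandates"))
```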
Can we trust the evaluation on ChatGPT?
Aiyappa, Rachith, An, Jisun, Kwak, Haewoon, Ahn, Yong-Yeol
ChatGPT, the first large language model (LLM) with mass adoption, has demonstrated remarkable performance in numerous natural language tasks. Despite its evident usefulness, evaluating ChatGPT's performance in diverse problem domains remains challenging due to the closed nature of the model and its continuous updates via Reinforcement Learning from Human Feedback (RLHF). We highlight the issue of data contamination in ChatGPT evaluations, with a case study of the task of stance detection. We discuss the challenge of preventing data contamination and ensuring fair model evaluation in the age of closed and continuously trained models.
Wearing Masks Implies Refuting Trump?: Towards Target-specific User Stance Prediction across Events in COVID-19 and US Election 2020
Zhang, Hong, Kwak, Haewoon, Gao, Wei, An, Jisun
People who share similar opinions towards controversial topics could form an echo chamber and may share similar political views toward other topics as well. The existence of such connections, which we call connected behavior, gives researchers a unique opportunity to predict how one would behave for a future event given their past behaviors. In this work, we propose a framework to conduct connected behavior analysis. Neural stance detection models are trained on Twitter data collected on three seemingly independent topics, i.e., wearing a mask, racial equality, and Trump, to detect people's stance, which we consider as their online behavior in each topic-related event. Our results reveal a strong connection between the stances toward the three topical events and demonstrate the power of past behaviors in predicting one's future behavior.
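A toy sketch of the connected-behavior idea: stances inferred on two past topics serve as features to predict a user's stance on a third topic. The encoded stances and labels below are fabricated; in the paper, stances are first inferred from tweets with neural stance detection models.

```python
# Predict a stance on a new topic from stances on past topics (toy data).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: stance on "wearing a mask", stance on "racial equality"
# (1 = favor, 0 = against); target: stance toward "Trump" (1 = favor).
X = np.array([[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [1, 1], [0, 0]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1, 1], [0, 0]]))   # predicted future stances from past behavior
```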
Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech
Huang, Fan, Kwak, Haewoon, An, Jisun
Recent studies have raised the alarm that much online hate speech is implicit. Given its subtle nature, explaining the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used to provide natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their quality by comparing them with human-written NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.
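A minimal sketch of eliciting a concise NLE from a chat model, in the spirit of the prompt design described above. The model name, prompt wording, and length cap are assumptions, not the study's exact configuration.

```python
# Elicit a short natural language explanation for an implicitly hateful post.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def explain_implicit_hate(post: str) -> str:
    prompt = (
        f'Given this post: "{post}"\n'
        "Explain in one short sentence why it could be implicitly hateful, "
        "naming the targeted group."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",            # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
        max_tokens=60,
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

# Example (requires an API key):
# print(explain_implicit_hate("They keep coming here and nothing ever gets better."))
```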
Chain of Explanation: New Prompting Method to Generate Higher Quality Natural Language Explanation for Implicit Hate Speech
Huang, Fan, Kwak, Haewoon, An, Jisun
Recent studies have exploited advanced generative language models to generate Natural Language Explanations (NLE) for why a certain text could be hateful. However, the potential of sequence-to-sequence (Seq2Seq) models and prompting methods has not been fully explored [4]. Moreover, traditional evaluation metrics, such as BLEU [20] and Rouge [18], applied to NLE generation for hate speech may not comprehensively capture the quality of the generated explanations because they rely heavily on word-level overlap [3]. To fill those gaps, we propose the Chain of Explanation (CoE) prompting method, which uses heuristic words and the target group to generate high-quality NLE for implicit hate speech, distinguishing it from non-hateful tweets. By providing accurate target information, we improved the BLEU score for NLE generation from 44.0 to 62.3. We then evaluate the quality of the generated NLE using various automatic metrics and human annotations.
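A rough sketch of what a Chain-of-Explanation-style prompt could look like: the prompt supplies the post together with the target group and heuristic cue words and asks for a stepwise explanation. The template wording and example inputs are illustrative only, not the paper's actual prompt.

```python
# Build a CoE-style prompt that injects target-group and cue-word information.
def coe_prompt(post: str, target_group: str, heuristic_words: list[str]) -> str:
    cues = ", ".join(heuristic_words)
    return (
        f"Post: {post}\n"
        f"Target group: {target_group}\n"
        f"Heuristic cue words: {cues}\n"
        "First identify the implied statement about the target group, "
        "then explain in one or two sentences why the post is hateful."
    )

print(coe_prompt(
    post="They keep coming here and nothing ever gets better.",
    target_group="immigrants",
    heuristic_words=["they", "coming here"],
))
```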
Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media
Salminen, Joni (Qatar Computing Research Institute, Hamad Bin Khalifa University) | Almerekhi, Hind (Hamad Bin Khalifa University) | Milenković, Milica (Independent Researcher) | Jung, Soon-gyo (Qatar Computing Research Institute, Hamad Bin Khalifa University) | An, Jisun (Qatar Computing Research Institute, Hamad Bin Khalifa University) | Kwak, Haewoon (Qatar Computing Research Institute, Hamad Bin Khalifa University) | Jansen, Bernard J. (Qatar Computing Research Institute, Hamad Bin Khalifa University)
Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions posted to YouTube and Facebook videos within a dataset of 137,098 comments from an online news media outlet. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset. Our contribution is twofold: 1) creating a granular taxonomy for hateful online comments that includes both types and targets of hateful comments, and 2) experimenting with machine learning, including Logistic Regression, Decision Tree, Random Forest, Adaboost, and Linear SVM, to generate a multiclass, multilabel classification model that automatically detects and categorizes hateful comments in the context of online news media. We find that the best performing model is Linear SVM, with an average F1 score of 0.79 using TF-IDF features. We validate the model by testing its predictive ability and, relatedly, provide insights on distinct types of hate speech taking place on social media.
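The best-performing setup reported above, TF-IDF features with a Linear SVM, can be sketched with scikit-learn using a one-vs-rest wrapper for multilabel classification. The example comments and taxonomy labels are fabricated placeholders, not the paper's annotated data.

```python
# TF-IDF + Linear SVM for multilabel hate type/target classification (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

comments = [
    "those people are ruining this country",
    "great reporting, thanks for covering this",
    "typical of that religion, always causing trouble",
    "I disagree with the policy but respect the argument",
]
labels = [["hate", "target:ethnicity"], [], ["hate", "target:religion"], []]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                 # multilabel indicator matrix

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    OneVsRestClassifier(LinearSVC()),         # one binary SVM per label
)
model.fit(comments, Y)

pred = model.predict(["that group should go back where they came from"])
print(mlb.inverse_transform(pred))
```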