Testing Hypotheses from the Social Approval Theory of Online Hate: An Analysis of 110 Million Messages from Parler
Markowitz, David M., Taylor, Samuel Hardman
We examined how online hate is motivated by receiving social approval, drawing on Walther's (2024) social approval theory of online hate, which argues that (H1a) more signals of social approval on hate messages predict more subsequent hate messages, and (H1b) as social approval increases, hate speech becomes more extreme. Using 110 million messages from Parler (2018-2021), we observed that the number of upvotes received on a hate speech post was unassociated with hate speech in one's next post and during the next month, three months, and six months. The number of upvotes received on (extreme) hate speech comments, however, was positively associated with (extreme) hate speech during the next week, month, three months, and six months. Between-person effects revealed an average positive relationship between social approval and hate speech production at all time intervals. For comments, social approval was more strongly linked to online hate than social disapproval was. Social approval is a critical mechanism facilitating the propagation of online hate.
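The within- versus between-person framing in this abstract maps onto a standard REWB (random effects within-between) panel specification. Below is a minimal sketch of that decomposition in Python, assuming a hypothetical long-format table with `user_id`, `upvotes`, and `next_hate_score` columns; it is not the authors' code.

```python
# Minimal REWB sketch: split upvotes into a between-person mean and a
# within-person deviation, then fit a random-intercept model.
# Column names and the input file are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("parler_panel.csv")  # hypothetical long-format panel

# Between-person component: each user's mean upvote count.
df["upvotes_between"] = df.groupby("user_id")["upvotes"].transform("mean")
# Within-person component: deviation from the user's own mean.
df["upvotes_within"] = df["upvotes"] - df["upvotes_between"]

# Random-intercept model separating the two effects.
model = smf.mixedlm(
    "next_hate_score ~ upvotes_within + upvotes_between",
    data=df,
    groups=df["user_id"],
).fit()
print(model.summary())
```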
- Asia > China (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Michigan > Ingham County > Lansing (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (0.93)
LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing
Fein, Daniel, Russo, Sebastian, Xiang, Violet, Jolly, Kabir, Rafailov, Rafael, Haber, Nick
Evaluating creative writing generated by large language models (LLMs) remains challenging because open-ended narratives lack ground truths. Without performant automated evaluation methods, off-the-shelf (OTS) language models are employed as zero-shot judges, yet their reliability is unclear in this context. In pursuit of robust evaluation for creative writing, we introduce LitBench, the first standardized benchmark and paired dataset for creative writing verification, comprising a held-out test set of 2,480 debiased, human-labeled story comparisons drawn from Reddit and a 43,827-pair training corpus of human preference labels. Using LitBench, we (i) benchmark zero-shot LLM judges, (ii) train Bradley-Terry and generative reward models, and (iii) conduct an online human study to validate reward model rankings on newly LLM-generated stories. Our benchmark identifies Claude-3.7-Sonnet as the strongest off-the-shelf judge, reaching 73% agreement with human preferences; among trained reward models, Bradley-Terry and generative reward models both attain an accuracy of 78%, outperforming all off-the-shelf judges. An online human study further confirms that our trained reward models consistently align with human preferences on novel LLM-generated stories. We release LitBench and reward models at https://huggingface.co/collections/SAA-Lab/litbench-68267b5da3aafe58f9e43461, providing a vetted resource for reliable, automated evaluation and optimization of creative writing systems.
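For readers unfamiliar with the Bradley-Terry objective the paper trains against, here is a minimal PyTorch sketch of the pairwise loss; the reward scores and data pipeline are placeholders, not LitBench's implementation.

```python
# Bradley-Terry pairwise preference loss: P(chosen > rejected) =
# sigmoid(r_chosen - r_rejected); training minimizes its negative log.
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen: torch.Tensor,
                       reward_rejected: torch.Tensor) -> torch.Tensor:
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with random scores standing in for reward-model outputs.
r_chosen = torch.randn(8, requires_grad=True)
r_rejected = torch.randn(8)
loss = bradley_terry_loss(r_chosen, r_rejected)
loss.backward()
print(float(loss))
```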
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Oklahoma > Payne County > Cushing (0.04)
- North America > United States > Michigan (0.04)
YT-30M: A multi-lingual multi-category dataset of YouTube comments
This paper introduces two large-scale multilingual comment datasets from YouTube: YT-30M and YT-100K. The analysis in this paper is performed on the smaller sample (YT-100K) of YT-30M. Both datasets, YT-30M (full) and YT-100K (a randomly selected 100K sample of YT-30M), are publicly released for further research. YT-30M (YT-100K) contains 32,236,173 (108,694) comments posted on YouTube channels belonging to various YouTube categories. Each comment is associated with a video ID, comment ID, commenter name, commenter channel ID, comment text, upvotes, original channel ID, and the category of the YouTube channel (e.g., 'News & Politics', 'Science & Technology', etc.).
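A short sketch of how one might explore the released sample, assuming it ships as a CSV whose column names mirror the fields listed above (the names here are guesses, not the dataset's documented schema):

```python
# Load the hypothetical YT-100K CSV and summarise activity per category.
import pandas as pd

cols = ["video_id", "comment_id", "commenter_name", "commenter_channel_id",
        "comment_text", "upvotes", "channel_id", "category"]
df = pd.read_csv("yt_100k.csv", names=cols, header=0)  # assumed file/schema

# Comment volume and median upvotes per YouTube category.
summary = df.groupby("category").agg(
    n_comments=("comment_id", "count"),
    median_upvotes=("upvotes", "median"),
).sort_values("n_comments", ascending=False)
print(summary)
```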
The News Comment Gap and Algorithmic Agenda Setting in Online Forums
Böwing, Flora, Gildersleve, Patrick
The disparity between news stories valued by journalists and those preferred by readers, known as the "News Gap", is well documented. However, the difference in expectations regarding news-related user-generated content is less studied. Comment sections, hosted by news websites, are popular venues for reader engagement, yet they remain subject to editorial decisions. It is thus important to understand journalists' versus readers' comment preferences and how these are served by the various comment ranking algorithms that represent discussions differently. We analyse 1.2 million comments from the Austrian newspaper Der Standard to understand the "News Comment Gap" and the effects of different ranking algorithms. We find that journalists prefer positive, timely, complex, direct responses, while readers favour comments similar to article content from elite authors. We introduce the versatile Feature-Oriented Ranking Utility Metric (FORUM) to assess the impact of different ranking algorithms and find dramatic differences in how they prioritise the display of comments by sentiment, topical relevance, lexical diversity, and readability. Journalists can exert substantial influence over the discourse through both curatorial and algorithmic means. Understanding the implications of these choices is vital for fostering engaging and civil discussions while aligning with journalistic objectives, especially given the increasing legal scrutiny and societal importance of online discourse.
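The abstract does not spell out how FORUM is computed. One plausible reading, sketched below purely for illustration, is a rank-discounted average of a comment feature under a given ordering, so rankings that surface high-feature comments early score higher; the paper's actual definition may differ.

```python
# Illustrative rank-utility: DCG-style discounted mean of a feature under
# a ranking. Not the paper's formula, just one way to make the idea concrete.
import numpy as np

def rank_utility(feature_values: np.ndarray, ranking: np.ndarray) -> float:
    """Discounted average of `feature_values` in the order given by `ranking`."""
    ordered = feature_values[ranking]
    discounts = 1.0 / np.log2(np.arange(2, len(ordered) + 2))
    return float((ordered * discounts).sum() / discounts.sum())

sentiment = np.array([0.9, -0.2, 0.4, 0.7])  # toy per-comment feature scores
by_time = np.array([0, 1, 2, 3])             # chronological ordering
by_votes = np.array([3, 0, 2, 1])            # vote-based ordering
print(rank_utility(sentiment, by_time), rank_utility(sentiment, by_votes))
```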
- Europe > United Kingdom (0.28)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Predicting the Understandability of Computational Notebooks through Code Metrics Analysis
Ghahfarokhi, Mojtaba Mostafavi, Asadi, Alireza, Asgari, Arash, Mohammadi, Bardia, Rizi, Masih Beigi, Heydarnoori, Abbas
Computational notebooks have become the primary coding environment for data scientists. However, research on their code quality is still emerging, and the code shared is often of poor quality. Given the importance of maintenance and reusability, understanding the metrics that affect notebook code comprehensibility is crucial. Code understandability, a qualitative variable, is closely tied to user opinions. Traditional approaches to measuring it either use limited questionnaires to review a few code pieces or rely on metadata such as likes and votes in software repositories. Our approach enhances the measurement of Jupyter notebook understandability by leveraging user comments related to code understandability. As a case study, we used 542,051 Kaggle Jupyter notebooks from our previous research, the DistilKaggle dataset. We employed a fine-tuned DistilBERT transformer to identify user comments associated with code understandability. We established a criterion called User Opinion Code Understandability (UOCU), which considers the number of relevant comments, upvotes on those comments, total notebook views, and total notebook upvotes. UOCU proved to be more effective than previous methods. Furthermore, we trained machine learning models to predict notebook code understandability based solely on code metrics. We collected 34 metrics for 132,723 final notebooks as features in our dataset, using UOCU as the label. Our predictive model, a Random Forest classifier, achieved 89% accuracy in predicting the understandability levels of computational notebooks.
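A hedged sketch of the pipeline this abstract outlines: derive a UOCU-style label from engagement signals, then predict it from code metrics with a random forest. The UOCU formula, column names, and binning below are assumptions, not the paper's exact method.

```python
# Build a hypothetical UOCU label, then train a Random Forest on metrics.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

nb = pd.read_csv("distilkaggle_notebooks.csv")  # hypothetical export

# Assumed UOCU: understandability-comment signal normalised by views,
# bucketed into discrete levels for classification.
uocu = (nb["understandability_comments"] + nb["comment_upvotes"]) \
       / nb["views"].clip(lower=1)
nb["uocu_level"] = pd.qcut(uocu, q=3, labels=["low", "medium", "high"])

metric_cols = [c for c in nb.columns if c.startswith("metric_")]  # 34 metrics
X_train, X_test, y_train, y_test = train_test_split(
    nb[metric_cols], nb["uocu_level"], test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```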
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
Axiomatic Preference Modeling for Longform Question Answering
Rosset, Corby, Zheng, Guoqing, Dibia, Victor, Awadallah, Ahmed, Bennett, Paul
The remarkable abilities of large language models (LLMs) like GPT-4 partially stem from post-training processes like Reinforcement Learning from Human Feedback (RLHF) involving human preferences encoded in a reward model. However, these reward models (RMs) often lack direct knowledge of why, or under what principles, the preference annotations were made. In this study, we identify principles that guide RMs to better align with human preferences, and then develop an axiomatic framework to generate a rich variety of preference signals to uphold them. We use these axiomatic signals to train a model for scoring answers to longform questions. Our approach yields a Preference Model with only about 220M parameters that agrees with gold human-annotated preference labels more often than GPT-4. The contributions of this work include: training a standalone preference model that can score human- and LLM-generated answers on the same scale; developing an axiomatic framework for generating training data pairs tailored to certain principles; and showing that a small amount of axiomatic signals can help small models outperform GPT-4 in preference scoring. We release our model on Hugging Face: https://huggingface.co/corbyrosset/axiomatic_preference_model
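If the released checkpoint loads as a standard sequence-classification model whose logit acts as a preference score (an assumption; consult the model card for the actual input format), scoring a question-answer pair might look like this:

```python
# Hedged sketch: score a question-answer pair with the released model,
# assuming a sequence-classification head whose logit is the preference score.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

repo = "corbyrosset/axiomatic_preference_model"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

question = "Why is the sky blue?"
answer = "Rayleigh scattering: shorter wavelengths scatter more strongly."
inputs = tok(question, answer, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits.squeeze()
print(float(score))  # higher = more preferred, under this assumption
```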
- North America > United States > New York > New York County > New York City (0.04)
- South America (0.04)
- North America > Central America (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Media (0.67)
- Education (0.67)
How the Supreme Court ruling on Section 230 could end Reddit as we know it
But another big issue is at stake that has received much less attention: depending on the outcome of the case, individual users of sites may suddenly be liable for run-of-the-mill content moderation. Many sites rely on users for community moderation to edit, shape, remove, and promote other users' content online: think Reddit's upvotes, or changes to a Wikipedia page. What might happen if those users were forced to take on legal risk every time they made a content decision? In short, the court could change Section 230 in ways that won't just impact big platforms; smaller sites like Reddit and Wikipedia that rely on community moderation will be hit too, warns Emma Llansó, director of the Center for Democracy and Technology's Free Expression Project. "It would be an enormous loss to online speech communities if suddenly it got really risky for mods themselves to do their work," she says.
Sentiment Analysis for Measuring Hope and Fear from Reddit Posts During the 2022 Russo-Ukrainian Conflict
Guerra, Alessio, Karakuş, Oktay
This paper proposes a novel lexicon-based unsupervised sentiment analysis method to measure "hope" and "fear" during the 2022 Russo-Ukrainian conflict. Reddit.com is utilised as the main source of human reactions to daily events during nearly the first three months of the conflict. The top 50 "hot" posts of six different subreddits about Ukraine and news (Ukraine, worldnews, Ukraina, UkrainianConflict, UkraineWarVideoReport, UkraineWarReports) and their associated comments are scraped, and a dataset is created. On this corpus, multiple analyses are employed: (1) public interest, (2) hope/fear score, and (3) stock price interaction. We promote a dictionary approach, which scores the hopefulness of every submitted user post. The Latent Dirichlet Allocation (LDA) topic modelling algorithm is also utilised to understand the main issues raised by users and identify the key talking points. Experimental analysis shows that hope decreases strongly after the symbolic and strategic losses of Azovstal (Mariupol) and Severodonetsk. Spikes in hope/fear, both positive and negative, are present after important battles, but also after some non-military events, such as Eurovision and football games.
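The dictionary approach reduces to counting lexicon hits per token; a minimal sketch follows, with tiny illustrative word lists standing in for the paper's actual hope and fear lexicons.

```python
# Score each post by the balance of hope- and fear-lexicon hits per token.
# The word lists below are illustrative stand-ins only.
hope_words = {"hope", "win", "liberate", "rebuild", "peace"}
fear_words = {"fear", "lose", "attack", "siege", "retreat"}

def hope_score(text: str) -> float:
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(t in hope_words for t in tokens) \
         - sum(t in fear_words for t in tokens)
    return hits / len(tokens)

print(hope_score("We hope the city can rebuild in peace"))  # positive
print(hope_score("Fear grows as the siege continues"))       # negative
```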
- Media > News (1.00)
- Leisure & Entertainment (1.00)
- Government > Military (1.00)
Cross-Domain Consumer Review Analysis
The paper presents a cross-domain review analysis of four popular review datasets: Amazon, Yelp, Steam, and IMDb. The analysis is performed using Hadoop and Spark, which allow for efficient and scalable processing of large datasets. By examining close to 12 million reviews from these four online platforms, we hope to uncover interesting trends in sales and customer sentiment over the years. Our analysis includes a study of the number of reviews and their distribution over time, as well as an examination of the relationship between various review attributes such as upvotes, creation time, rating, and sentiment. By comparing reviews across different domains, we hope to gain insight into the factors that drive customer satisfaction and engagement in different product categories.
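The kind of Spark aggregation described might look like the sketch below, assuming the four datasets have been unified into one DataFrame with a `domain` column; the schema and file path are hypothetical.

```python
# Review volume, sentiment, and upvotes per domain and year, via PySpark.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cross-domain-reviews").getOrCreate()
reviews = spark.read.parquet("reviews_unified.parquet")  # hypothetical path

trend = (reviews
         .withColumn("year", F.year("created_at"))
         .groupBy("domain", "year")
         .agg(F.count("*").alias("n_reviews"),
              F.avg("sentiment").alias("mean_sentiment"),
              F.avg("upvotes").alias("mean_upvotes"))
         .orderBy("domain", "year"))
trend.show()
```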
- Information Technology > Artificial Intelligence (0.99)
- Information Technology > Data Science > Data Mining > Big Data (0.51)
The Naughtyformer: A Transformer Understands Offensive Humor
Tang, Leonard, Cai, Alexander, Li, Steve, Wang, Jason
Jokes are intentionally written to be funny, but not all jokes are created the same. Some jokes may be fit for a classroom of kindergarteners, but others are best reserved for a more mature audience. While recent work has shown impressive results on humor detection in text, here we instead investigate the more nuanced task of detecting humor subtypes, especially of the less innocent variety. To that end, we introduce a novel dataset of jokes filtered from Reddit and solve the subtype classification task using a finetuned Transformer dubbed the Naughtyformer. Moreover, we show that our model is significantly better at detecting offensiveness in jokes compared to state-of-the-art methods.
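A hedged sketch of the fine-tuning setup: a DistilBERT-style classifier over joke subtypes via the Hugging Face Trainer. The dataset file, label set, and hyperparameters are placeholders; the paper's Naughtyformer training details may differ.

```python
# Fine-tune a subtype classifier on a joke CSV assumed to have
# `text` and integer `label` columns; all specifics are placeholders.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

labels = ["clean", "dark", "dirty"]  # assumed subtype label set
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(labels))

ds = load_dataset("csv", data_files="reddit_jokes.csv")["train"]  # hypothetical
ds = ds.map(lambda ex: tok(ex["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="naughtyformer", num_train_epochs=3),
    train_dataset=ds,
    tokenizer=tok,  # enables padded batching via the default collator
)
trainer.train()
```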
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)