Strzalkowski, Tomek
Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling
Katsios, Gregorios A, Sa, Ning, Strzalkowski, Tomek
The identification of Figurative Language (FL) features in text is crucial for various Natural Language Processing (NLP) tasks in which understanding the author's intended meaning and its nuances is key to successful communication. At the same time, a writer's style is most accurately reflected by the specific blend of FL forms they use, rather than by any single construct, such as metaphor or irony alone. We therefore postulate that FL features could play an important role in Authorship Attribution (AA) tasks. We believe that ours is the first computational study of AA based on FL use. Accordingly, we propose a Multi-task Figurative Language Model (MFLM) that learns to detect multiple FL features in text at once. Through detailed evaluation across multiple test sets, we demonstrate that our model tends to perform on par with or better than specialized binary models in FL detection. Subsequently, we evaluate the predictive capability of joint FL features for the AA task on three datasets, observing improved AA performance through the integration of MFLM embeddings.
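The abstract does not specify MFLM's architecture. The sketch below therefore illustrates only the multi-task idea, under stated assumptions. A shared encoder (not shown) is assumed to emit one logit per FL feature, and each logit is decoded with its own sigmoid, so a single sentence can carry several FL forms at once. The label list `FL_LABELS`, the averaged document profile, and the nearest-centroid attribution rule are all hypothetical simplifications. The paper instead integrates MFLM embeddings into AA models.

```python
import math

# Hypothetical FL label inventory; MFLM's actual feature set may differ.
FL_LABELS = ["metaphor", "simile", "idiom", "sarcasm", "irony", "hyperbole"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_fl(logits: list[float], threshold: float = 0.5) -> dict[str, bool]:
    """Multi-label decoding: an independent sigmoid per FL feature (not a softmax),
    so several FL forms can be detected in the same sentence."""
    return {label: sigmoid(z) >= threshold for label, z in zip(FL_LABELS, logits)}


def fl_profile(doc_logits: list[list[float]]) -> list[float]:
    """Average per-sentence FL probabilities into one document vector,
    a crude stand-in for a writer's 'blend' of FL forms (hypothetical)."""
    n = len(doc_logits)
    return [sum(sigmoid(row[i]) for row in doc_logits) / n for i in range(len(FL_LABELS))]


def nearest_author(profile: list[float], author_profiles: dict[str, list[float]]) -> str:
    """Toy attribution rule (swapped in for illustration): pick the author whose
    mean FL profile is closest in squared Euclidean distance."""
    def dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(author_profiles, key=lambda name: dist(profile, author_profiles[name]))
```

For example, `decode_fl([3.0, -3.0, 0.0, -1.0, 1.0, -5.0])` flags metaphor, idiom, and irony together. A single binary detector would need one separate model per form to produce the same output.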
Uncovering Agendas: A Novel French & English Dataset for Agenda Detection on Social Media
Katsios, Gregorios, Sa, Ning, Bhaumik, Ankita, Strzalkowski, Tomek
The behavior and decision making of groups or communities can be dramatically influenced by individuals pushing particular agendas, e.g., to promote or disparage a person or an activity, to call for action, etc. In the examination of online influence campaigns, particularly those related to important political and social events, scholars often concentrate on identifying the sources responsible for setting and controlling the agenda (e.g., public media). In this article, we present a methodology for detecting specific instances of agenda control through social media where annotated data is limited or non-existent. Using a modest corpus of Twitter messages centered on the 2022 French Presidential Elections, we carry out a comprehensive evaluation of various approaches and techniques that can be applied to this problem. Our findings demonstrate that by treating the task as a textual entailment problem, it is possible to overcome the requirement for a large annotated training dataset.
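One way to picture "agenda detection as textual entailment" is the minimal sketch below, which rests on several assumptions. Each candidate agenda is rewritten as a natural-language hypothesis, and an entailment scorer judges whether the tweet (the premise) entails it. The labels and templates in `AGENDAS` are hypothetical. `toy_scorer` is a keyword stub standing in for a pretrained NLI model; it is not an entailment model. Because no agenda-specific training data is needed, only an entailment scorer, this framing sidesteps the annotation bottleneck the abstract describes.

```python
from typing import Callable

# Hypothetical agenda labels and hypothesis templates, for illustration only.
AGENDAS = {
    "promote": "The author promotes {target}.",
    "disparage": "The author disparages {target}.",
    "call_to_action": "The author calls for action.",
}

# (premise, hypothesis) -> estimated probability of entailment
EntailmentScorer = Callable[[str, str], float]


def detect_agendas(tweet: str, target: str, score: EntailmentScorer,
                   threshold: float = 0.5) -> list[str]:
    """Zero-shot detection: keep every agenda whose hypothesis the tweet entails."""
    found = []
    for label, template in AGENDAS.items():
        hypothesis = template.format(target=target)
        if score(tweet, hypothesis) >= threshold:
            found.append(label)
    return found


def toy_scorer(premise: str, hypothesis: str) -> float:
    """Stand-in for a real NLI model: crude cue-word matching, NOT entailment."""
    p = premise.lower()
    if "promotes" in hypothesis:
        return 0.9 if "vote for" in p else 0.1
    if "disparages" in hypothesis:
        return 0.9 if "disaster" in p else 0.1
    return 0.9 if ("join us" in p or "go vote" in p) else 0.1
```

Usage: `detect_agendas("Vote for Macron on Sunday, join us at the rally!", "Macron", toy_scorer)` returns `["promote", "call_to_action"]`. Swapping in a real NLI model changes only the `score` argument.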
Social Convos: Capturing Agendas and Emotions on Social Media
Bhaumik, Ankita, Sa, Ning, Katsios, Gregorios, Strzalkowski, Tomek
Social media platforms are popular tools for disseminating targeted information during major public events like elections or pandemics. Systematic analysis of the message traffic can provide valuable insights into prevailing opinions and social dynamics among different segments of the population. We are specifically interested in influence spread, and in particular in whether more deliberate influence operations can be detected. However, isolating the essential messages that carry telltale influence indicators from the extensive and often chaotic social media traffic is a major challenge. In this paper, we present a novel approach to extract influence indicators from messages circulating among groups of users discussing particular topics. We build upon the concept of a convo to identify influential authors who are actively promoting a particular agenda around that topic within the group. We focus on two influence indicators: the (control of) agenda and the use of emotional language.
Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework
Pisano, Matthew, Ly, Peter, Sanders, Abraham, Yao, Bingsheng, Wang, Dakuo, Strzalkowski, Tomek, Si, Mei
Modern large language models (LLMs) can still generate responses that are not aligned with human expectations or values. While many weight-based alignment methods have been proposed, many of these still leave models vulnerable to attacks when used on their own. To help mitigate this issue, we introduce Bergeron, a framework designed to improve the robustness of LLMs against adversarial attacks. Bergeron employs a two-tiered architecture in which a secondary LLM serves as a simulated conscience that safeguards a primary LLM by monitoring for and correcting potentially harmful text in both the prompt inputs and the generated outputs of the primary LLM. Empirical evaluation shows that Bergeron can improve the alignment and robustness of several popular LLMs without costly fine-tuning. It aids both open-source and black-box LLMs by complementing and reinforcing their existing alignment training.
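The two-tier control flow can be sketched as a wrapper around two black-box text-in, text-out models. This is an illustration under assumptions and not Bergeron's actual prompts or correction strategy, which the abstract does not give. The "answer SAFE or explain" protocol and the revision prompt are hypothetical. Because both models are plain callables, the same wrapper applies to open-source or API-only LLMs without touching their weights.

```python
from typing import Callable, Optional

LLM = Callable[[str], str]  # prompt -> completion (any backend)


def conscience_check(conscience: LLM, text: str) -> Optional[str]:
    """Ask the secondary model to judge `text`. Hypothetical protocol: it answers
    'SAFE' or a critique. Returns the critique, or None if judged safe."""
    verdict = conscience(f"Is the following text harmful? Answer SAFE or explain.\n{text}")
    return None if verdict.strip().upper().startswith("SAFE") else verdict


def guarded_generate(primary: LLM, conscience: LLM, prompt: str) -> str:
    # Tier 1: screen the incoming prompt; if flagged, attach the critique as guidance.
    critique = conscience_check(conscience, prompt)
    if critique is not None:
        prompt = f"{prompt}\n\n[Conscience note: {critique} Respond safely.]"
    response = primary(prompt)
    # Tier 2: screen the primary model's output; if flagged, ask it to revise.
    critique = conscience_check(conscience, response)
    if critique is not None:
        response = primary(
            f"Revise this response to remove harmful content.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response
```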
Modeling Leadership Behavior of Players in Virtual Worlds
Shaikh, Samira (State University of New York at Albany) | Strzalkowski, Tomek (State University of New York at Albany) | Stromer-Galley, Jennifer (Syracuse University) | Broadwell, George Aaron (State University of New York at Albany) | Liu, Ting (State University of New York at Albany) | Martey, Rosa Mikeal (Colorado State University)
In this article, we describe our method of modeling sociolinguistic behaviors of players in massively multi-player online games. The focus of this paper is leadership, as manifested by the participants engaged in discussion, and the automated modeling of this complex behavior in virtual worlds. We first approach the research question of modeling from a social science perspective and ground our models in theories from the human communication literature. We then adapt a two-tiered algorithmic model that derives certain mid-level sociolinguistic behaviors, such as Task Control, Topic Control, and Disagreement, from discourse linguistic indicators and combines these in a weighted model to reveal the complex role of Leadership. The algorithm is evaluated by comparing its prediction of leaders against ground truth: the participants' own ratings of leadership of themselves and their conversation peers. We find the algorithm's performance to be considerably better than the baseline.
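The two-tier structure can be sketched as follows: indicators feed mid-level behaviors, which are then combined into a weighted Leadership score. This is a simplified illustration and not the authors' model. Tier 1 here just measures each player's share of one discourse indicator, such as topic-introducing utterances for Topic Control. The weights in `WEIGHTS` are hypothetical placeholders, not values from the paper.

```python
# Hypothetical tier-2 weights; the paper's actual weighting is not given in the abstract.
WEIGHTS = {"task_control": 0.4, "topic_control": 0.4, "disagreement": 0.2}


def tier1_share(indicator_counts: dict[str, int]) -> dict[str, float]:
    """Tier 1 (simplified): each participant's share of one discourse indicator,
    e.g. utterances that introduce a new topic (Topic Control)."""
    total = sum(indicator_counts.values()) or 1  # avoid division by zero
    return {p: c / total for p, c in indicator_counts.items()}


def leadership_scores(behaviors: dict[str, dict[str, float]]) -> dict[str, float]:
    """Tier 2: weighted combination of mid-level behavior scores per participant."""
    return {
        player: sum(w * scores.get(b, 0.0) for b, w in WEIGHTS.items())
        for player, scores in behaviors.items()
    }


def predict_leader(behaviors: dict[str, dict[str, float]]) -> str:
    scores = leadership_scores(behaviors)
    return max(scores, key=scores.get)
```

Usage: compute `tier1_share` once per indicator (topic, task, disagreement), then assemble a per-player dict such as `{"topic_control": ..., "task_control": ..., "disagreement": ...}` before calling `predict_leader`. The predicted leader can then be compared against participants' own leadership ratings.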
Hedge Detection Using a Rewards and Penalties Approach
Stahl, Ken (State University of New York - University at Albany) | Shaikh, Samira (State University of New York - University at Albany) | Strzalkowski, Tomek (State University of New York - University at Albany)
Semantic and syntactic features found in text can be used in combination to statistically predict linguistic devices such as hedges in online chat. Some features are better indicators than others, and in some cases multiple features need to be considered together to be useful. Once the features are identified, finding the best division of the data becomes an optimization problem. We have devised a genetic algorithm approach to detecting hedges in online multi-party chat discourse. The system assigns rewards and penalties for matching features in tokenized text, so optimizing the reward and penalty amounts is the main challenge. Genetic algorithms, a subset of Evolutionary Algorithms, are effective optimizers: as massively parallel directed searches, they are well suited to finding the best ratio of integer rewards and penalties. “Evolutionary algorithms (EAs) utilize principles of natural selection and are robust adaptive search schemes suitable for searching nonlinear, discontinuous, and high-dimensional spaces. This class of algorithms is being increasingly applied to obtain optimal or near-optimal solutions to many complex real-world optimization problems” (Bonissone et al., 2006). We report results using 10-fold cross-validation, as is common in traditional machine learning. The best performance without further fine-tuning is 79% in classifying whether an utterance in chat contains a hedge or not.
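The following is a minimal sketch of the rewards-and-penalties scheme with a genetic search over integer weights. It assumes a simplification: each feature gets one integer weight, positive acting as a reward and negative as a penalty. An utterance is labeled a hedge when its summed weight exceeds zero. The feature names, weight bounds, and GA settings (tournament selection, uniform crossover, ±1 mutation, elitism) are illustrative choices, not the paper's configuration.

```python
import random


def classify(features: set[str], weights: dict[str, int], threshold: int = 0) -> bool:
    """Sum the integer rewards/penalties of matched features; 'hedge' if above threshold."""
    return sum(weights.get(f, 0) for f in features) > threshold


def fitness(weights: dict[str, int], data: list[tuple[set[str], bool]]) -> float:
    """Fraction of utterances classified correctly."""
    return sum(classify(f, weights) == label for f, label in data) / len(data)


def evolve(feature_names, data, pop_size=30, generations=40,
           low=-5, high=5, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)  # seeded for reproducible runs
    names = list(feature_names)

    def random_individual() -> dict[str, int]:
        return {n: rng.randint(low, high) for n in names}

    def tournament(pop, k=3):
        # Pick the fittest of k randomly sampled individuals.
        return max(rng.sample(pop, min(k, len(pop))), key=lambda w: fitness(w, data))

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=lambda w: fitness(w, data))
        children = [elite]  # elitism: the best individual always survives
        while len(children) < pop_size:
            a, b = tournament(population), tournament(population)
            # Uniform crossover: each weight comes from either parent.
            child = {n: (a if rng.random() < 0.5 else b)[n] for n in names}
            for n in names:
                if rng.random() < mutation_rate:
                    # Small integer mutation, clamped to [low, high].
                    child[n] = min(high, max(low, child[n] + rng.choice([-1, 1])))
            children.append(child)
        population = children
    best = max(population, key=lambda w: fitness(w, data))
    return best, fitness(best, data)
```

In the paper's setting, `fitness` would be evaluated within 10-fold cross-validation rather than on the training data alone, as this sketch does.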
Modeling Socio-Cultural Phenomena in Online Multi-Party Discourse
Strzalkowski, Tomek (State University of New York - Albany and Polish Academy of Sciences) | Broadwell, George Aaron (State University of New York - Albany) | Stromer-Galley, Jennifer (State University of New York - Albany) | Shaikh, Samira (State University of New York - Albany) | Liu, Ting (State University of New York - Albany) | Taylor, Sarah (Lockheed Martin)
In this paper, we present the application of a novel approach to the computational modeling, understanding, and detection of social phenomena in online multi-party discourse. A two-tiered approach was developed to detect a collection of social phenomena deployed by participants, such as topic control, task control, disagreement, and involvement. We discuss how the mid-level social phenomena can be reliably detected in discourse and how these measures can be used to differentiate participants in online discourse. Our approach works across different types of online chat, and we show results on two specific data sets.