Collaborating Authors

 Maynard, Diana


Hostility Detection in UK Politics: A Dataset on Online Abuse Targeting MPs

arXiv.org Artificial Intelligence

Numerous politicians use social media platforms, particularly X, to engage with their constituents. This interaction allows constituents to pose questions and offer feedback but also exposes politicians to a barrage of hostile responses, especially given the anonymity afforded by social media. They are typically targeted in relation to their governmental role, but the comments also tend to attack their personal identity. This can discredit politicians and reduce public trust in the government. It can also incite anger and disrespect, leading to offline harm and violence. While numerous models exist for detecting hostility in general, they lack the specificity required for political contexts. Furthermore, addressing hostility towards politicians demands tailored approaches due to the distinct language and issues inherent to each country (e.g., Brexit for the UK). To bridge this gap, we construct a dataset of 3,320 English tweets spanning a two-year period, manually annotated for hostility towards UK MPs. Our dataset also captures the targeted identity characteristics (race, gender, religion, none) in hostile tweets. We perform linguistic and topical analyses to delve into the unique content of the UK political data. Finally, we evaluate the performance of pre-trained language models and large language models on binary hostility detection and multi-class targeted identity type classification tasks. Our study offers valuable data and insights for future research on the prevalence and nature of politics-related hostility specific to the UK.
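
The binary hostility detection task described above could be approached, in its simplest form, by fine-tuning an encoder-style pre-trained language model as a sequence classifier. The sketch below is illustrative only: the example tweets, the 0/1 label scheme, and the roberta-base checkpoint are assumptions, not the paper's released data or models.

# Illustrative sketch of fine-tuning a pre-trained language model for binary
# hostility detection (0 = not hostile, 1 = hostile). Tweets, labels and the
# checkpoint are invented placeholders, not the paper's dataset or setup.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

data = Dataset.from_dict({
    "text": ["Thanks for supporting our local hospital.",
             "You are a disgrace and should resign immediately."],
    "label": [0, 1],
})

model_name = "roberta-base"  # any encoder PLM could be substituted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hostility-clf", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()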


Exploring the Influence of Label Aggregation on Minority Voices: Implications for Dataset Bias and Model Training

arXiv.org Artificial Intelligence

Resolving disagreement in manual annotation typically consists of removing unreliable annotators and applying a label aggregation strategy such as majority vote or expert opinion. These strategies may have the side-effect of silencing or under-representing minority but equally valid opinions. In this paper, we study the impact of standard label aggregation strategies on minority opinion representation in sexism detection. We investigate the quality and value of minority annotations, and then examine their effect on the class distributions in gold labels, as well as how this affects the behaviour of models trained on the resulting datasets. Finally, we discuss the potential biases introduced by each method and how they can be amplified by the models.
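
To make the aggregation effect concrete, the toy sketch below contrasts majority vote with an "any-positive" rule on invented annotator labels; the annotations, rules and label names are illustrative assumptions, not the paper's data or recommended strategy.

# Minimal sketch of how label aggregation can erase minority annotations.
# Annotator labels are invented (1 = sexist, 0 = not sexist).
from collections import Counter

annotations = [
    [0, 0, 1],   # one annotator perceives sexism
    [1, 1, 0],
    [0, 0, 0],
]

def majority_vote(labels):
    return Counter(labels).most_common(1)[0][0]

def any_positive(labels):
    # alternative aggregation that preserves the minority "sexist" vote
    return int(any(labels))

gold_majority = [majority_vote(item) for item in annotations]
gold_any = [any_positive(item) for item in annotations]

print("majority vote:", gold_majority)   # [0, 1, 0] - the first item's minority view is lost
print("any-positive :", gold_any)        # [1, 1, 0]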


Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference

arXiv.org Artificial Intelligence

As the cultural heritage sector increasingly adopts technologies like Retrieval-Augmented Generation (RAG) to provide more personalised search experiences and enable conversations with collections data, the demand for specialised evaluation datasets has grown. While end-to-end system testing is essential, it is equally important to assess individual components. We target the final answering task, which is well suited to Machine Reading Comprehension (MRC). Although existing MRC datasets address general domains, they lack the specificity needed for cultural heritage information. Unfortunately, the manual creation of such datasets is prohibitively expensive for most heritage institutions. This paper presents a cost-effective approach for generating domain-specific MRC datasets with increased difficulty using Reinforcement Learning from Human Feedback (RLHF) from synthetic preference data. Our method leverages the performance of existing question-answering models on a subset of SQuAD to create a difficulty metric, assuming that more challenging questions are answered correctly less frequently. This research contributes: (1) A methodology for increasing question difficulty using PPO and synthetic data; (2) Empirical evidence of the method's effectiveness, including human evaluation; (3) An in-depth error analysis and study of emergent phenomena; and (4) An open-source codebase and a set of three llama-2-chat adapters for reproducibility and adaptation.
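
The difficulty metric described above can be pictured as "the fraction of existing QA models that fail on a question". The sketch below is an assumption-laden illustration: the exact-match criterion, the aggregation, and the model predictions are invented, and the paper's actual metric and preference construction may differ.

# Toy difficulty score: a question is harder the less often QA models answer
# it correctly. Such scores could then rank question pairs to build synthetic
# preferences (prefer the harder question) for PPO training.
def exact_match(prediction, gold):
    return prediction.strip().lower() == gold.strip().lower()

def difficulty(question_id, gold_answer, predictions_by_model):
    preds = [p[question_id] for p in predictions_by_model]
    correct = sum(exact_match(p, gold_answer) for p in preds)
    # fraction of models that fail: 0.0 = easy, 1.0 = hard
    return 1.0 - correct / len(preds)

predictions_by_model = [          # invented outputs from three QA models
    {"q1": "Norman conquest", "q2": "1066"},
    {"q1": "the Normans",     "q2": "in 1067"},
    {"q1": "Normans",         "q2": "1066"},
]
print(difficulty("q2", "1066", predictions_by_model))  # 1 - 2/3 = 0.33...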


Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection

arXiv.org Artificial Intelligence

The nascent topic of fake news requires automatic detection methods to quickly learn from limited annotated samples. Therefore, the capacity to rapidly acquire proficiency in a new task with limited guidance, also known as few-shot learning, is critical for detecting fake news in its early stages. Existing approaches either involve fine-tuning pre-trained language models which come with a large number of parameters, or training a complex neural network from scratch with large-scale annotated datasets. This paper presents a multimodal fake news detection model which augments multimodal features using unimodal features. For this purpose, we introduce Cross-Modal Augmentation (CMA), a simple approach for enhancing few-shot multimodal fake news detection by transforming n-shot classification into a more robust (n × z)-shot problem, where z represents the number of supplementary features. The proposed CMA achieves SOTA results over three benchmark datasets, utilizing a surprisingly simple linear probing method to classify multimodal fake news with only a few training samples. Furthermore, our method is significantly more lightweight than prior approaches, particularly in terms of the number of trainable parameters and epoch times. The code is available here: https://github.com/zgjiangtoby/FND_fewshot
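
A rough way to picture the augmentation idea is below: each few-shot example contributes supplementary unimodal feature views (with the same label) alongside its fused multimodal view, and a simple linear probe is trained on the enlarged set. The random embeddings, the averaging fusion, and z = 2 views are illustrative assumptions, not the authors' implementation (see their repository above for that).

# Toy cross-modal augmentation: z = 2 supplementary unimodal views per
# example, plus the fused view, feed a linear probe. Embeddings are random
# stand-ins for real text/image encoder outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, dim = 4, 16                      # 4-shot, toy embedding size
labels = np.array([0, 1, 0, 1])     # 0 = real, 1 = fake (toy labels)

text_emb = rng.normal(size=(n, dim))
image_emb = rng.normal(size=(n, dim))
fused_emb = (text_emb + image_emb) / 2   # toy multimodal fusion

# enlarged training set: fused view plus the two unimodal views
X = np.vstack([fused_emb, text_emb, image_emb])
y = np.concatenate([labels, labels, labels])

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict(fused_emb))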


Dimensions of Online Conflict: Towards Modeling Agonism

arXiv.org Artificial Intelligence

Agonism plays a vital role in democratic dialogue by fostering diverse perspectives and robust discussions. Within the realm of online conflict there is another type: hateful antagonism, which undermines constructive dialogue. Detecting conflict online is central to platform moderation and monetization. It is also vital for democratic dialogue, but only when it takes the form of agonism. To model these two types of conflict, we collected Twitter conversations related to trending controversial topics. We introduce a comprehensive annotation schema for labelling different dimensions of conflict in the conversations, such as the source of conflict, the target, and the rhetorical strategies deployed. Using this schema, we annotated approximately 4,000 conversations with multiple labels. We then trained both logistic regression and transformer-based models on the dataset, incorporating context from the conversation, including the number of participants and the structure of the interactions. Results show that contextual labels are helpful in identifying conflict and make the models robust to variations in topic. Our research contributes a conceptualization of different dimensions of conflict, a richly annotated dataset, and promising results that can contribute to content moderation.
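
The modelling setup mentioned above, a classifier over conversation text plus contextual signals such as participant counts and interaction structure, can be sketched as follows. The example texts, the two context features, and the agonistic/antagonistic labels are invented for illustration; the paper's features, schema and models are richer.

# Illustrative conflict classifier combining TF-IDF text features with simple
# conversational context features (not the authors' code).
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["I see your point but the data says otherwise.",
         "People like you should be banned from speaking."]
context = np.array([[5, 3],    # [num_participants, reply_depth] (invented features)
                    [2, 1]])
labels = [0, 1]                # 0 = agonistic, 1 = antagonistic (toy labels)

vec = TfidfVectorizer()
X_text = vec.fit_transform(texts)
X = hstack([X_text, csr_matrix(context)])   # concatenate text and context features

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))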


Examining Temporal Bias in Abusive Language Detection

arXiv.org Artificial Intelligence

In recent years, researchers have developed a huge variety of machine learning models that can automatically detect abusive language (Mishra et al. 2019; Aurpa, Sadik, and Ahmed 2022; Das and Mukherjee 2023; Alrashidi, Jamal, and Alkhathlan 2023). However, these models may be subject to temporal bias, which can lead to a decrease in the accuracy of abusive language detection models, potentially allowing abusive language to go undetected or be falsely detected. Temporal bias arises from differences in populations and [...]. Previous work identified temporal bias in an Italian hate speech data set associated with immigrants (Florio et al. 2020). However, they have yet to explore temporal factors affecting predictive performance from a multilingual perspective. In this paper, we explore temporal bias in 5 different abusive data sets that span varying time periods, in 4 languages (English, Spanish, Italian, and Chinese). Specifically, we investigate the following core research questions: RQ1: How does the magnitude of temporal bias vary across data sets that differ in language, time span and collection methods?
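
The evaluation setup implied by this work, training on older posts and testing on newer ones rather than on a random split, can be sketched as below. The posts, labels and classifier are invented placeholders; the paper uses five real abusive language datasets in four languages.

# Toy temporal-split evaluation: fit on the earlier half of a chronologically
# ordered corpus and test on the later half.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

posts = [("2019-03", "you idiots ruined everything", 1),
         ("2019-05", "great news for everyone today", 0),
         ("2021-06", "typical clueless losers again", 1),
         ("2021-08", "lovely weather for the march", 0)]

posts.sort(key=lambda p: p[0])                       # chronological order
texts = [p[1] for p in posts]
labels = [p[2] for p in posts]

mid = len(posts) // 2
vec = TfidfVectorizer().fit(texts[:mid])             # fit features on older data only
clf = LogisticRegression().fit(vec.transform(texts[:mid]), labels[:mid])
preds = clf.predict(vec.transform(texts[mid:]))      # evaluate on newer data
print("temporal-split F1:", f1_score(labels[mid:], preds))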


Similarity-Aware Multimodal Prompt Learning for Fake News Detection

arXiv.org Artificial Intelligence

The standard paradigm for fake news detection mainly utilizes text information to model the truthfulness of news. However, the discourse of online fake news is typically subtle, and it requires expert knowledge to use textual information to debunk fake news. Recently, studies focusing on multimodal fake news detection have outperformed text-only methods. Approaches that utilize pre-trained models to extract unimodal features, or that fine-tune pre-trained models directly, have become a new paradigm for detecting fake news. However, this paradigm either requires a large number of training instances or updates the entire set of pre-trained model parameters, making real-world fake news detection impractical. Furthermore, traditional multimodal methods fuse cross-modal features directly without considering that uncorrelated semantic representations might inject noise into the multimodal features. This paper proposes a Similarity-Aware Multimodal Prompt Learning (SAMPLE) framework. First, we incorporate prompt learning into multimodal fake news detection. Prompt learning, which tunes only the prompts while keeping the language model frozen, can significantly reduce memory usage and achieve performance comparable to fine-tuning. We analyse three prompt templates with a soft verbalizer to detect fake news. In addition, we introduce a similarity-aware fusing method that adaptively fuses the intensity of the multimodal representation and mitigates the noise injected by uncorrelated cross-modal features. In our evaluation, SAMPLE surpasses the F1 scores and accuracies of previous work on two benchmark multimodal datasets, demonstrating the effectiveness of the proposed method in detecting fake news. Moreover, SAMPLE is superior to other approaches in both few-shot and data-rich settings.
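
The similarity-aware fusion idea can be pictured as gating the cross-modal contribution by how related the text and image embeddings are, so weakly correlated image content injects less noise. The sketch below is a hand-wavy illustration under that assumption; the random embeddings, the cosine gate and the additive fusion are not SAMPLE's actual components.

# Toy similarity-gated fusion of text and image embeddings.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
text_emb = rng.normal(size=128)     # stand-in for a text encoder output
image_emb = rng.normal(size=128)    # stand-in for an image encoder output

sim = cosine(text_emb, image_emb)          # in [-1, 1]
weight = max(sim, 0.0)                     # ignore negatively correlated pairs
fused = text_emb + weight * image_emb      # similarity-gated fusion
print(round(sim, 3), fused.shape)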


Classification Aware Neural Topic Model and its Application on a New COVID-19 Disinformation Corpus

arXiv.org Machine Learning

The explosion of disinformation related to the COVID-19 pandemic has overloaded fact-checkers and media worldwide. To help tackle this, we developed computational methods to support COVID-19 disinformation debunking and social impacts research. This paper presents: 1) the currently largest available manually annotated COVID-19 disinformation category dataset; and 2) a classification-aware neural topic model (CANTM) that combines classification and topic modelling under a variational autoencoder framework. We demonstrate that CANTM efficiently improves classification performance with low resources, and is scalable. In addition, the classification-aware topics help researchers and end-users to better understand the classification results.
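
The core architectural idea, a variational autoencoder whose latent topic distribution also drives a classifier so that topics are shaped by the classification task, is sketched very loosely below. This toy module is an assumption-heavy illustration, not the CANTM architecture or its training objective.

# Highly simplified classification-aware topic model: a VAE-style encoder
# yields topic proportions that feed both a bag-of-words decoder and a
# classifier head.
import torch
import torch.nn as nn

class TinyClassAwareTopicModel(nn.Module):
    def __init__(self, vocab_size=2000, n_topics=20, n_classes=2):
        super().__init__()
        self.enc = nn.Linear(vocab_size, n_topics * 2)   # mean and log-variance
        self.dec = nn.Linear(n_topics, vocab_size)       # topics -> word logits
        self.clf = nn.Linear(n_topics, n_classes)        # topics -> class logits

    def forward(self, bow):
        mu, logvar = self.enc(bow).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation trick
        theta = torch.softmax(z, dim=-1)                       # topic proportions
        return self.dec(theta), self.clf(theta), mu, logvar

model = TinyClassAwareTopicModel()
bow = torch.rand(4, 2000)                                      # toy bag-of-words batch
recon_logits, class_logits, mu, logvar = model(bow)
print(recon_logits.shape, class_logits.shape)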


Helping Crisis Responders Find the Informative Needle in the Tweet Haystack

arXiv.org Artificial Intelligence

Crisis responders are increasingly using social media, data and other digital sources of information to build a situational understanding of a crisis in order to design an effective response. However, with the increased availability of such data, the challenge of identifying relevant information within it also increases. This paper presents a successful automatic approach to handling this problem. Messages are filtered for informativeness based on a definition of the concept drawn from prior research and crisis response experts. Informative messages are then tagged for actionable data -- for example, people in need, threats to rescue efforts, changes in environment, and so on. In all, eight categories of actionability are identified. The two components -- informativeness and actionability classification -- are packaged together as an openly available tool called Emina (Emergent Informativeness and Actionability).
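
The two-stage design described above can be pictured as a simple triage pipeline: an informativeness filter followed by an actionability tagger. The sketch below is schematic; the example messages, the placeholder classifiers and the category names shown are assumptions, not the released Emina models or its full set of eight categories.

# Schematic two-stage triage: filter for informativeness, then tag
# informative messages with an actionable category.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

informative_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
actionable_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())

# toy training data; real systems would use annotated crisis messages
informative_clf.fit(["family trapped on roof near the bridge",
                     "thoughts and prayers to everyone"], [1, 0])
actionable_clf.fit(["family trapped on roof near the bridge",
                    "road to the shelter is now flooded"],
                   ["people_in_need", "environment_change"])

def triage(message):
    if informative_clf.predict([message])[0] == 0:
        return None                        # not informative, discard
    return actionable_clf.predict([message])[0]

print(triage("two children stuck in a car on Elm Street"))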