
Collaborating Authors

Derczynski, Leon


Handling and Presenting Harmful Text in NLP Research

arXiv.org Artificial Intelligence

Text data can pose a risk of harm. However, the risks are not fully understood, and how to handle, present, and discuss harmful text in a safe way remains an unresolved issue in the NLP community. We provide an analytical framework categorising harms on three axes: (1) the harm type (e.g., misinformation, hate speech, or racial stereotypes); (2) whether a harm is sought as a feature of the research design if explicitly studying harmful content (e.g., training a hate speech classifier), versus unsought if harmful content is encountered when working on unrelated problems (e.g., language generation or part-of-speech tagging); and (3) who it affects, from people (mis)represented in the data to those handling the data and those publishing on the data. We provide advice for practitioners, with concrete steps for mitigating harm in research and in publication. To assist implementation we introduce HarmCheck -- a documentation standard for handling and presenting harmful text in research.
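The three axes above lend themselves to a structured record. The sketch below is a hypothetical illustration of how such a categorisation could be represented in code; the class and field names are my own, not part of the HarmCheck standard, and only example harm types from the abstract are included.

```python
from dataclasses import dataclass
from enum import Enum

class HarmType(Enum):
    # Example harm types named in the abstract; the paper's taxonomy is broader.
    MISINFORMATION = "misinformation"
    HATE_SPEECH = "hate speech"
    RACIAL_STEREOTYPE = "racial stereotype"

class Exposure(Enum):
    SOUGHT = "sought"      # harmful content is the explicit object of study
    UNSOUGHT = "unsought"  # harmful content encountered on an unrelated task

@dataclass
class HarmRecord:
    """Hypothetical record capturing the framework's three axes."""
    harm_type: HarmType   # axis 1: what kind of harm
    exposure: Exposure    # axis 2: sought vs unsought
    affected: list[str]   # axis 3: who it affects

# e.g. a hate speech classification project, where annotators and
# readers of the eventual paper are exposed to the harmful content:
record = HarmRecord(HarmType.HATE_SPEECH, Exposure.SOUGHT,
                    ["annotators", "readers of the paper"])
```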


Sparse Probability of Agreement

arXiv.org Artificial Intelligence

Measuring inter-annotator agreement is important for annotation tasks, but many metrics require a fully-annotated set of data, where all annotators annotate all samples. We define Sparse Probability of Agreement, SPA, which estimates the probability of agreement when not all annotator-item pairs are available. We show that under certain conditions, SPA is an unbiased estimator, and we provide multiple weighting schemes for handling data with various degrees of annotation.
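The core idea can be sketched as pairwise agreement computed only over annotator pairs that actually labelled the same item. This is a rough sketch of that idea under two simple weighting schemes; the paper's exact SPA estimator and its weighting schemes may differ in detail.

```python
from itertools import combinations

def sparse_agreement(annotations, per_item_weighting=False):
    """Estimate probability of agreement from sparse annotations.

    annotations: dict mapping item -> {annotator: label}, where each
    item may be labelled by any subset of the annotators.
    """
    if not per_item_weighting:
        # Scheme 1: weight every co-annotated pair equally.
        agree = total = 0
        for labels in annotations.values():
            for a, b in combinations(labels.values(), 2):
                total += 1
                agree += (a == b)
        return agree / total if total else float("nan")
    # Scheme 2: average agreement within each item, then average over items,
    # so heavily-annotated items do not dominate the estimate.
    item_rates = []
    for labels in annotations.values():
        pairs = list(combinations(labels.values(), 2))
        if pairs:
            item_rates.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(item_rates) / len(item_rates) if item_rates else float("nan")

# Toy example: three annotators, two items, incomplete coverage.
data = {"t1": {"ann1": "pos", "ann2": "pos", "ann3": "neg"},
        "t2": {"ann1": "neg", "ann3": "neg"}}
```

On this toy data the two schemes differ: pair-weighted agreement is 2/4 = 0.5, while item-weighted agreement is (1/3 + 1)/2 ≈ 0.67, illustrating why the choice of weighting matters for sparse data.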


Directions in Abusive Language Training Data: Garbage In, Garbage Out

arXiv.org Artificial Intelligence

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies. This paper systematically reviews abusive language dataset creation and content in conjunction with an open website for cataloguing abusive language data. This collection of knowledge leads to a synthesis providing evidence-based recommendations for practitioners working with this complex and highly diverse data.


Power Consumption Variation over Activation Functions

arXiv.org Machine Learning

The power machine learning models consume when making predictions can be affected by a model's architecture. This paper presents various estimates of power consumption for a range of different activation functions, a core factor in neural network model architecture design. Substantial differences in hardware performance exist between activation functions. This difference informs how power consumption in machine learning models can be reduced.
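The comparison the paper makes can be approximated in miniature: apply each activation function over the same inputs and measure cost. The sketch below uses wall-clock time as a crude stand-in for the hardware power measurements reported in the paper; the function set and input range are illustrative assumptions.

```python
import math
import timeit

# Candidate activation functions (scalar versions for a dependency-free demo).
ACTIVATIONS = {
    "relu": lambda x: x if x > 0.0 else 0.0,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
}

def time_activation(fn, inputs, repeats=5):
    """Best-of-N wall-clock time (seconds) to apply fn over all inputs.

    Runtime is only a rough proxy for energy use; real power measurement
    needs hardware counters or an external meter, as in the paper.
    """
    return min(timeit.repeat(lambda: [fn(x) for x in inputs],
                             number=1, repeat=repeats))

inputs = [i / 1000.0 - 5.0 for i in range(10_000)]
timings = {name: time_activation(fn, inputs) for name, fn in ACTIVATIONS.items()}
```

In practice one would expect piecewise-linear functions such as ReLU to be cheaper than transcendental ones such as sigmoid or tanh, though the exact ordering depends on the hardware and the numerical library used.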


Helping Crisis Responders Find the Informative Needle in the Tweet Haystack

arXiv.org Artificial Intelligence

Crisis responders are increasingly using social media, data and other digital sources of information to build a situational understanding of a crisis in order to design an effective response. However, with the increased availability of such data, the challenge of identifying relevant information from it also increases. This paper presents a successful automatic approach to handling this problem. Messages are filtered for informativeness based on a definition of the concept drawn from prior research and crisis response experts. Informative messages are tagged for actionable data -- for example, people in need, threats to rescue efforts, changes in environment, and so on. In all, eight categories of actionability are identified. The two components -- informativeness and actionability classification -- are packaged together as an openly-available tool called Emina (Emergent Informativeness and Actionability).
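The two-stage architecture described above (filter for informativeness, then tag actionability) can be sketched as a simple pipeline. The real Emina tool uses trained classifiers; the keyword rules and category names below are placeholders of my own, covering only three of the eight categories, to illustrate the pipeline shape rather than the actual method.

```python
# Toy keyword lexicon standing in for trained classifiers (illustrative only).
ACTIONABILITY_KEYWORDS = {
    "people_in_need": ["trapped", "injured", "need help"],
    "threats_to_rescue": ["blocked", "aftershock", "flooding"],
    "environment_change": ["collapsed", "rising water", "power out"],
}

def is_informative(message: str) -> bool:
    """Stage 1: crude informativeness filter (placeholder heuristic)."""
    text = message.lower()
    return any(kw in text
               for kws in ACTIONABILITY_KEYWORDS.values() for kw in kws)

def tag_actionability(message: str) -> list[str]:
    """Stage 2: assign zero or more actionability categories to a message."""
    text = message.lower()
    return [cat for cat, kws in ACTIONABILITY_KEYWORDS.items()
            if any(kw in text for kw in kws)]

def process(messages: list[str]) -> dict[str, list[str]]:
    # Only messages passing the informativeness filter are tagged.
    return {m: tag_actionability(m) for m in messages if is_informative(m)}

out = process(["Family trapped under a collapsed building",
               "Lovely sunset tonight"])
```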


Generalised Brown Clustering and Roll-Up Feature Generation

AAAI Conferences

Brown clustering is an established technique, used in hundreds of computational linguistics papers each year, to group word types that have similar distributional information. It is unsupervised and can be used to create powerful word representations for machine learning. Despite its improbable success relative to more complex methods, few have investigated whether Brown clustering has really been applied optimally. In this paper, we present a subtle but profound generalisation of Brown clustering to improve the overall quality by decoupling the number of output classes from the computational active set size. Moreover, the generalisation permits a novel approach to feature selection from Brown clusters: We show that the standard approach of shearing the Brown clustering output tree at arbitrary bitlengths is lossy and that features should be chosen instead by rolling up Generalised Brown hierarchies. The generalisation and corresponding feature generation are more principled, challenging the way Brown clustering is currently understood and applied.
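The contrast between shearing and roll-up can be illustrated on toy bit-string paths. Brown clustering assigns each word a bit string encoding its path through the merge tree; the standard "sheared" features cut those strings at fixed lengths, discarding intermediate tree levels. In this sketch the roll-up is approximated by emitting every prefix of the path; note that a true roll-up operates over the merges of a Generalised Brown hierarchy, which need not align with bit-string prefixes, so this is a simplified illustration of why fixed cut points are lossy, not the paper's algorithm.

```python
# Toy word -> Brown bit-string paths (real paths come from clustering a corpus).
PATHS = {"cat": "0010", "dog": "0011", "run": "110"}

def sheared_features(word, lengths=(2, 4)):
    """Standard approach: cut the bit string at fixed, arbitrary lengths.

    Levels of the tree between the chosen cut points are lost, and short
    paths may yield fewer features than expected.
    """
    path = PATHS[word]
    return {f"brown[:{n}]={path[:n]}" for n in lengths if n <= len(path)}

def rollup_features(word):
    """Prefix-based stand-in for roll-up: emit every ancestor on the path
    from the root to the word's leaf, so no hierarchy level is skipped."""
    path = PATHS[word]
    return {f"brown[:{n}]={path[:n]}" for n in range(1, len(path) + 1)}
```

For "run" (path 110), shearing at lengths 2 and 4 yields only one feature, since the path is shorter than the second cut point, whereas the roll-up yields a feature for each of the three tree levels.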