
Collaborating Authors

Derczynski, Leon


Handling and Presenting Harmful Text in NLP Research

arXiv.org Artificial Intelligence

Text data can pose a risk of harm. However, the risks are not fully understood, and how to handle, present, and discuss harmful text in a safe way remains an unresolved issue in the NLP community. We provide an analytical framework categorising harms on three axes: (1) the harm type (e.g., misinformation, hate speech, or racial stereotypes); (2) whether a harm is sought as a feature of the research design if explicitly studying harmful content (e.g., training a hate speech classifier), versus unsought if harmful content is encountered when working on unrelated problems (e.g., language generation or part-of-speech tagging); and (3) who it affects, from people (mis)represented in the data to those handling the data and those publishing on the data. We provide advice for practitioners, with concrete steps for mitigating harm in research and in publication. To assist implementation we introduce HarmCheck -- a documentation standard for handling and presenting harmful text in research.
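The three axes above lend themselves to a structured record. The sketch below is a hypothetical illustration of how such a categorisation could be represented in code; the class and field names are my own, not part of the HarmCheck standard, and only example harm types from the abstract are included.

```python
from dataclasses import dataclass
from enum import Enum

class HarmType(Enum):
    # Example harm types named in the abstract; the paper's taxonomy is broader.
    MISINFORMATION = "misinformation"
    HATE_SPEECH = "hate speech"
    RACIAL_STEREOTYPE = "racial stereotype"

class Exposure(Enum):
    SOUGHT = "sought"      # harmful content is the explicit object of study
    UNSOUGHT = "unsought"  # harmful content encountered on an unrelated task

@dataclass
class HarmRecord:
    """Hypothetical record capturing the framework's three axes."""
    harm_type: HarmType   # axis 1: what kind of harm
    exposure: Exposure    # axis 2: sought vs unsought
    affected: list[str]   # axis 3: who it affects

# e.g. a hate speech classification project, where annotators and
# readers of the eventual paper are exposed to the harmful content:
record = HarmRecord(HarmType.HATE_SPEECH, Exposure.SOUGHT,
                    ["annotators", "readers of the paper"])
```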


Sparse Probability of Agreement

arXiv.org Artificial Intelligence

Measuring inter-annotator agreement is important for annotation tasks, but many metrics require a fully-annotated set of data, where all annotators annotate all samples. We define Sparse Probability of Agreement, SPA, which estimates the probability of agreement when not all annotator-item pairs are available. We show that under certain conditions, SPA is an unbiased estimator, and we provide multiple weighting schemes for handling data with various degrees of annotation.
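The core idea can be sketched as pairwise agreement computed only over annotator pairs that actually labelled the same item. This is a rough sketch of that idea under two simple weighting schemes; the paper's exact SPA estimator and its weighting schemes may differ in detail.

```python
from itertools import combinations

def sparse_agreement(annotations, per_item_weighting=False):
    """Estimate probability of agreement from sparse annotations.

    annotations: dict mapping item -> {annotator: label}, where each
    item may be labelled by any subset of the annotators.
    """
    if not per_item_weighting:
        # Scheme 1: weight every co-annotated pair equally.
        agree = total = 0
        for labels in annotations.values():
            for a, b in combinations(labels.values(), 2):
                total += 1
                agree += (a == b)
        return agree / total if total else float("nan")
    # Scheme 2: average agreement within each item, then average over items,
    # so heavily-annotated items do not dominate the estimate.
    item_rates = []
    for labels in annotations.values():
        pairs = list(combinations(labels.values(), 2))
        if pairs:
            item_rates.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(item_rates) / len(item_rates) if item_rates else float("nan")

# Toy example: three annotators, two items, incomplete coverage.
data = {"t1": {"ann1": "pos", "ann2": "pos", "ann3": "neg"},
        "t2": {"ann1": "neg", "ann3": "neg"}}
```

On this toy data the two schemes differ: pair-weighted agreement is 2/4 = 0.5, while item-weighted agreement is (1/3 + 1)/2 ≈ 0.67, illustrating why the choice of weighting matters for sparse data.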


Directions in Abusive Language Training Data: Garbage In, Garbage Out

arXiv.org Artificial Intelligence

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies. This paper systematically reviews abusive language dataset creation and content in conjunction with an open website for cataloguing abusive language data. This collection of knowledge leads to a synthesis providing evidence-based recommendations for practitioners working with this complex and highly diverse data.


Power Consumption Variation over Activation Functions

arXiv.org Machine Learning

The power machine learning models consume when making predictions can be affected by a model's architecture. This paper presents various estimates of power consumption for a range of different activation functions, a core factor in neural network model architecture design. Substantial differences in hardware performance exist between activation functions. This difference informs how power consumption in machine learning models can be reduced.
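The comparison the paper makes can be approximated in miniature: apply each activation function over the same inputs and measure cost. The sketch below uses wall-clock time as a crude stand-in for the hardware power measurements reported in the paper; the function set and input range are illustrative assumptions.

```python
import math
import timeit

# Candidate activation functions (scalar versions for a dependency-free demo).
ACTIVATIONS = {
    "relu": lambda x: x if x > 0.0 else 0.0,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
}

def time_activation(fn, inputs, repeats=5):
    """Best-of-N wall-clock time (seconds) to apply fn over all inputs.

    Runtime is only a rough proxy for energy use; real power measurement
    needs hardware counters or an external meter, as in the paper.
    """
    return min(timeit.repeat(lambda: [fn(x) for x in inputs],
                             number=1, repeat=repeats))

inputs = [i / 1000.0 - 5.0 for i in range(10_000)]
timings = {name: time_activation(fn, inputs) for name, fn in ACTIVATIONS.items()}
```

In practice one would expect piecewise-linear functions such as ReLU to be cheaper than transcendental ones such as sigmoid or tanh, though the exact ordering depends on the hardware and the numerical library used.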


Helping Crisis Responders Find the Informative Needle in the Tweet Haystack

arXiv.org Artificial Intelligence

Crisis responders are increasingly using social media, data and other digital sources of information to build a situational understanding of a crisis in order to design an effective response. However, with the increased availability of such data, the challenge of identifying relevant information from it also increases. This paper presents a successful automatic approach to handling this problem. Messages are filtered for informativeness based on a definition of the concept drawn from prior research and crisis response experts. Informative messages are tagged for actionable data -- for example, people in need, threats to rescue efforts, changes in environment, and so on. In all, eight categories of actionability are identified. The two components -- informativeness and actionability classification -- are packaged together as an openly-available tool called Emina (Emergent Informativeness and Actionability).
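The two-stage architecture described above (filter for informativeness, then tag actionability) can be sketched as a simple pipeline. The real Emina tool uses trained classifiers; the keyword rules and category names below are placeholders of my own, covering only three of the eight categories, to illustrate the pipeline shape rather than the actual method.

```python
# Toy keyword lexicon standing in for trained classifiers (illustrative only).
ACTIONABILITY_KEYWORDS = {
    "people_in_need": ["trapped", "injured", "need help"],
    "threats_to_rescue": ["blocked", "aftershock", "flooding"],
    "environment_change": ["collapsed", "rising water", "power out"],
}

def is_informative(message: str) -> bool:
    """Stage 1: crude informativeness filter (placeholder heuristic)."""
    text = message.lower()
    return any(kw in text
               for kws in ACTIONABILITY_KEYWORDS.values() for kw in kws)

def tag_actionability(message: str) -> list[str]:
    """Stage 2: assign zero or more actionability categories to a message."""
    text = message.lower()
    return [cat for cat, kws in ACTIONABILITY_KEYWORDS.items()
            if any(kw in text for kw in kws)]

def process(messages: list[str]) -> dict[str, list[str]]:
    # Only messages passing the informativeness filter are tagged.
    return {m: tag_actionability(m) for m in messages if is_informative(m)}

out = process(["Family trapped under a collapsed building",
               "Lovely sunset tonight"])
```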


Generalised Brown Clustering and Roll-Up Feature Generation

AAAI Conferences

Brown clustering is an established technique, used in hundreds of computational linguistics papers each year, to group word types that have similar distributional information. It is unsupervised and can be used to create powerful word representations for machine learning. Despite its improbable success relative to more complex methods, few have investigated whether Brown clustering has really been applied optimally. In this paper, we present a subtle but profound generalisation of Brown clustering to improve the overall quality by decoupling the number of output classes from the computational active set size. Moreover, the generalisation permits a novel approach to feature selection from Brown clusters: We show that the standard approach of shearing the Brown clustering output tree at arbitrary bitlengths is lossy and that features should be chosen instead by rolling up Generalised Brown hierarchies. The generalisation and corresponding feature generation are more principled, challenging the way Brown clustering is currently understood and applied.
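The contrast between shearing and roll-up can be illustrated on toy bit-string paths. Brown clustering assigns each word a bit string encoding its path through the merge tree; the standard "sheared" features cut those strings at fixed lengths, discarding intermediate tree levels. In this sketch the roll-up is approximated by emitting every prefix of the path; note that a true roll-up operates over the merges of a Generalised Brown hierarchy, which need not align with bit-string prefixes, so this is a simplified illustration of why fixed cut points are lossy, not the paper's algorithm.

```python
# Toy word -> Brown bit-string paths (real paths come from clustering a corpus).
PATHS = {"cat": "0010", "dog": "0011", "run": "110"}

def sheared_features(word, lengths=(2, 4)):
    """Standard approach: cut the bit string at fixed, arbitrary lengths.

    Levels of the tree between the chosen cut points are lost, and short
    paths may yield fewer features than expected.
    """
    path = PATHS[word]
    return {f"brown[:{n}]={path[:n]}" for n in lengths if n <= len(path)}

def rollup_features(word):
    """Prefix-based stand-in for roll-up: emit every ancestor on the path
    from the root to the word's leaf, so no hierarchy level is skipped."""
    path = PATHS[word]
    return {f"brown[:{n}]={path[:n]}" for n in range(1, len(path) + 1)}
```

For "run" (path 110), shearing at lengths 2 and 4 yields only one feature, since the path is shorter than the second cut point, whereas the roll-up yields a feature for each of the three tree levels.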