Collaborating Authors

 Fathullah, Yassir


Who Needs Decoders? Efficient Estimation of Sequence-level Attributes

arXiv.org Artificial Intelligence

State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks, such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed, just a scalar attribute of the sequence. In these scenarios, where, for example, knowing the quality of a system's output to predict poor performance prevails over knowing the output itself, is it possible to bypass the autoregressive decoding? We propose Non-Autoregressive Proxy (NAP) models that can efficiently predict general scalar-valued sequence-level attributes. Importantly, NAPs predict these metrics directly from the encodings, avoiding the expensive autoregressive decoding stage. We consider two sequence-to-sequence tasks: Machine Translation (MT) and Automatic Speech Recognition (ASR). In OOD detection for MT, NAPs outperform a deep ensemble while being significantly faster. NAPs are also shown to be able to predict performance metrics such as BERTScore (MT) or word error rate (ASR). For downstream tasks such as data filtering and resource optimization, NAPs generate performance predictions that outperform predictive uncertainty while being highly inference efficient.
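
A minimal sketch of the core idea, assuming a PyTorch setting: a small proxy head pools the encoder states and regresses a single scalar attribute (e.g. a predicted BERTScore or word error rate), so no autoregressive decoder pass is needed. The class name, pooling choice, and layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class NonAutoregressiveProxy(nn.Module):
    """Regress a scalar sequence-level attribute directly from encoder states."""

    def __init__(self, d_model: int, d_hidden: int = 256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, 1),  # one scalar attribute per sequence
        )

    def forward(self, encodings: torch.Tensor) -> torch.Tensor:
        # encodings: (batch, time, d_model) from a (typically frozen) seq2seq encoder
        pooled = encodings.mean(dim=1)        # mean-pool over the time axis
        return self.head(pooled).squeeze(-1)  # (batch,) predicted attribute

# usage sketch: score = proxy(encoder(src)); no decoding loop is run
```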


Self-Distribution Distillation: Efficient Uncertainty Estimation

arXiv.org Machine Learning

Deep learning is increasingly being applied in safety-critical domains. For these scenarios it is important to know the level of uncertainty in a model's prediction to ensure that appropriate decisions are made by the system. Deep ensembles are the de-facto standard approach for obtaining various measures of uncertainty. However, ensembles often significantly increase the resources required in the training and/or deployment phases. Approaches have been developed that typically address the costs in only one of these phases. In this work we propose a novel training approach, self-distribution distillation (S2D), which is able to efficiently train a single model that can estimate uncertainties. Furthermore, it is possible to build ensembles of these models and apply hierarchical ensemble distillation approaches. Experiments on CIFAR-100 showed that S2D models outperformed standard models and Monte-Carlo dropout. Additional out-of-distribution detection experiments on LSUN, Tiny ImageNet and SVHN showed that even a standard deep ensemble can be outperformed using S2D-based ensembles and novel distilled models.
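
To make the uncertainty measures concrete, here is a hedged sketch of the standard decomposition used with ensembles and distribution-distilled models: total uncertainty (entropy of the average prediction) splits into data uncertainty (average entropy) and knowledge uncertainty (their difference, the mutual information). This illustrates the quantities such models estimate; it is not the S2D training procedure itself, and the function name and shapes are assumptions.

```python
import torch

def uncertainty_decomposition(probs: torch.Tensor, eps: float = 1e-12):
    # probs: (num_members, batch, num_classes); each slice is a softmax output,
    # e.g. ensemble members or samples from a predicted distribution over distributions
    mean_probs = probs.mean(dim=0)                                  # (batch, classes)
    total = -(mean_probs * (mean_probs + eps).log()).sum(dim=-1)    # entropy of the mean
    data = -(probs * (probs + eps).log()).sum(dim=-1).mean(dim=0)   # mean per-member entropy
    knowledge = total - data                                        # mutual information
    return total, data, knowledge
```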


Ensemble Distillation Approaches for Grammatical Error Correction

arXiv.org Artificial Intelligence

Ensemble approaches are commonly used techniques for improving a system by combining multiple model predictions. Additionally, these schemes allow the uncertainty, as well as the source of the uncertainty, to be derived for the prediction. Unfortunately these benefits come at a computational and memory cost. To address this problem, ensemble distillation (EnD) and, more recently, ensemble distribution distillation (EnDD) have been proposed, which compress the ensemble into a single model representing either the ensemble average prediction or the prediction distribution respectively. This paper examines the application of both these distillation approaches to a sequence prediction task, grammatical error correction (GEC). This is an important application area for language learning tasks as it can yield highly useful feedback to the learner. It is, however, more challenging than the standard tasks investigated for distillation, as the prediction of any grammatical correction to a word will be highly dependent on both the input sequence and the generated output history for the word. The performance of both EnD and EnDD is evaluated on publicly available GEC tasks as well as a spoken language task.
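
As a rough illustration of the EnD side, a student seq2seq model can be trained to match the ensemble-average next-token distribution at each output position via a KL-divergence loss. This is a minimal sketch under assumed tensor shapes and names; it omits the sequence-level subtleties the paper discusses (dependence on the generated output history) and the EnDD variant.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits: torch.Tensor,
                               ensemble_logits: torch.Tensor,
                               pad_mask: torch.Tensor) -> torch.Tensor:
    # student_logits:  (batch, seq_len, vocab)
    # ensemble_logits: (num_members, batch, seq_len, vocab)
    # pad_mask:        (batch, seq_len) bool, True at padding positions
    teacher = F.softmax(ensemble_logits, dim=-1).mean(dim=0)       # ensemble-average distribution
    log_student = F.log_softmax(student_logits, dim=-1)
    kl = F.kl_div(log_student, teacher, reduction="none").sum(-1)  # per-token KL(teacher || student)
    kl = kl.masked_fill(pad_mask, 0.0)
    return kl.sum() / (~pad_mask).sum()                            # mean over non-pad tokens
```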