AITopics | transformerxl

Collaborating Authors

transformerxl

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

Yogarajan, Vithya, Pfahringer, Bernhard, Smith, Tony, Montiel, Jacob

arXiv.org Artificial IntelligenceDec-3-2021

Multi-label learning predicts a subset of labels from a given label set for an unseen instance while considering label correlations. A known challenge with multi-label classification is the long-tailed distribution of labels. Many studies focus on improving the overall predictions of the model and thus do not prioritise tail-end labels. Improving the tail-end label predictions in multi-label classifications of medical text enables the potential to understand patients better and improve care. The knowledge gained by one or more infrequent labels can impact the cause of medical decisions and treatment plans. This research presents variations of concatenated domain-specific language models, including multi-BioMed-Transformers, to achieve two primary goals. First, to improve F1 scores of infrequent labels across multi-label problems, especially with long-tail labels; second, to handle long medical text and multi-sourced electronic health records (EHRs), a challenging task for standard transformers designed to work on short input sequences. A vital contribution of this research is new state-of-the-art (SOTA) results obtained using TransformerXL for predicting medical codes. A variety of experiments are performed on the Medical Information Mart for Intensive Care (MIMIC-III) database. Results show that concatenated BioMed-Transformers outperform standard transformers in terms of overall micro and macro F1 scores and individual F1 scores of tail-end labels, while incurring lower training times than existing transformer-based solutions for long input sequences.

language model, tail-end label, transformerxl, (16 more...)

arXiv.org Artificial Intelligence

2112.01718

Country:

Oceania > New Zealand > North Island > Waikato (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.79)

Add feedback

Predicting COVID-19 Patient Shielding: A Comprehensive Study

Yogarajan, Vithya, Montiel, Jacob, Smith, Tony, Pfahringer, Bernhard

arXiv.org Artificial IntelligenceSep-30-2021

There are many ways machine learning and big data analytics are used in the fight against the COVID-19 pandemic, including predictions, risk management, diagnostics, and prevention. This study focuses on predicting COVID-19 patient shielding -- identifying and protecting patients who are clinically extremely vulnerable from coronavirus. This study focuses on techniques used for the multi-label classification of medical text. Using the information published by the United Kingdom NHS and the World Health Organisation, we present a novel approach to predicting COVID-19 patient shielding as a multi-label classification problem. We use publicly available, de-identified ICU medical text data for our experiments. The labels are derived from the published COVID-19 patient shielding data. We present an extensive comparison across 12 multi-label classifiers from the simple binary relevance to neural networks and the most recent transformers. To the best of our knowledge this is the first comprehensive study, where such a range of multi-label classifiers for medical text are considered. We highlight the benefits of various approaches, and argue that, for the task at hand, both predictive accuracy and processing time are essential.

classifier, covid-19 patient, neural network, (14 more...)

arXiv.org Artificial Intelligence

2110.00183

Country:

Europe > United Kingdom (0.54)
Oceania > New Zealand > North Island > Waikato (0.04)
Europe > Hungary (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.49)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

The Grammar-Learning Trajectories of Neural Language Models

Choshen, Leshem, Hacohen, Guy, Weinshall, Daphna, Abend, Omri

arXiv.org Artificial IntelligenceSep-13-2021

The learning trajectories of linguistic phenomena provide insight into the nature of linguistic representation, beyond what can be gleaned from inspecting the behavior of an adult speaker. To apply a similar approach to analyze neural language models (NLM), it is first necessary to establish that different models are similar enough in the generalizations they make. In this paper, we show that NLMs with different initialization, architecture, and training data acquire linguistic phenomena in a similar order, despite having different end performances over the data. Leveraging these findings, we compare the relative performance on different phenomena at varying learning stages with simpler reference models. Results suggest that NLMs exhibit consistent ``developmental'' stages. Initial analysis of these stages presents phenomena clusters (notably morphological ones), whose performance progresses in unison, suggesting potential links between their acquired representations.

computational linguistic, correlation, trajectory, (14 more...)

arXiv.org Artificial Intelligence

2109.06096

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Add feedback

Compressive Transformers for Long-Range Sequence Modelling

Rae, Jack W., Potapenko, Anna, Jayakumar, Siddhant M., Lillicrap, Timothy P.

arXiv.org Machine LearningNov-13-2019

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Com-pressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17. 1 ppl and 0. 97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19. Humans have a remarkable ability to remember information over long time horizons. When reading a book, we build up a compressed representation of the past narrative, such as the characters and events that have built up the story so far. We can do this even if they are separated by thousands of words from the current text, or long stretches of time between readings. During daily life, we make use of memories at varying timescales: from locating the car keys, placed in the morning, to recalling the name of an old friend from decades ago. These feats of memorisation are not achieved by storing every sensory glimpse throughout one's lifetime, but via lossy compression. We aggressively select, filter, or integrate input stimuli based on factors of surprise, perceived danger, or repetition -- amongst other signals (Richards and Frankland, 2017). Memory systems in artificial neural networks began with very compact representations of the past. Recurrent neural networks (RNNs, Rumelhart et al. (1986)) learn to represent the history of observations in a compressed state vector. The state is compressed because it uses far less space than the history of observations -- the model only preserving information that is pertinent to the optimization of the loss.

arxiv preprint arxiv, compressive transformer, transformer, (15 more...)

arXiv.org Machine Learning

1911.05507

Country:

Africa > Middle East > Egypt (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback