Rei, Marek
Logical Reasoning for Natural Language Inference Using Generated Facts as Atoms
Stacey, Joe, Minervini, Pasquale, Dubossarsky, Haim, Camburu, Oana-Maria, Rei, Marek
State-of-the-art neural models can now reach human performance levels across various natural language understanding tasks. However, despite this impressive performance, models are known to learn from annotation artefacts at the expense of the underlying task. While interpretability methods can identify influential features for each prediction, there are no guarantees that these features are responsible for the model decisions. Instead, we introduce a model-agnostic logical framework to determine the specific information in an input responsible for each model decision. This method creates interpretable Natural Language Inference (NLI) models that maintain their predictive power. We achieve this by generating facts that decompose complex NLI observations into individual logical atoms. Our model makes predictions for each atom and uses logical rules to decide the class of the observation based on these atom-level predictions. We apply our method to the highly challenging ANLI dataset, where our framework improves the performance of both a DeBERTa-base and BERT baseline. Our method performs best on the most challenging examples, achieving a new state-of-the-art for the ANLI round 3 test set. We outperform every baseline in a reduced-data setting, and despite using no annotations for the generated facts, our model predictions for individual facts align with human expectations.
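As an illustration of how atom-level predictions might be combined into an observation-level label, the sketch below uses one plausible rule set (any contradicted fact implies contradiction, any unsupported fact implies neutral, otherwise entailment). This rule set and the function names are assumptions for illustration, not necessarily the rules used in the paper.

    # Hypothetical sketch: combining per-fact (atom) NLI predictions into a
    # single observation-level label. The aggregation rule is an assumption
    # chosen for illustration.
    from typing import List

    def aggregate_atom_predictions(atom_labels: List[str]) -> str:
        """Combine per-atom NLI predictions into one observation label."""
        if "contradiction" in atom_labels:
            # A single contradicted fact is enough to contradict the hypothesis.
            return "contradiction"
        if "neutral" in atom_labels:
            # No fact is contradicted, but at least one fact is unsupported.
            return "neutral"
        # Every generated fact is entailed by the premise.
        return "entailment"

    print(aggregate_atom_predictions(["entailment", "neutral", "entailment"]))
    # -> "neutral"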
Improving Robustness in Knowledge Distillation Using Domain-Targeted Data Augmentation
Stacey, Joe, Rei, Marek
Knowledge distillation encourages a student model to behave more like a teacher model, largely retaining the teacher's performance even though the student may have substantially fewer parameters. However, while distillation helps student models behave more like teacher models in-distribution, this is not necessarily the case out-of-distribution. To address this, we use a language model to create task-specific unlabeled data that mimics the data in targeted out-of-distribution domains. We use this generated data for knowledge distillation on the task of Natural Language Inference (NLI), encouraging the student models to behave more like the teacher models for these examples. Our domain-targeted augmentation is highly effective, and outperforms previous robustness methods when evaluating out-of-distribution performance on MNLI. Surprisingly, this method also improves performance on out-of-distribution domains that the data was not generated for. We additionally introduce Distilled Minority Upsampling (DMU), a method for identifying and upsampling minority examples during distillation. DMU is complementary to the domain-targeted augmentation, and substantially improves performance on SNLI-hard. Finally, we show out-of-distribution improvements on HANS from both of our methods, despite augmenting the training data with fewer than 5k examples.
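A minimal sketch of the distillation step on generated, unlabeled examples follows: the student is trained to match the teacher's softened output distribution, which is the standard knowledge-distillation formulation. The temperature value and loss scaling are illustrative assumptions rather than the paper's settings.

    # Minimal knowledge-distillation sketch for unlabeled, generated examples:
    # the student matches the teacher's softened output distribution, so no
    # gold labels are required. Temperature and scaling are illustrative.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 2.0) -> torch.Tensor:
        """KL divergence between softened teacher and student distributions."""
        student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * temperature ** 2

For each generated out-of-distribution example, the teacher's predictions supervise the student directly, which is what allows the augmentation data to remain unlabeled.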
Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers
Bujel, Kamil, Caines, Andrew, Yannakoudakis, Helen, Rei, Marek
Long-sequence transformers are designed to improve the representation of longer texts by language models and their performance on downstream document-level tasks. However, not much is understood about the quality of token-level predictions in long-form models. We investigate the performance of such architectures in the context of document classification with unsupervised rationale extraction. We find standard soft attention methods to perform significantly worse when combined with the Longformer language model. We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token-level. We find this method to significantly outperform Longformer-driven baselines on sentiment classification datasets, while also exhibiting significantly lower runtimes.
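The sketch below illustrates the compositional idea: a sentence-level encoder is applied to each sentence independently, token-level attention scores within each sentence serve as rationale scores, and the attended sentence vectors feed a document-level classifier. The module names, pooling choice, and scoring head are assumptions, not the paper's exact architecture.

    # Sketch of sentence-wise rationale scoring: a sentence encoder (e.g. a
    # RoBERTa model) processes each sentence independently, an attention head
    # over its tokens produces rationale scores, and the attended sentence
    # vectors are pooled for the document-level prediction.
    import torch
    import torch.nn as nn

    class SentenceWiseRationaleScorer(nn.Module):
        def __init__(self, encoder, hidden_size: int, num_classes: int):
            super().__init__()
            self.encoder = encoder                      # sentence-level encoder
            self.token_scorer = nn.Linear(hidden_size, 1)
            self.classifier = nn.Linear(hidden_size, num_classes)

        def forward(self, sentences):
            # sentences: list of dicts with "input_ids" and "attention_mask"
            sentence_vectors, token_scores = [], []
            for batch in sentences:
                hidden = self.encoder(**batch).last_hidden_state   # (1, T, H)
                scores = self.token_scorer(hidden).squeeze(-1)      # (1, T)
                weights = torch.softmax(scores, dim=-1)
                sentence_vectors.append((weights.unsqueeze(-1) * hidden).sum(dim=1))
                token_scores.append(weights)          # token-level rationales
            doc_vector = torch.stack(sentence_vectors, dim=1).mean(dim=1)
            return self.classifier(doc_vector), token_scores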
Modelling Temporal Document Sequences for Clinical ICD Coding
Ng, Clarence Boon Liang, Santos, Diogo, Rei, Marek
Past studies on the ICD coding problem focus on predicting clinical codes primarily based on the discharge summary. This covers only a small fraction of the notes generated during each hospital stay and leaves potential for improving performance by analysing all the available clinical notes. We propose a hierarchical transformer architecture that uses text across the entire sequence of clinical notes in each hospital stay for ICD coding, and incorporates embeddings for text metadata such as their position, time, and type of note. While using all clinical notes increases the quantity of data substantially, superconvergence can be used to reduce training costs. We evaluate the model on the MIMIC-III dataset. Our model exceeds the prior state-of-the-art when using only discharge summaries as input, and achieves further performance improvements when all clinical notes are used as input.
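The following sketch shows the hierarchical idea in outline: each clinical note is first encoded into a vector, metadata embeddings (note position, time bucket, note type) are added, and a second transformer models the sequence of notes within a hospital stay before a multi-label ICD head. Layer sizes, pooling, and the time-bucketing scheme are illustrative assumptions.

    # Sketch of a hierarchical note-sequence encoder with metadata embeddings.
    # note_vectors are assumed to be pre-pooled text encodings of each note.
    import torch
    import torch.nn as nn

    class NoteSequenceEncoder(nn.Module):
        def __init__(self, hidden: int, n_types: int, n_positions: int,
                     n_time_buckets: int, n_codes: int):
            super().__init__()
            self.type_emb = nn.Embedding(n_types, hidden)
            self.pos_emb = nn.Embedding(n_positions, hidden)
            self.time_emb = nn.Embedding(n_time_buckets, hidden)
            layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8,
                                               batch_first=True)
            self.note_transformer = nn.TransformerEncoder(layer, num_layers=2)
            self.code_head = nn.Linear(hidden, n_codes)   # multi-label logits

        def forward(self, note_vectors, note_type, note_pos, note_time):
            # note_vectors: (batch, num_notes, hidden)
            x = (note_vectors + self.type_emb(note_type)
                 + self.pos_emb(note_pos) + self.time_emb(note_time))
            x = self.note_transformer(x)
            stay_vector = x.mean(dim=1)            # pool over the note sequence
            return self.code_head(stay_vector)     # one logit per ICD code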
An Extended Sequence Tagging Vocabulary for Grammatical Error Correction
Mesham, Stuart, Bryant, Christopher, Rei, Marek, Yuan, Zheng
We extend a current sequence-tagging approach to Grammatical Error Correction (GEC) by introducing specialised tags for spelling correction and morphological inflection using the SymSpell and LemmInflect algorithms. Our approach improves generalisation: the proposed new tagset allows a smaller number of tags to correct a larger range of errors. Our results show a performance improvement both overall and in the targeted error categories. We further show that ensembles trained with our new tagset outperform those trained with the baseline tagset on the public BEA benchmark.
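To make the specialised tags concrete, the sketch below resolves two hypothetical edit tags at inference time: a spelling tag via SymSpell and an inflection tag via LemmInflect. The tag names (SPELL, INFLECT_VBD) and the tag-to-action mapping are assumptions for illustration; only the library calls follow the public symspellpy and lemminflect APIs.

    # Hypothetical resolution of edit tags using SymSpell and LemmInflect.
    import pkg_resources
    from symspellpy import SymSpell, Verbosity
    from lemminflect import getInflection, getLemma

    sym_spell = SymSpell(max_dictionary_edit_distance=2)
    dictionary_path = pkg_resources.resource_filename(
        "symspellpy", "frequency_dictionary_en_82_765.txt")
    sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)

    def apply_tag(token: str, tag: str) -> str:
        if tag == "SPELL":
            suggestions = sym_spell.lookup(token, Verbosity.CLOSEST,
                                           max_edit_distance=2)
            return suggestions[0].term if suggestions else token
        if tag.startswith("INFLECT_"):
            penn_tag = tag.split("_", 1)[1]          # e.g. "VBD"
            lemma = getLemma(token, upos="VERB")[0]
            inflected = getInflection(lemma, tag=penn_tag)
            return inflected[0] if inflected else token
        return token

    print(apply_tag("recieve", "SPELL"))        # -> "receive"
    print(apply_tag("eating", "INFLECT_VBD"))   # -> "ate"

The appeal of such tags is that one tag covers an open class of corrections (any misspelling, any inflection change), which is why a smaller tagset can correct a larger range of errors.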
Probing for targeted syntactic knowledge through grammatical error detection
Davis, Christopher, Bryant, Christopher, Caines, Andrew, Rei, Marek, Buttery, Paula
Targeted studies testing knowledge of subject-verb agreement (SVA) indicate that pre-trained language models encode syntactic information. We assert that if models robustly encode subject-verb agreement, they should be able to identify when agreement is correct and when it is incorrect. To that end, we propose grammatical error detection as a diagnostic probe to evaluate token-level contextual representations for their knowledge of SVA. We evaluate contextual representations at each layer from five pre-trained English language models: BERT, XLNet, GPT-2, RoBERTa, and ELECTRA. We leverage public annotated training data from both English second language learners and Wikipedia edits, and report results on manually crafted stimuli for subject-verb agreement. We find that masked language models linearly encode information relevant to the detection of SVA errors, while the autoregressive models perform on par with our baseline. However, we also observe a divergence in performance when probes are trained on different training sets, and when they are evaluated on different syntactic constructions, suggesting the information pertaining to SVA error detection is not robustly encoded.
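A minimal sketch of the kind of diagnostic probe described here: token representations are extracted from a chosen layer of a frozen language model and a linear classifier predicts whether each token participates in an SVA error. The model name, layer index, and label alignment are simplifying assumptions.

    # Sketch of a layer-wise linear probe for token-level error detection over
    # frozen contextual representations.
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModel.from_pretrained("bert-base-cased",
                                      output_hidden_states=True)
    model.eval()

    def token_features(sentence: str, layer: int):
        """Return the chosen layer's representation for each token."""
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden_states = model(**inputs).hidden_states  # layers 0..12
        return hidden_states[layer].squeeze(0).numpy()      # (num_tokens, 768)

    # X: stacked token vectors, y: 1 if the token is part of an SVA error,
    # collected from the annotated learner / Wikipedia-edit data.
    probe = LogisticRegression(max_iter=1000)
    # probe.fit(X, y)
    # probe.predict(token_features("The dogs barks loudly .", layer=8))

A linear probe is deliberately weak: if it succeeds, the relevant information must already be linearly accessible in the frozen representations rather than being computed by the probe itself.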
Guiding Visual Question Generation
Vedd, Nihir, Wang, Zixu, Rei, Marek, Miao, Yishu, Specia, Lucia
In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g. objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their training data. This makes training difficult and also poses issues for evaluation -- multiple valid questions exist for most images but only one or a few are captured by the human references. We present Guiding Visual Question Generation - a variant of VQG which conditions the question generator on categorical information based on expectations on the type of question and the objects it should explore. We propose two variants: (i) an explicitly guided model that enables an actor (human or automated) to select which objects and categories to generate a question for; and (ii) an implicitly guided model that learns which objects and categories to condition on, based on discrete latent variables. The proposed models are evaluated on an answer-category augmented VQA dataset and our quantitative results show a substantial improvement over the current state of the art (over 9 BLEU-4 increase). Human evaluation validates that guidance helps the generation of questions that are grammatically coherent and relevant to the given image and objects.
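As a rough illustration of explicit guidance (not the paper's implementation), the sketch below conditions a question decoder on image features concatenated with embeddings of the selected answer category and object. All module names, sizes, and the decoder interface are assumptions.

    # Sketch of explicit guidance for visual question generation: the decoder
    # is conditioned on image features plus embeddings of the chosen answer
    # category and object.
    import torch
    import torch.nn as nn

    class GuidedQuestionGenerator(nn.Module):
        def __init__(self, decoder, img_dim: int, n_categories: int,
                     n_objects: int, hidden: int):
            super().__init__()
            self.category_emb = nn.Embedding(n_categories, hidden)
            self.object_emb = nn.Embedding(n_objects, hidden)
            self.project = nn.Linear(img_dim + 2 * hidden, hidden)
            self.decoder = decoder          # any conditional text decoder

        def forward(self, image_features, category_id, object_id, target_ids):
            guidance = torch.cat([image_features,
                                  self.category_emb(category_id),
                                  self.object_emb(object_id)], dim=-1)
            condition = self.project(guidance)   # conditioning vector
            return self.decoder(condition, target_ids)

In the implicit variant described in the abstract, the category and object choices would instead be sampled from discrete latent variables learned by the model rather than supplied by an actor.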
Control Prefixes for Text Generation
Clive, Jordan, Cao, Kris, Rei, Marek
Prompt learning methods adapt pre-trained language models to downstream applications by using a task-specific prompt together with the input. Most of the current work on prompt learning in text generation relies on a shared dataset-level prompt for all examples in the dataset. We extend this approach and propose a dynamic method, Control Prefixes, which allows for the inclusion of conditional input-dependent information in each prompt. Control Prefixes is at the intersection of prompt learning and controlled generation, empowering the model to have finer-grained control during text generation. The method incorporates attribute-level learnable representations into different layers of a pre-trained transformer, allowing for the generated text to be guided in a particular direction. We provide a systematic evaluation of the technique and apply it to five datasets from the GEM benchmark for natural language generation (NLG). We present state-of-the-art results on several data-to-text datasets, including WebNLG.
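The sketch below illustrates the general control-prefix idea in a prefix-tuning style: alongside a shared task-level prefix, each attribute value (for example a target domain or category) has its own learnable prefix parameters that are prepended at every transformer layer. The shapes and the way the prefixes are consumed downstream are assumptions for illustration.

    # Sketch of input-dependent control prefixes: a shared task prefix is
    # concatenated with the prefix selected for this example's attribute value.
    import torch
    import torch.nn as nn

    class ControlPrefixes(nn.Module):
        def __init__(self, n_layers: int, n_heads: int, head_dim: int,
                     task_prefix_len: int, n_attribute_values: int,
                     control_prefix_len: int):
            super().__init__()
            # Last dimension holds both key and value vectors (2 * head_dim).
            self.task_prefix = nn.Parameter(
                torch.randn(n_layers, task_prefix_len, n_heads, 2 * head_dim))
            self.control_prefix = nn.Parameter(
                torch.randn(n_attribute_values, n_layers, control_prefix_len,
                            n_heads, 2 * head_dim))

        def forward(self, attribute_id: torch.Tensor) -> torch.Tensor:
            # Per-layer key/value prefixes: shared task prefix followed by the
            # prefix for this example's attribute value.
            control = self.control_prefix[attribute_id]
            return torch.cat([self.task_prefix, control], dim=1)

Only these prefix parameters would be trained, with the underlying pre-trained transformer kept frozen, which is what keeps the method in the prompt-learning family.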
Contextual Sentence Classification: Detecting Sustainability Initiatives in Company Reports
Hirlea, Dan, Bryant, Christopher, Rei, Marek
We introduce the novel task of detecting sustainability initiatives in company reports. Given a full report, the aim is to automatically identify mentions of practical activities that a company has performed in order to tackle specific societal issues. As a single initiative can often be described over multiple sentences, new methods for identifying continuous sentence spans need to be developed. We release a new dataset of company reports in which the text has been manually annotated with sustainability initiatives. We also evaluate different models for initiative detection, introducing a novel aggregation and evaluation methodology. Our proposed architecture uses sequences of five consecutive sentences to account for contextual information when making classification decisions at the individual sentence level.
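A small sketch of the five-sentence context window: each sentence is classified together with its two neighbours on either side, with the label applying to the centre sentence. The padding token and the example report text are illustrative assumptions.

    # Sketch of five-sentence context windows for sentence-level classification.
    from typing import List

    def context_windows(sentences: List[str], size: int = 5) -> List[List[str]]:
        """Return one window of `size` sentences centred on each sentence."""
        half = size // 2
        padded = ["<pad>"] * half + sentences + ["<pad>"] * half
        return [padded[i:i + size] for i in range(len(sentences))]

    report = ["We cut emissions by 20%.", "We installed solar panels.",
              "Profits rose this year.", "We funded local schools.",
              "Revenue grew by 5%."]
    for window in context_windows(report):
        centre = window[len(window) // 2]
        # A classifier over " ".join(window) would label `centre` as
        # describing an initiative or not.
        print(centre, "<-", window)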
Turning transformer attention weights into zero-shot sequence labelers
Bujel, Kamil, Yannakoudakis, Helen, Rei, Marek
We demonstrate how transformer-based models can be redesigned in order to capture inductive biases across tasks at different granularities and perform inference in a zero-shot manner. Specifically, we show how sentence-level transformers can be modified into effective sequence labelers at the token level without any direct supervision. We compare against a range of diverse and previously proposed methods for generating token-level labels, and present a simple yet effective modified attention layer that significantly advances the current state of the art.
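The sketch below shows the zero-shot labelling idea in outline: a sentence-level classifier attends over token representations, and at inference the per-token attention weights are read off as token-level labels without any token-level supervision. The scoring head and thresholding rule are assumptions, not the paper's exact modified attention layer.

    # Sketch of turning sentence-classification attention into zero-shot
    # token labels.
    import torch
    import torch.nn as nn

    class AttentionSentenceClassifier(nn.Module):
        def __init__(self, encoder, hidden: int, n_classes: int):
            super().__init__()
            self.encoder = encoder                   # e.g. a BERT-style model
            self.attn_scorer = nn.Linear(hidden, 1)
            self.classifier = nn.Linear(hidden, n_classes)

        def forward(self, **inputs):
            hidden = self.encoder(**inputs).last_hidden_state   # (B, T, H)
            scores = self.attn_scorer(hidden).squeeze(-1)        # (B, T)
            weights = torch.softmax(scores, dim=-1)
            pooled = (weights.unsqueeze(-1) * hidden).sum(dim=1)
            return self.classifier(pooled), weights

    # Zero-shot token labels: tokens whose attention weight exceeds a
    # threshold (for instance the mean weight in the sentence) are labelled
    # positive; the model itself is trained only on sentence-level labels.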