AITopics | Søgaard, Anders

Søgaard, Anders

Factual Consistency of Multilingual Pretrained Language Models

Fierro, Constanza, Søgaard, Anders

arXiv.org Artificial Intelligence, Mar-22-2022

Pretrained language models can be queried for factual knowledge, with potential applications in knowledge base acquisition and tasks that require inference. However, for that, we need to know how reliable this knowledge is, and recent work has shown that monolingual English language models lack consistency when predicting factual knowledge, that is, they fill-in-the-blank differently for paraphrases describing the same fact. In this paper, we extend the analysis of consistency to a multilingual setting. We introduce a resource, mParaRel, and investigate (i) whether multilingual language models such as mBERT and XLM-R are more consistent than their monolingual counterparts; and (ii) if such models are equally consistent across languages. We find that mBERT is as inconsistent as English BERT in English paraphrases, but that both mBERT and XLM-R exhibit a high degree of inconsistency in English and even more so for all the other 45 languages.
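The notion of consistency described in this abstract can be illustrated with a small sketch. The abstract does not spell out the exact metric, so the pairwise-agreement definition below is an assumption, and the function name and example predictions are hypothetical. It computes the fraction of paraphrase pairs for which a model gives the same answer.

```python
from itertools import combinations


def pairwise_consistency(predictions):
    """Return the fraction of paraphrase pairs whose predictions agree.

    `predictions` holds a model's top answer for each paraphrase of one fact.
    This pairwise definition is an assumption, not taken from the abstract.
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        raise ValueError("need at least two paraphrases to compare")
    agreeing = sum(1 for first, second in pairs if first == second)
    return agreeing / len(pairs)


# Hypothetical top predictions for three paraphrases of the same query.
example_predictions = ["Paris", "Paris", "Lyon"]
example_consistency = pairwise_consistency(example_predictions)
print(f"pairwise consistency: {example_consistency:.2f}")
```

A model that answers every paraphrase identically scores 1.0 under this definition, whether or not the answer is correct, so consistency is measured separately from accuracy.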

artificial intelligence, computational linguistic, natural language, (15 more...)

doi: 10.18653/v1/2022.findings-acl.240

2203.11552

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

Revisiting Methods for Finding Influential Examples

K, Karthikeyan, Søgaard, Anders

arXiv.org Artificial Intelligence, Nov-8-2021

Several instance-based explainability methods for finding influential training examples for test-time decisions have been proposed recently, including Influence Functions, TracIn, Representer Point Selection, Grad-Dot, and Grad-Cos. Typically these methods are evaluated using LOO influence (Cook's distance) as a gold standard, or using various heuristics. In this paper, we show that all of the above methods are unstable, i.e., extremely sensitive to initialization, ordering of the training data, and batch size. We suggest that this is a natural consequence of how, in the literature, the influence of examples is assumed to be independent of model state and other examples, and argue it is not. We show that LOO influence and heuristics are, as a result, poor metrics to measure the quality of instance-based explanations, and instead propose to evaluate such explanations by their ability to detect poisoning attacks. Further, we provide a simple, yet effective baseline to improve all of the above methods and show how it leads to very significant improvements on downstream tasks.
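Two of the methods named in this abstract, Grad-Dot and Grad-Cos, score a training example by comparing its loss gradient with that of a test example, using a dot product and a cosine similarity respectively. The sketch below illustrates that idea on a toy logistic regression model. The model, weights, and data points are assumptions chosen for illustration and do not come from the paper.

```python
import math


def logistic_loss_gradient(weights, features, label):
    """Gradient of the log loss for one example under logistic regression."""
    score = sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-score))
    return [(probability - label) * x for x in features]


def grad_dot(train_gradient, test_gradient):
    """Grad-Dot: dot product of the two loss gradients."""
    return sum(a * b for a, b in zip(train_gradient, test_gradient))


def grad_cos(train_gradient, test_gradient):
    """Grad-Cos: cosine similarity of the two loss gradients."""
    norm_product = math.sqrt(grad_dot(train_gradient, train_gradient)) * math.sqrt(
        grad_dot(test_gradient, test_gradient)
    )
    if norm_product == 0.0:
        return 0.0  # Treat a zero gradient as carrying no influence signal.
    return grad_dot(train_gradient, test_gradient) / norm_product


# Hypothetical model state and examples (illustrative values only).
weights = [0.0, 0.0]
train_gradient = logistic_loss_gradient(weights, [1.0, 2.0], 1)
test_gradient = logistic_loss_gradient(weights, [2.0, 4.0], 1)

print("Grad-Dot:", grad_dot(train_gradient, test_gradient))
print("Grad-Cos:", grad_cos(train_gradient, test_gradient))
```

Both scores depend on `weights`, that is, on the model state at which the gradients are taken, which is the kind of dependence the abstract argues is often overlooked.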

artificial intelligence, inductive learning, machine learning, (17 more...)

2111.04683

Country:

Oceania > Australia (0.14)
North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Replicating and Extending "Because Their Treebanks Leak": Graph Isomorphism, Covariants, and Parser Performance

Anderson, Mark, Søgaard, Anders, Gómez-Rodríguez, Carlos

arXiv.org Artificial Intelligence, Jun-2-2021

Søgaard (2020) obtained results suggesting the fraction of trees occurring in the test data isomorphic to trees in the training set accounts for a non-trivial variation in parser performance. Similar to other statistical analyses in NLP, the results were based on evaluating linear regressions. However, the study had methodological issues and was undertaken using a small sample size leading to unreliable results. We present a replication study in which we also bin sentences by length and find that only a small subset of sentences vary in performance with respect to graph isomorphism. Further, the correlation observed between parser performance and graph isomorphism in the wild disappears when controlling for covariants. However, in a controlled experiment, where covariants are kept fixed, we do observe a strong correlation. We suggest that conclusions drawn from statistical analyses like this need to be tempered and that controlled experiments can complement them by more readily teasing factors apart.

artificial intelligence, dug, natural language, (19 more...)

2106.00352

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Does injecting linguistic structure into language models lead to better alignment with brain recordings?

Abdou, Mostafa, Gonzalez, Ana Valeria, Toneva, Mariya, Hershcovich, Daniel, Søgaard, Anders

arXiv.org Artificial Intelligence, Jan-29-2021

Neuroscientists evaluate deep neural networks for natural language processing as possible candidate models for how language is processed in the brain. These models are often trained without explicit linguistic supervision, but have been shown to learn some linguistic structure in the absence of such supervision (Manning et al., 2020), potentially questioning the relevance of symbolic linguistic theories in modeling such cognitive processes (Warstadt and Bowman, 2020). We evaluate across two fMRI datasets whether language models align better with brain recordings, if their attention is biased by annotations from syntactic or semantic formalisms. Using structure from dependency or minimal recursion semantic annotations, we find alignments improve significantly for one of the datasets. For another dataset, we see more mixed results. We present an extensive analysis of these results. Our proposed approach enables the evaluation of more targeted hypotheses about the composition of meaning in the brain, expanding the range of possible scientific inferences a neuroscientist could make, and opens up new opportunities for cross-pollination between computational neuroscience and linguistics.

machine translation, neural network, representation, (24 more...)

2101.12608

Country:

Europe (1.00)
North America > United States > Colorado (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

A Survey of Cross-lingual Word Embedding Models

Ruder, Sebastian, Vulić, Ivan, Søgaard, Anders

Journal of Artificial Intelligence Research, Aug-12-2019

Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent, modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.

information retrieval, machine learning, natural language, (25 more...)

doi: 10.1613/jair.1.11640

AI Access Foundation

11640

Country:

Europe > Ireland (0.27)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Overview (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(3 more...)

A Discriminative Latent-Variable Model for Bilingual Lexicon Induction

Ruder, Sebastian, Cotterell, Ryan, Kementchedjhieva, Yova, Søgaard, Anders

arXiv.org Machine Learning, Aug-28-2018

We introduce a novel discriminative latent-variable model for the task of bilingual lexicon induction. Our model combines the bipartite matching dictionary prior of Haghighi et al. (2008) with a state-of-the-art embedding-based approach. To train the model, we derive an efficient Viterbi EM algorithm. We provide empirical improvements on six language pairs under two metrics and show that the prior theoretically and empirically helps to mitigate the hubness problem. We also demonstrate how previous work may be viewed as a similarly fashioned latent-variable model, albeit with a different prior.

artificial intelligence, bilingual lexicon induction, optimization problem, (18 more...)

1808.09334

Country:

North America > United States (0.46)
Europe > Ireland (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

On the Limitations of Unsupervised Bilingual Dictionary Induction

Søgaard, Anders, Ruder, Sebastian, Vulić, Ivan

arXiv.org Machine Learning, May-9-2018

Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.

artificial intelligence, machine translation, proceedings, (16 more...)

1805.0362

Country: Europe > Ireland (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces

Augenstein, Isabelle, Ruder, Sebastian, Søgaard, Anders

arXiv.org Artificial Intelligence, Feb-27-2018

We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for aspect- and topic-based sentiment analysis.

deep learning, neural network, proceedings, (16 more...)

1802.09913

Country:

Europe (0.68)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Learning to Predict Readability Using Eye-Movement Data From Natives and Learners

González-Garduño, Ana V. (University of Copenhagen) | Søgaard, Anders (University of Copenhagen)

AAAI Conferences, Feb-8-2018

Readability assessment can improve the quality of assisting technologies aimed at language learners. Eye-tracking data has been used for both inducing and evaluating general-purpose NLP/AI models, and below we show that, unsurprisingly, gaze data from language learners can also improve multi-task readability assessment models. This is unsurprising, since the gaze data records the reading difficulties of the learners. Unfortunately, eye-tracking data from language learners is often much harder to obtain than eye-tracking data from native speakers. We therefore compare the performance of deep learning readability models that use native-speaker eye movement data to models using data from language learners. Somewhat surprisingly, we observe no significant drop in performance when replacing learners with natives, making approaches that rely on native speaker gaze information more scalable. In other words, our finding is that language learner difficulties can be efficiently estimated from native speakers, which suggests that, more generally, readily available gaze data can be used to improve educational NLP/AI models targeted towards language learners.

artificial intelligence, learner, neural network, (16 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Learning what to share between loosely related tasks

Ruder, Sebastian, Bingel, Joachim, Augenstein, Isabelle, Søgaard, Anders

arXiv.org Artificial Intelligence, Jan-16-2018

Multi-task learning is motivated by the observation that humans bring to bear what they know about related problems when solving new ones. Similarly, deep neural networks can profit from related tasks by sharing parameters with other networks. However, humans do not consciously decide to transfer knowledge between tasks. In Natural Language Processing (NLP), it is hard to predict if sharing will lead to improvements, particularly if tasks are only loosely related. To overcome this, we introduce Sluice Networks, a general framework for multi-task learning where trainable parameters control the amount of sharing. Our framework generalizes previous proposals in enabling sharing of all combinations of subspaces, layers, and skip connections. We perform experiments on three task pairs, and across seven different domains, using data from OntoNotes 5.0, and achieve up to 15% average error reductions over common approaches to multi-task learning. We show that a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing and b) while sluice networks easily fit noise, they are robust across domains in practice.

deep learning, neural network, proceedings, (19 more...)

1705.08142

Country:

Europe > Denmark (0.14)
Europe > Ireland (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
