Multilingual Pre-training with Universal Dependency Learning

Neural Information Processing Systems

Pre-trained language models (PrLMs) dominate downstream natural language processing tasks, and multilingual PrLMs exploit language universality to alleviate the limited resources available for low-resource languages. Despite these successes, the performance of multilingual PrLMs remains unsatisfactory when they focus only on plain text and ignore obvious universal linguistic structure clues. Existing PrLMs have shown that monolingual linguistic structure knowledge may bring about better performance. Thus we propose a novel multilingual PrLM that supports both explicit universal dependency parsing and implicit language modeling. Syntax in terms of universal dependency parses serves not only as a pre-training objective but also as a learned representation in our model, which brings unprecedented PrLM interpretability and convenience in downstream task use. Our model outperforms two popular multilingual PrLMs, multilingual-BERT and XLM-R, on cross-lingual natural language understanding (NLU) benchmarks and linguistic structure parsing datasets, demonstrating the effectiveness and stronger cross-lingual modeling capabilities of our approach.


The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

Neural Information Processing Systems

Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time. For example, in the standard Sufficiency metric, only the top-k most important tokens are kept. In this paper, we study several under-explored dimensions of FI explanations, providing conceptual and empirical improvements for this form of explanation. First, we advance a new argument for why it can be problematic to remove features from an input when creating or evaluating explanations: the fact that these counterfactual inputs are out-of-distribution (OOD) to models implies that the resulting explanations are socially misaligned. The crux of the problem is that the model prior and random weight initialization influence the explanations (and explanation metrics) in unintended ways.
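The Sufficiency computation mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `top_k_indices`, `sufficiency`, and `toy_confidence` are hypothetical, the toy model is invented for the example, and "removing" a feature is assumed to mean deleting the token (replacing it with a mask token is another common choice).

```python
from typing import Callable, List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest scores, in original token order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sufficiency(
    confidence: Callable[[List[str]], float],
    tokens: List[str],
    scores: Sequence[float],
    k: int,
) -> float:
    """Confidence on the full input minus confidence on the top-k tokens only.

    A value near zero suggests the kept tokens are enough to preserve
    the model's confidence.
    """
    kept = [tokens[i] for i in top_k_indices(scores, k)]
    return confidence(tokens) - confidence(kept)


def toy_confidence(tokens: List[str]) -> float:
    """Hypothetical stand-in for a model: confidence grows with positive words."""
    positive = {"great", "good"}
    hits = sum(token in positive for token in tokens)
    return hits / (hits + 1)


tokens = ["the", "movie", "was", "great"]
good_scores = [0.0, 0.1, 0.0, 0.9]  # ranks "great" first
bad_scores = [0.9, 0.0, 0.0, 0.1]   # ranks "the" first

print(sufficiency(toy_confidence, tokens, good_scores, k=1))
print(sufficiency(toy_confidence, tokens, bad_scores, k=1))
```

Note that the shortened input passed to the model is exactly the kind of counterfactual input the abstract describes as out-of-distribution.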



Novel positional encodings to enable tree-based transformers

Neural Information Processing Systems

Motivated by this property, we propose a method to extend transformers to tree-structured data, enabling sequence-to-tree, tree-to-sequence, and tree-to-tree mappings. Our approach abstracts the transformer's sinusoidal positional encodings, allowing us to instead use a novel positional encoding scheme to represent node positions within trees.
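To make the idea of encoding a node's position within a tree concrete, here is a minimal sketch of one simple possibility: describe each node by the path of child indices from the root and concatenate one-hot vectors for the steps. The abstract does not specify the scheme, so this should not be read as the paper's method; the function name `path_positional_encoding` and the parameters `max_depth` and `max_children` are assumptions made for the example.

```python
from typing import List, Sequence


def path_positional_encoding(
    path: Sequence[int], max_depth: int, max_children: int
) -> List[float]:
    """Encode a node by its root-to-node path of child indices.

    Each step becomes a one-hot vector of length max_children. The most
    recent step comes first, and unused depth slots are zero-padded, so
    the root (empty path) maps to the all-zero vector.
    """
    if len(path) > max_depth:
        raise ValueError("path is deeper than max_depth")
    encoding: List[float] = []
    for child_index in reversed(path):
        if not 0 <= child_index < max_children:
            raise ValueError("child index out of range")
        one_hot = [0.0] * max_children
        one_hot[child_index] = 1.0
        encoding.extend(one_hot)
    # Pad the remaining depth slots with zeros.
    encoding.extend([0.0] * (max_children * (max_depth - len(path))))
    return encoding


# In a binary tree of depth at most 2: the root, its second child,
# and that child's first child.
print(path_positional_encoding([], max_depth=2, max_children=2))
print(path_positional_encoding([1], max_depth=2, max_children=2))
print(path_positional_encoding([1, 0], max_depth=2, max_children=2))
```

With this layout, a child's encoding is its parent's encoding shifted by one slot with the new step placed in front, which keeps related nodes close in the encoding space.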



Assessing Social and Intersectional Biases in Contextualized Word Representations

Neural Information Processing Systems

Social bias in machine learning has drawn significant attention, with work ranging from demonstrations of bias in a multitude of applications, curating definitions of fairness for different contexts, to developing algorithms to mitigate bias. In natural language processing, gender bias has been shown to exist in context-free word embeddings. Recently, contextual word representations have outperformed word embeddings in several downstream NLP tasks.




Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Neural Information Processing Systems

Example inputs (x) and outputs (y): "This 14th century work is divided into 3 sections: "Inferno", "Purgatorio" & "Paradiso"" (y); "Barack Obama was born in Hawaii." (x); "Define "middle ear"" (x), Question Answering: Question Query; "The middle ear includes the tympanic cavity and the three ossicles."


479b4864e55e12e0fb411eadb115c095-Supplemental.pdf

Neural Information Processing Systems

Unlike methods for ensembling numerical or categorical values for regression or classification problems where the mean value or majority votes are used respectively, the problem of graph ensemble is more complicated.
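For contrast, the two simple cases named above (mean for regression, majority vote for classification) can be sketched as follows. The function names are hypothetical, and the sketch deliberately does not attempt the graph case, which the text describes as more complicated.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, Sequence


def ensemble_regression(predictions: Sequence[float]) -> float:
    """Combine numerical predictions by taking their mean."""
    return mean(predictions)


def ensemble_classification(labels: Sequence[Hashable]) -> Hashable:
    """Combine categorical predictions by majority vote.

    Ties are broken in favor of the label that appears first.
    """
    return Counter(labels).most_common(1)[0][0]


print(ensemble_regression([1.0, 2.0, 3.0]))
print(ensemble_classification(["cat", "dog", "cat"]))
```

Neither operation carries over directly to graphs, since there is no obvious "mean" or "most common" graph when the candidates differ in individual edges.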