HUBERT Untangles BERT to Improve Transfer across NLP Tasks

Mehrad Moradshahi, Hamid Palangi, Monica S. Lam, Paul Smolensky, Jianfeng Gao

arXiv.org Machine Learning 

We show that there is shared structure between different NLP datasets that HUBERT, but not BERT, is able to learn and leverage. Our experimental results show that untangling data-specific semantics from general language structure is key to better transfer among NLP tasks.

Built on the Transformer architecture (Vaswani et al., 2017), the BERT model (Devlin et al., 2018) has demonstrated great power in providing general-purpose vector embeddings of natural language: its representations have served as the basis of many successful deep Natural Language Processing (NLP) models on a variety of tasks (e.g., Liu et al., 2019a;b; Zhang et al., 2019). Recent studies (Coenen et al., 2019; Hewitt & Manning, 2019; Lin et al., 2019; Tenney et al., 2019) have shown that BERT representations carry considerable information about grammatical structure, which, by design, is a deep and general encapsulation of linguistic information. Symbolic computation over structured symbolic representations such as parse trees has long been used to formalize linguistic knowledge. To strengthen the generality of BERT's representations, we propose to import this type of computation into its architecture.

Symbolic linguistic representations support the important distinction between content and form. The form consists of a structure devoid of content, such as an unlabeled tree: a collection of nodes defined by their structural positions, or roles (Newell, 1980), such as root, left-child-of-root, right-child-of-left-child-of-root, etc. In a particular linguistic expression such as "Kim referred to herself during the speech", these purely structural roles are filled with particular content-bearing symbols, including terminal words like Kim and non-terminal categories like NounPhrase. These fillers have identities of their own, which are preserved as they move from role to role across expressions: Kim retains its referent and its semantic properties whether it fills the subject role or the object role in a sentence.
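One standard way to realize this role/filler separation in vector space is tensor-product binding, in the spirit of Smolensky-style Tensor Product Representations. The following is a minimal sketch of binding and unbinding under assumed toy dimensions; all names and sizes here are illustrative and are not the paper's actual configuration:

```python
import torch

# Illustrative dimensions (assumed): 4 structural positions in a 6-dim
# role space, each occupied by an 8-dim content ("filler") vector.
n_tokens, d_filler, d_role = 4, 8, 6

fillers = torch.randn(n_tokens, d_filler)  # content: which symbol occupies each slot
roles = torch.randn(n_tokens, d_role)      # form: which structural position it fills

# Bind each filler to its role with an outer product, then sum the
# bindings into a single tensor encoding the whole structure.
bindings = torch.einsum('tf,tr->tfr', fillers, roles)
structure = bindings.sum(dim=0)            # shape: (d_filler, d_role)

# Because random role vectors are (almost surely) linearly independent,
# any filler can be recovered by unbinding with the dual vector of its
# role, obtained from the pseudo-inverse of the role matrix.
role_duals = torch.linalg.pinv(roles)      # shape: (d_role, n_tokens)
recovered = structure @ role_duals[:, 0]   # approximately fillers[0]
print(torch.allclose(recovered, fillers[0], atol=1e-4))
```

The unbinding step illustrates why the separation matters: the same filler vector can be bound to different roles across expressions without losing its identity, which is exactly the property the abstract describes for Kim in subject versus object position.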
