To Understand Representation of Layer-aware Sequence Encoders as Multi-order-graph

Mar-14-2023–arXiv.org Artificial Intelligence

Abstract--In this paper, we propose an explanation of the representations learned by self-attention network (SAN) based neural sequence encoders. The explanation regards the information captured by the model as graph structures and the encoding performed by the model as the generation of these graph structures. It applies to existing work on SAN-based models, explains the relationship among the ability to capture structural or linguistic information, the depth of the model, and the length of the sentence, and can be extended to other models such as recurrent neural network based models. Based on this explanation, we also propose a revisited multigraph called Multi-order-Graph (MoG), which models the graph structures in a SAN-based model as subgraphs of the MoG and converts the encoding of the SAN-based model into the generation of the MoG. We further introduce a Graph-Transformer that enhances the ability to capture multiple subgraphs of different orders and focuses on subgraphs of high order. Experimental results on multiple neural machine translation tasks show that the Graph-Transformer can yield effective performance improvement.

... which the encoder takes a sentence as input and generates the corresponding contextualized representations for the decoder. For NLP tasks with various modeling ways, there are generally three main types of encoder architectures: recurrent neural network (RNN) [1], [2], [3], convolutional neural network (CNN), and self-attention network (SAN) from the Transformer [4].

These works show that SAN-based models can embed structural and linguistic information, and that the information embedding ability is related to the model depth and sentence length. So far, although we may get the following intuitions: (1) different layers in SAN-based models may deliver different sorts of information, (2) increasing the depth of the model can improve the performance, while the improvement may be tiny when the model is too deep, ...


