Axis Tour: Word Tour Determines the Order of Axes in ICA-transformed Embeddings

Yamagiwa, Hiroaki, Takase, Yusuke, Shimodaira, Hidetoshi

Jan-11-2024–arXiv.org Artificial Intelligence

Word embedding is one of the most important components in natural language processing, but interpreting high-dimensional embeddings remains a challenging problem. Independent Component Analysis (ICA) has been identified as an effective solution to this problem. ICA-transformed word embeddings reveal interpretable semantic axes; however, the order of these axes is arbitrary. In this study, we focus on this property and propose a novel method, Axis Tour, which optimizes the order of the axes. Inspired by Word Tour, a one-dimensional word embedding method, we aim to improve the clarity of the word embedding space by maximizing the semantic continuity of the axes. Furthermore, we show through experiments on downstream tasks that Axis Tour constructs better low-dimensional embeddings compared to both PCA and ICA.

Figure 1: Scatterplots of normalized ICA-transformed word embeddings whose axes are ordered by Axis Tour and Skewness Sort.
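The idea of ordering axes to maximize semantic continuity can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: each axis is represented by the normalized mean embedding of its top-k words (the hypothetical helper `axis_vectors`), continuity is measured as the sum of cosine similarities between adjacent axes, and the order is built with a simple greedy nearest-neighbor heuristic rather than an exact tour optimization.

```python
import numpy as np


def axis_vectors(S, k=5):
    """Represent each axis by the mean embedding of its top-k words.

    S: (n_words, n_axes) array of ICA-transformed embeddings.
    Returns an (n_axes, n_axes) array whose row j is the unit-normalized
    mean of the k word vectors with the largest values on axis j.
    This representation is an illustrative assumption.
    """
    top = np.argsort(-S, axis=0)[:k]      # (k, n_axes) word indices per axis
    V = S[top].mean(axis=0)               # (k, n_axes, n_axes) -> (n_axes, n_axes)
    return V / np.linalg.norm(V, axis=1, keepdims=True)


def greedy_axis_order(V, start=0):
    """Greedy nearest-neighbor ordering: repeatedly append the unused axis
    most similar (by cosine) to the last axis placed. A heuristic stand-in
    for a proper tour optimizer."""
    sim = V @ V.T                         # cosine similarity, rows are unit vectors
    order = [start]
    remaining = set(range(len(V))) - {start}
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: sim[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def continuity(V, order):
    """Sum of cosine similarities between adjacent axes in the given order."""
    sim = V @ V.T
    return float(sum(sim[a, b] for a, b in zip(order, order[1:])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.standard_normal((200, 8))     # synthetic stand-in for ICA embeddings
    V = axis_vectors(S, k=5)
    order = greedy_axis_order(V)
    print("axis order:", order)
    print("continuity:", continuity(V, order))
```

Reordering the columns of the embedding matrix with `S[:, order]` then places similar axes next to each other. A greedy tour gives no optimality guarantee, so it should be read only as a way to make the objective concrete.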

axis, axis tour, similarity

Country:
- Oceania > Australia
  - Queensland (0.04)
- North America
  - United States
    - Connecticut (0.04)
    - Arizona (0.04)
    - Washington > King County
      - Seattle (0.04)
    - Georgia > Fulton County
      - Atlanta (0.04)
  - Canada
    - Saskatchewan (0.04)
    - Quebec (0.04)
    - Ontario (0.04)
- Europe
  - France (0.04)
  - Czechia > Prague (0.04)
  - Eastern Europe (0.04)
  - Germany > Hamburg (0.04)
  - Croatia (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - Slovenia > Central Slovenia
    - Municipality of Ljubljana > Ljubljana (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - United Kingdom > England
    - Lincolnshire (0.04)
  - Poland > Masovia Province
    - Warsaw (0.04)
- Asia
  - Singapore (0.04)
  - Russia (0.04)
  - Middle East > Qatar
    - Ad-Dawhah > Doha (0.04)
  - Japan > Honshū
    - Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:
- Research Report (1.00)

Industry:
- Law (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Statistical Learning (0.95)