Symmetric Correspondence Topic Models for Multilingual Text Analysis

Mar-14-2024, 09:49:55 GMT–Neural Information Processing Systems

Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as those in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that extends CorrLDA by incorporating a hidden variable to control the pivot language. In experiments with two multilingual comparable datasets extracted from Wikipedia, we demonstrate that SymCorrLDA is more effective than some other existing multilingual topic models.
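To make the idea of a hidden pivot-language variable concrete, here is a minimal toy sketch in Python. It is not the paper's generative process or inference procedure: the function name `sample_document`, its parameters, and the rule that non-pivot tokens copy a topic from a pivot-language token are all simplifying assumptions made for illustration only.

```python
import random


def sample_document(languages, n_tokens, n_topics, pivot_probs, seed=0):
    """Toy sketch: assign a topic to every token of one multilingual document.

    Assumed simplification (not taken from the paper): each token draws a
    hidden pivot-language indicator. A token whose own language is its pivot
    draws a topic from the document's topic distribution; any other token
    copies the topic of a randomly chosen token in its pivot language.
    """
    rng = random.Random(seed)

    # Document-level topic distribution: normalized Gamma(1, 1) draws,
    # which is a sample from a symmetric Dirichlet(1, ..., 1).
    weights = [rng.gammavariate(1.0, 1.0) for _ in range(n_topics)]
    total = sum(weights)
    theta = [w / total for w in weights]

    # Step 1: draw the hidden pivot indicator for every token.
    tokens = []
    for lang in languages:
        for _ in range(n_tokens):
            pivot = rng.choices(languages, weights=pivot_probs)[0]
            tokens.append({"lang": lang, "pivot": pivot, "topic": None})

    # Step 2: tokens that are their own pivot draw a topic from theta.
    for tok in tokens:
        if tok["pivot"] == tok["lang"]:
            tok["topic"] = rng.choices(range(n_topics), weights=theta)[0]

    # Step 3: the remaining tokens copy a topic from the pivot language.
    for tok in tokens:
        if tok["topic"] is None:
            pool = [
                other["topic"]
                for other in tokens
                if other["lang"] == tok["pivot"] and other["pivot"] == other["lang"]
            ]
            # Fall back to theta if the pivot language has no self-pivot token.
            if pool:
                tok["topic"] = rng.choice(pool)
            else:
                tok["topic"] = rng.choices(range(n_topics), weights=theta)[0]
    return tokens
```

Setting `pivot_probs` to put all mass on one language mimics a fixed pivot specified in advance, as the abstract says CorrLDA requires; spreading the mass lets either language act as the pivot token by token, which is the symmetry the abstract describes.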

pivot language, symcorrlda, topic model

Country:
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America
  - United States
    - Pennsylvania
      - Allegheny County > Pittsburgh (0.14)
      - Philadelphia County > Philadelphia (0.04)
    - New Jersey > Bergen County
      - Mahwah (0.04)
    - California > Alameda County
      - Berkeley (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - Ontario > Toronto (0.04)
- Europe
  - Ireland (0.04)
  - Austria (0.04)
  - United Kingdom
    - Scotland (0.04)
    - Northern Ireland (0.04)
    - England (0.04)
  - Sweden > Uppsala County
    - Uppsala (0.04)
  - Spain
    - Galicia > Madrid (0.04)
    - Valencian Community > Valencia Province
      - Valencia (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - Japan > Honshū
    - Kansai
      - Osaka Prefecture > Osaka (0.04)
      - Kyoto Prefecture > Kyoto (0.04)

Technology:
- Information Technology > Artificial Intelligence > Natural Language
  - Text Processing (1.00)
  - Discourse & Dialogue (1.00)