Distributed Word Representation in Tsetlin Machine
Yadav, Rohan Kumar; Jiao, Lei; Granmo, Ole-Christoffer; Goodwin, Morten
arXiv.org Artificial Intelligence
The Tsetlin Machine (TM) is an interpretable pattern recognition algorithm based on propositional logic. It has demonstrated competitive performance in many Natural Language Processing (NLP) tasks, including sentiment analysis, text classification, and Word Sense Disambiguation (WSD). To obtain human-level interpretability, the legacy TM employs Boolean input features such as bag-of-words (BOW). However, the BOW representation makes it difficult to incorporate pre-trained information, for instance word2vec and GloVe word representations. This restriction has limited the performance of the TM compared to deep neural networks (DNNs) in NLP. To reduce this performance gap, we propose a novel way of using pre-trained word representations with the TM. The approach significantly enhances TM performance while maintaining interpretability. We achieve this by extracting semantically related words from pre-trained word representations and using them as input features to the TM. Our experiments show that the accuracy of the proposed approach is significantly higher than that of the previous BOW-based TM, reaching the level of DNN-based models.
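The idea of turning pre-trained embeddings into extra Boolean input features can be illustrated with a minimal sketch. It assumes toy 3-dimensional vectors in place of real word2vec/GloVe embeddings, and it treats "semantically related" as the top-k neighbours by cosine similarity. The names `EMBEDDINGS`, `similar_words`, `boolean_features` and the parameter `k` are hypothetical, and the paper's exact word-selection rule may differ.

```python
import math

# Hypothetical toy embeddings standing in for pre-trained word2vec/GloVe vectors.
EMBEDDINGS = {
    "good": [1.0, 0.0, 0.0],
    "great": [0.9, 0.1, 0.0],
    "excellent": [0.7, 0.3, 0.0],
    "bad": [-1.0, 0.0, 0.0],
    "awful": [-0.9, 0.1, 0.0],
    "movie": [0.0, 0.0, 1.0],
    "film": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_words(word, k):
    """Return the k words whose embeddings are most cosine-similar to `word`."""
    if word not in EMBEDDINGS or k <= 0:
        return []
    target = EMBEDDINGS[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


def boolean_features(tokens, vocab, k=0):
    """Boolean BOW vector over `vocab`.

    With k=0 this is a plain bag-of-words input, as in the legacy TM.
    With k>0 each token also switches on its k nearest neighbours, so the
    input stays Boolean and every feature is still a readable word.
    """
    vocab_set = set(vocab)
    active = {t for t in tokens if t in vocab_set}
    for t in tokens:
        active.update(w for w in similar_words(t, k) if w in vocab_set)
    return [1 if w in active else 0 for w in vocab]


if __name__ == "__main__":
    vocab = sorted(EMBEDDINGS)  # ['awful', 'bad', 'excellent', 'film', 'good', 'great', 'movie']
    print(boolean_features(["great", "film"], vocab))       # → [0, 0, 0, 1, 0, 1, 0]
    print(boolean_features(["great", "film"], vocab, k=1))  # → [0, 0, 0, 1, 1, 1, 1]
```

With plain BOW, the texts "great film" and "good movie" share no active features. With `k=1`, both activate `good`, `great`, `film` and `movie`, so a pattern learned from one can match the other. Because the vector remains Boolean over named words, a TM clause (a conjunction of literals) built on it can still be read in terms of words.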
Apr-14-2021
- Country:
- Oceania > Australia
- Victoria > Melbourne (0.04)
- New South Wales > Sydney (0.04)
- North America > United States
- Nevada (0.04)
- Michigan (0.04)
- New York > New York County
- New York City (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Europe
- Norway (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Sweden > Uppsala County
- Uppsala (0.04)
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- Italy > Tuscany
- Florence (0.04)
- Bulgaria > Sofia City Province
- Sofia (0.04)
- Asia
- South Korea (0.04)
- Middle East > Qatar
- China > Beijing
- Beijing (0.04)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Media (0.46)
- Technology: