Rutherford, Attapol T.
ThaiCoref: Thai Coreference Resolution Dataset
Trakuekul, Pontakorn, Leong, Wei Qi, Polpanumas, Charin, Sawatphol, Jitkapat, Tjhi, William Chandra, Rutherford, Attapol T.
While coreference resolution is a well-established research area in Natural Language Processing (NLP), research focusing on the Thai language remains limited due to the lack of large annotated corpora. In this work, we introduce ThaiCoref, a dataset for Thai coreference resolution. Our dataset comprises 777,271 tokens, 44,082 mentions, and 10,429 entities across four text genres: university essays, newspapers, speeches, and Wikipedia. Our annotation scheme is built upon the OntoNotes benchmark with adjustments to address Thai-specific phenomena. Utilizing ThaiCoref, we train models employing a multilingual encoder and cross-lingual transfer techniques, achieving a best F1 score of 67.88% on the test set. Error analysis reveals challenges posed by Thai's unique linguistic features. To benefit the NLP community, we make the dataset and the model publicly available at http://www.github.com/nlp-chula/thai-coref.
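The reported F1 is a score over entity clusters; coreference work in the OntoNotes tradition typically averages MUC, B³, and CEAF. As a simplified, hedged illustration of how one such cluster-level score (B³) is computed from gold and predicted entity clusters — the toy clusters below are invented, not taken from ThaiCoref:

```python
# Toy sketch of B-cubed coreference scoring over entity clusters.
# Each cluster is a frozenset of mention identifiers; the data is invented.

def b_cubed(gold_clusters, pred_clusters):
    """Return B-cubed (precision, recall, F1) over mention clusters."""
    gold_of = {m: c for c in gold_clusters for m in c}
    pred_of = {m: c for c in pred_clusters for m in c}

    def avg(mention_to_own, mention_to_other):
        # For each mention, overlap of its own cluster with the other side's
        # cluster, divided by its own cluster size; averaged over mentions.
        scores = []
        for m, own in mention_to_own.items():
            other = mention_to_other.get(m, frozenset())
            scores.append(len(own & other) / len(own))
        return sum(scores) / len(scores) if scores else 0.0

    precision = avg(pred_of, gold_of)  # averaged over predicted mentions
    recall = avg(gold_of, pred_of)     # averaged over gold mentions
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented example: two gold entities vs. a prediction that merges them.
gold = [frozenset({"m1", "m2"}), frozenset({"m3"})]
pred = [frozenset({"m1", "m2", "m3"})]
print(b_cubed(gold, pred))  # precision ~0.56, recall 1.0, F1 ~0.71
```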
Thai Universal Dependency Treebank
Sriwirote, Panyut, Leong, Wei Qi, Polpanumas, Charin, Thanyawong, Santhawat, Tjhi, William Chandra, Aroonmanakun, Wirote, Rutherford, Attapol T.
Automatic dependency parsing of Thai sentences has been underexplored, as evidenced by the lack of large Thai dependency treebanks with complete dependency structures and the lack of a published systematic evaluation of state-of-the-art models, especially transformer-based parsers. In this work, we address these problems by introducing the Thai Universal Dependency Treebank (TUD), the largest Thai treebank to date, consisting of 3,627 trees annotated in accordance with the Universal Dependencies (UD) framework. We then benchmark dependency parsing models that incorporate pretrained transformers as encoders and train them on Thai-PUD and our TUD. The evaluation results show that most of our models outperform other models reported in previous papers and provide insight into the optimal choices of components to include in Thai dependency parsers. The new treebank and the full predictions of every model in our experiments are made available in a GitHub repository for further study.
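Dependency-parsing benchmarks of this kind are conventionally scored with unlabeled and labeled attachment scores (UAS/LAS). A minimal, self-contained sketch of that computation; the example heads and labels below are invented, not drawn from TUD or Thai-PUD:

```python
# Minimal sketch of UAS/LAS computation for dependency parsing.
# Each token is represented as (head_index, dependency_label); 0 marks the root.

def attachment_scores(gold, pred):
    """Return (UAS, LAS) given parallel lists of (head, label) per token."""
    assert len(gold) == len(pred), "analyses must cover the same tokens"
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_both = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return correct_heads / n, correct_both / n


# Invented 4-token sentence: the prediction has one wrong head and one wrong label.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f}, LAS={las:.2f}")  # UAS=0.75, LAS=0.50
```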
PhayaThaiBERT: Enhancing a Pretrained Thai Language Model with Unassimilated Loanwords
Sriwirote, Panyut, Thapiang, Jalinee, Timtong, Vasan, Rutherford, Attapol T.
While WangchanBERTa has become the de facto standard in transformer-based Thai language modeling, it still has shortcomings with regard to the understanding of foreign words, most notably English words, which are often borrowed into Thai without orthographic assimilation. We identify the lack of foreign vocabulary in WangchanBERTa's tokenizer as the main source of these shortcomings. We then expand WangchanBERTa's vocabulary via vocabulary transfer from XLM-R's pretrained tokenizer and pretrain a new model using the expanded tokenizer, starting from WangchanBERTa's checkpoint, on a new dataset that is larger than the one used to train WangchanBERTa. Our results show that our new pretrained model, PhayaThaiBERT, outperforms WangchanBERTa in many downstream tasks and datasets.
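A rough sketch of the kind of vocabulary expansion described above, using the Hugging Face transformers API. The model identifiers are assumptions (WangchanBERTa is commonly distributed as airesearch/wangchanberta-base-att-spm-uncased), and the paper's actual tokenizer-merging and continued-pretraining procedure may differ in detail:

```python
# Hedged sketch: extend a Thai tokenizer's vocabulary with tokens from XLM-R's
# tokenizer, resize the embedding matrix, and seed the new rows from XLM-R's
# embeddings. Model IDs below are assumptions, not taken from the paper, and
# adding tokens via add_tokens() is a simplification of true tokenizer merging.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

thai_id = "airesearch/wangchanberta-base-att-spm-uncased"  # assumed HF id
xlmr_id = "xlm-roberta-base"

thai_tok = AutoTokenizer.from_pretrained(thai_id)
xlmr_tok = AutoTokenizer.from_pretrained(xlmr_id)
model = AutoModelForMaskedLM.from_pretrained(thai_id)
xlmr_model = AutoModelForMaskedLM.from_pretrained(xlmr_id)

# Tokens present in XLM-R's vocabulary but missing from the Thai tokenizer.
thai_vocab = set(thai_tok.get_vocab())
new_tokens = [t for t in xlmr_tok.get_vocab() if t not in thai_vocab]

num_added = thai_tok.add_tokens(new_tokens)
model.resize_token_embeddings(len(thai_tok))

# Initialize the new embedding rows from XLM-R's input embeddings
# (both base models use 768-dimensional embeddings) instead of random init.
with torch.no_grad():
    src = xlmr_model.get_input_embeddings().weight
    dst = model.get_input_embeddings().weight
    for tok in new_tokens:
        dst[thai_tok.convert_tokens_to_ids(tok)] = src[xlmr_tok.convert_tokens_to_ids(tok)]

print(f"added {num_added} tokens; new vocab size = {len(thai_tok)}")
# The expanded model would then be further pretrained with a masked-LM
# objective on a large Thai corpus, starting from the WangchanBERTa checkpoint.
```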