AITopics | Setiawan, Hendra

Setiawan, Hendra

Accurate Knowledge Distillation with n-best Reranking

Setiawan, Hendra

arXiv.org Artificial Intelligence, Nov-14-2023

We propose using n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016): rather than relying only on the top-1 hypothesis, we explore hypotheses beyond it to acquire more accurate pseudo-labels. To accomplish this, we leverage a diverse set of models with different inductive biases, objective functions, or architectures, including publicly available large pretrained models. The effectiveness of our proposal is validated through experiments on the WMT'21 German-English and Chinese-English translation tasks. Our results demonstrate that using the pseudo-labels generated by our n-best reranker leads to a significantly more accurate student model. In fact, our best student model achieves accuracy comparable to a large translation model from Tran et al. (2021) with 4.7 billion parameters, while having two orders of magnitude fewer parameters.
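As a rough illustration of the reranking idea in this abstract, the sketch below picks a pseudo-label from an n-best list by a weighted sum of scores from several models. The function name `rerank_nbest`, the model names, the scores, and the weights are all hypothetical; the abstract does not say how the paper combines its models, so this is a minimal sketch under those assumptions rather than the authors' method.

```python
from typing import Dict, List


def rerank_nbest(
    hypotheses: List[str],
    model_scores: Dict[str, List[float]],
    weights: Dict[str, float],
) -> str:
    """Pick the hypothesis with the highest weighted sum of per-model scores.

    model_scores[name][i] is the (hypothetical) score that model `name`
    assigns to hypotheses[i]; higher is better, e.g. a log-probability.
    """
    if not hypotheses:
        raise ValueError("n-best list is empty")

    def combined(i: int) -> float:
        return sum(weights[name] * scores[i] for name, scores in model_scores.items())

    best_index = max(range(len(hypotheses)), key=combined)
    return hypotheses[best_index]


# Toy n-best list, ordered by the teacher's own score (index 0 is the top-1).
nbest = ["the house is small", "the house is little", "house small"]
scores = {
    "teacher": [-1.0, -1.2, -3.0],   # made-up teacher log-probabilities
    "reranker": [-5.0, -2.0, -2.5],  # made-up scores from a second model
}
weights = {"teacher": 1.0, "reranker": 0.5}  # assumed interpolation weights

pseudo_label = rerank_nbest(nbest, scores, weights)
print(pseudo_label)
```

In this toy case the second model's scores move the choice away from the teacher's top-1 hypothesis, which is the situation the abstract describes as looking beyond the top-1.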

computational linguistic, machine learning, natural language, (16 more...)

arXiv: 2305.12057

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.86)

Industry: Education (0.57)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Automating Behavioral Testing in Machine Translation

Ferrando, Javier, Sperber, Matthias, Setiawan, Hendra, Telaar, Dominic, Hasan, Saša

arXiv.org Artificial Intelligence, Nov-2-2023

Behavioral testing in NLP allows fine-grained evaluation of systems by examining their linguistic capabilities through the analysis of input-output behavior. Unfortunately, existing work on behavioral testing in Machine Translation (MT) is restricted to largely handcrafted tests covering a limited range of capabilities and languages. To address this limitation, we propose to use Large Language Models (LLMs) to generate a diverse set of source sentences tailored to test the behavior of MT models in a range of situations. We can then verify whether the MT model exhibits the expected behavior by matching its output against candidate sets that are also generated using LLMs. Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort. In our experiments, we apply our proposed evaluation framework to assess multiple available MT systems. While pass rates generally follow the trends observable from traditional accuracy-based metrics, our method uncovers several important differences and potential bugs that go unnoticed when relying only on accuracy.
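The candidate-set check described in this abstract can be pictured with a small sketch. It assumes, purely for illustration, that a test passes when the MT output contains at least one string from the candidate set (case-insensitive substring match); the names `passes` and `pass_rate` and the example sentences are hypothetical, and the paper's actual matching procedure may differ.

```python
from typing import Iterable, List, Set


def passes(translation: str, candidates: Iterable[str]) -> bool:
    """Return True if the translation contains any acceptable candidate string."""
    text = translation.lower()
    return any(candidate.lower() in text for candidate in candidates)


def pass_rate(translations: List[str], candidate_sets: List[Set[str]]) -> float:
    """Fraction of test cases whose translation matches its candidate set."""
    if len(translations) != len(candidate_sets):
        raise ValueError("need exactly one candidate set per translation")
    if not translations:
        return 0.0
    results = [passes(t, c) for t, c in zip(translations, candidate_sets)]
    return sum(results) / len(results)


# Made-up MT outputs and made-up candidate sets for two test cases.
outputs = ["She paid 25 euros for the ticket.", "He arrived on Monday."]
expected = [{"25 euros", "€25"}, {"Tuesday"}]

print(pass_rate(outputs, expected))
```

A per-capability pass rate like this is what would then be compared against accuracy-based metrics.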

large language model, machine learning, natural language, (17 more...)

arXiv: 2309.02553

Country:

Asia > Middle East > UAE (0.14)
North America > United States > California (0.14)
Europe > Portugal > Lisbon > Lisbon (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

One Wide Feedforward is All You Need

Pires, Telmo Pessoa, Lopes, António V., Assogba, Yannick, Setiawan, Hendra

arXiv.org Artificial Intelligence, Oct-21-2023

The Transformer architecture has two main non-embedding components: Attention and the Feed Forward Network (FFN). Attention captures interdependencies between words regardless of their position, while the FFN non-linearly transforms each input token independently. In this work, we explore the role of the FFN and find that, despite taking up a significant fraction of the model's parameters, it is highly redundant. Concretely, we are able to substantially reduce the number of parameters with only a modest drop in accuracy by removing the FFN from the decoder layers and sharing a single FFN across the encoder. Finally, we scale this architecture back to its original size by increasing the hidden dimension of the shared FFN, achieving substantial gains in both accuracy and latency with respect to the original Transformer Big.
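To give a feel for the parameter savings this abstract describes, here is a back-of-the-envelope count. It assumes the commonly cited Transformer Big sizes (model dimension 1024, FFN hidden dimension 4096, 6 encoder and 6 decoder layers) and a standard two-layer FFN with biases; none of these numbers appear in the abstract. The helper `matching_d_ff` is a hypothetical way to pick a widened hidden dimension by matching the FFN parameter budget, which may not be how the paper sizes its shared FFN.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters of one two-layer FFN: two weight matrices plus two biases."""
    return 2 * d_model * d_ff + d_ff + d_model


def matching_d_ff(d_model: int, budget: int) -> int:
    """Largest hidden size whose single FFN fits within `budget` parameters."""
    return (budget - d_model) // (2 * d_model + 1)


# Assumed Transformer Big sizes (not stated in the abstract).
D_MODEL, D_FF = 1024, 4096
ENCODER_LAYERS, DECODER_LAYERS = 6, 6

# Baseline: one FFN in every encoder and decoder layer.
baseline = (ENCODER_LAYERS + DECODER_LAYERS) * ffn_params(D_MODEL, D_FF)
# Reduced model: a single FFN shared across the encoder, none in the decoder.
shared = ffn_params(D_MODEL, D_FF)
# Hypothetical widened shared FFN that reuses the baseline FFN budget.
wide_d_ff = matching_d_ff(D_MODEL, baseline)

print(baseline, shared, baseline - shared, wide_d_ff)
```

Under these assumptions, the reduced model keeps one twelfth of the baseline's FFN parameters, and the freed budget can be spent on a much wider shared FFN.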

large language model, machine learning, natural language, (21 more...)

arXiv: 2309.01826

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(2 more...)

Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Gheini, Mozhdeh, Likhomanenko, Tatiana, Sperber, Matthias, Setiawan, Hendra

arXiv.org Artificial Intelligence, Dec-19-2022

Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unlabeled data and adds it to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed setup: joint transcription and translation of speech, which suffers from an absence of sufficient data resources. We show that under such data-deficient circumstances, the unlabeled data can vary significantly in domain from the supervised data, which degrades pseudo-label quality. We investigate two categories of remedies that require no additional supervision and target the domain mismatch: pseudo-label filtering and data augmentation. We show that such pseudo-label analysis and processing yields additional gains on top of the vanilla pseudo-labeling setup, for total improvements of up to 0.6% absolute WER and 2.2 BLEU points.
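Pseudo-label filtering, one of the two remedies named in this abstract, can be sketched as a simple thresholding step before pseudo-labeled examples join the training pool. The abstract does not state which filtering criterion the paper uses, so the `confidence` field, the threshold, and the names `PseudoLabeled` and `filter_pseudo_labels` below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PseudoLabeled:
    audio_id: str
    transcript: str    # pseudo-label for transcription
    translation: str   # pseudo-label for translation
    confidence: float  # hypothetical model confidence in [0, 1]


def filter_pseudo_labels(
    examples: List[PseudoLabeled], min_confidence: float
) -> List[PseudoLabeled]:
    """Keep only pseudo-labeled examples at or above the confidence threshold."""
    return [ex for ex in examples if ex.confidence >= min_confidence]


# Made-up pseudo-labeled utterances from an unlabeled, out-of-domain pool.
pool = [
    PseudoLabeled("utt1", "guten morgen", "good morning", 0.92),
    PseudoLabeled("utt2", "gut morgn", "good morn", 0.41),
    PseudoLabeled("utt3", "vielen dank", "thank you very much", 0.80),
]

kept = filter_pseudo_labels(pool, min_confidence=0.8)
print([ex.audio_id for ex in kept])
```

Only the examples that survive the filter would be added to the supervised data for the next round of training.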

machine learning, natural language, translation, (17 more...)

arXiv: 2212.09982

Country:

Europe (0.93)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
