What Language Model to Train if You Have One Million GPU Hours?

Scao, Teven Le, Wang, Thomas, Hesslow, Daniel, Saulnier, Lucile, Bekman, Stas, Bari, M Saiful, Biderman, Stella, Elsahar, Hady, Muennighoff, Niklas, Phang, Jason, Press, Ofir, Raffel, Colin, Sanh, Victor, Shen, Sheng, Sutawika, Lintang, Tae, Jaesung, Yong, Zheng Xin, Launay, Julien, Beltagy, Iz

arXiv.org Artificial Intelligence

The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, with the emergence of state-of-the-art 100B+ parameter models, large language models are increasingly expensive to accurately design and train. Notably, it can be difficult to evaluate how modeling decisions may impact emergent capabilities, given that these capabilities arise mainly from sheer scale. In the process of building BLOOM--the BigScience Large Open-science Open-access Multilingual language model--our goal is to identify an architecture and training setup that makes the best use of our 1,000,000 A100-GPU-hour budget. Specifically, we perform an ablation study at the billion-parameter scale comparing different modeling practices and their impact on zero-shot generalization. In addition, we study the impact of various popular pre-training corpora on zero-shot generalization. We also study the performance of a multilingual model and how it compares to the English-only one. Finally, we consider the scaling behaviour of Transformers to choose the target model size, shape, and training setup. All our models and code are open-sourced at https://huggingface.co/bigscience.
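The last step lends itself to a quick worked example. The sketch below is illustrative only, not the paper's code: the peak throughput, utilization factor, and token budget are assumptions. It shows how a fixed 1,000,000 A100-GPU-hour budget can be turned into a candidate parameter count via the common C ≈ 6ND approximation for Transformer training FLOPs.

# Back-of-the-envelope sketch (assumed numbers, not the paper's code):
# convert a GPU-hour budget into a candidate model size via C ~ 6*N*D.

A100_PEAK_FLOPS = 312e12   # A100 BF16 peak throughput, FLOP/s
UTILIZATION = 0.35         # assumed fraction of peak actually achieved
GPU_HOURS = 1_000_000

total_flops = GPU_HOURS * 3600 * A100_PEAK_FLOPS * UTILIZATION

def params_for_budget(compute_flops: float, tokens: float) -> float:
    """Solve C = 6 * N * D for the parameter count N, given a token budget D."""
    return compute_flops / (6 * tokens)

tokens = 350e9  # assumed pre-training token budget
print(f"~{params_for_budget(total_flops, tokens) / 1e9:.0f}B parameters "
      f"for a {tokens / 1e9:.0f}B-token run")

With these assumed numbers the budget works out to roughly 4e23 FLOPs and a model in the low hundreds of billions of parameters; changing the utilization or token budget shifts the answer accordingly, which is exactly why the ablations above matter.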


MATra: A Multilingual Attentive Transliteration System for Indian Scripts

Raj, Yash, Laddagiri, Bhavesh

arXiv.org Artificial Intelligence

Transliteration is an NLP task in which the output word is a similar-sounding word written in the script of another language. Such systems have been developed for several language pairs involving English as either the source or the target language, and are deployed in services like Google Translate and in chatbots. However, very little research has been done on transliteration between Indic languages. This paper presents a multilingual transformer-based model (with some modifications) that noticeably outperforms all existing models in this domain, including the state of the art. The model can transliterate between any pair of five languages: English, Hindi, Bengali, Kannada, and Tamil, and is applicable in scenarios where language is a barrier to written communication. It beats the state of the art for all pairs among these five languages, achieving a top-1 accuracy score of 80.7%, about 29.5% higher than the best current results. Furthermore, the model achieves 93.5% phonetic accuracy (transliteration being primarily a phonetic/sound-based task).
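As a concrete illustration of the headline metric, the sketch below (hypothetical names and data, not the paper's code) computes top-1 exact-match accuracy from ranked candidate transliterations and gold references.

# Minimal sketch (hypothetical, not the paper's code): score a
# transliteration system by top-1 exact-match accuracy.

def top1_accuracy(predictions: list[list[str]], references: list[str]) -> float:
    """predictions[i] is a list of candidates ranked by model score;
    references[i] is the gold transliteration for input i."""
    hits = sum(1 for cands, gold in zip(predictions, references)
               if cands and cands[0] == gold)
    return hits / len(references)

# Toy usage: Hindi transliterations of "bharat" and "dilli".
preds = [["भारत", "भरत"], ["दिल्ली"]]
golds = ["भारत", "दिल्ली"]
print(f"top-1 accuracy: {top1_accuracy(preds, golds):.1%}")  # -> 100.0%

Phonetic accuracy, by contrast, credits outputs that sound like the reference even when the spelling differs, which is why it comes out higher (93.5%) than exact-match top-1.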