Husain, Jaavid Aktar
RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs
Saji, Alan, Husain, Jaavid Aktar, Jayakumar, Thanmay, Dabre, Raj, Kunchukuttan, Anoop, Puduppully, Ratish
Large Language Models (LLMs) exhibit remarkable multilingual generalization despite being predominantly trained on English-centric corpora. A fundamental question arises: how do LLMs achieve such robust multilingual capabilities? Taking the case of non-Roman script languages, we investigate the role of Romanization - the representation of non-Roman scripts using Roman characters - as a bridge in multilingual processing. Using mechanistic interpretability techniques, we analyze next-token generation and find that intermediate layers frequently represent target words in Romanized form before transitioning to native script, a phenomenon we term Latent Romanization. Further, through activation patching experiments, we demonstrate that LLMs encode semantic concepts similarly across native and Romanized scripts, suggesting a shared underlying representation. Additionally, for translation into non-Roman script languages, our findings reveal that when the target language is in Romanized form, its representations emerge earlier in the model's layers compared to native script. These insights contribute to a deeper understanding of multilingual representation in LLMs and highlight the implicit role of Romanization in facilitating language transfer.
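A minimal sketch of the kind of layer-wise next-token inspection the abstract describes, assuming a Llama-style decoder loaded through HuggingFace transformers; the model name, prompt, and script-detection heuristic are illustrative placeholders, not the paper's exact setup.

```python
# Logit-lens style sketch: decode each layer's hidden state at the next-token
# position and tag whether the top candidate token is Roman or Devanagari.
# Assumptions: a Llama-style model whose final norm is model.model.norm and
# whose unembedding is model.lm_head; checkpoint and prompt are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder; any Llama-style checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def script_of(text: str) -> str:
    """Crude script tag: Devanagari if any char falls in U+0900-U+097F, else Roman."""
    if any("\u0900" <= ch <= "\u097f" for ch in text):
        return "native (Devanagari)"
    if any(ch.isascii() and ch.isalpha() for ch in text):
        return "romanized / Latin"
    return "other"

prompt = 'Translate to Hindi: "water" ->'  # illustrative translation prompt
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq, hidden]
for layer_idx, h in enumerate(out.hidden_states):
    last = h[0, -1]                                   # next-token position
    logits = model.lm_head(model.model.norm(last))    # final norm + unembedding
    top = tok.decode(int(logits.argmax()))
    print(f"layer {layer_idx:2d}: {top!r:20s} {script_of(top)}")
```

If latent Romanization occurs, the printout would show Roman-script candidates in intermediate layers before the final layers switch to Devanagari; activation patching would then swap these intermediate states between native and romanized prompts to test whether they carry the same semantics.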
Airavata: Introducing Hindi Instruction-tuned LLM
Gala, Jay, Jayakumar, Thanmay, Husain, Jaavid Aktar, M, Aswanth Kumar, Khan, Mohammed Safi Ur Rahman, Kanojia, Diptesh, Puduppully, Ratish, Khapra, Mitesh M., Dabre, Raj, Murthy, Rudra, Kunchukuttan, Anoop
The last year has witnessed tremendous interest and activity in the world of Large Language Models (LLMs). LLMs hold the potential to unlock exciting applications in artificial intelligence thanks to their ability to comprehend complex natural language instructions and excel in a broad spectrum of tasks involving language, knowledge, reasoning, and creative generation. To foster research, innovation, and widespread adoption, an open ecosystem is essential. We have observed significant advancements in this area with the launch of models like Llama 2 (Touvron et al., 2023) and Mistral (Jiang et al., 2023), as well as their instruction-tuned variants such as Llama 2 Chat (Touvron et al., 2023), Mistral-Instruct (Jiang et al., 2023), and Zephyr (Tunstall et al., 2023), among others. However, most of these advancements have been predominantly centered on the English language. There is limited support for Indian languages, which can be attributed to the incidental inclusion of some Indian language data that slipped through the data filters during the pre-training of these language models.
RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization
Husain, Jaavid Aktar, Dabre, Raj, Kumar, Aswanth, Puduppully, Ratish, Kunchukuttan, Anoop
This study addresses the challenge of extending Large Language Models (LLMs) to non-English languages, specifically those using non-Latin scripts. We propose an innovative approach that utilizes the romanized form of text as an interface for LLMs, hypothesizing that its frequent informal use and shared tokens with English enhance cross-lingual alignment. Focusing on Hindi, we demonstrate through Hindi-to-English translation and sentiment analysis tasks that romanized text not only significantly improves inference efficiency due to its lower fertility compared to native text but also achieves competitive performance with limited pre-training. Additionally, our novel multi-script prompting approach, which combines romanized and native texts, shows promise in further enhancing task performance. These findings suggest the potential of romanization in bridging the language gap for LLM applications, with future work aimed at expanding this approach to more languages and tasks.
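A small sketch of the fertility comparison the abstract alludes to (subword tokens per word), assuming a HuggingFace tokenizer; the checkpoint name and sentence pair are illustrative placeholders, and the romanization is written by hand rather than produced by the paper's transliteration pipeline.

```python
# Sketch: compare subword fertility (tokens per word) for native-script vs.
# romanized Hindi under the same tokenizer. Lower fertility means fewer tokens
# per word and hence cheaper inference. Checkpoint and sentences are illustrative.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder checkpoint

pairs = {
    "native (Devanagari)": "मुझे पानी चाहिए",        # "I need water"
    "romanized":           "mujhe paani chahiye",   # hand-written transliteration
}

def fertility(text: str) -> float:
    n_tokens = len(tok.encode(text, add_special_tokens=False))
    n_words = len(text.split())
    return n_tokens / n_words

for label, sent in pairs.items():
    print(f"{label:22s} fertility = {fertility(sent):.2f}")
```

Because English-centric tokenizers share most of their vocabulary with Roman characters, the romanized sentence typically splits into far fewer subwords than its Devanagari counterpart, which is the efficiency argument behind using romanized text as the interface.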