AITopics | Mishra, Biswajit

Collaborating Authors

Mishra, Biswajit

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Llama-3.1-Sherkala-8B-Chat: An Open Large Language Model for Kazakh

Koto, Fajri, Joshi, Rituraj, Mukhituly, Nurdaulet, Wang, Yuxia, Xie, Zhuohan, Pal, Rahul, Orel, Daniil, Mullah, Parvez, Turmakhan, Diana, Goloburda, Maiya, Kamran, Mohammed, Ghosh, Samujjwal, Jia, Bokang, Mansurov, Jonibek, Togmanov, Mukhammed, Banerjee, Debopriyo, Laiyk, Nurkhan, Sakip, Akhmed, Han, Xudong, Kochmar, Ekaterina, Aji, Alham Fikri, Singh, Aaryamonvikram, Jadhav, Alok Anil, Katipomu, Satheesh, Kamboj, Samta, Choudhury, Monojit, Gosal, Gurpreet, Ramakrishnan, Gokul, Mishra, Biswajit, Chandran, Sarath, Sheinin, Avraham, Vassilieva, Natalia, Sengupta, Neha, Murray, Larry, Nakov, Preslav

arXiv.org Artificial IntelligenceMar-3-2025

Llama-3.1-Sherkala-8B-Chat, or Sherkala-Chat (8B) for short, is a state-of-the-art instruction-tuned open generative large language model (LLM) designed for Kazakh. Sherkala-Chat (8B) aims to enhance the inclusivity of LLM advancements for Kazakh speakers. Adapted from the LLaMA-3.1-8B model, Sherkala-Chat (8B) is trained on 45.3B tokens across Kazakh, English, Russian, and Turkish. With 8 billion parameters, it demonstrates strong knowledge and reasoning abilities in Kazakh, significantly outperforming existing open Kazakh and multilingual models of similar scale while achieving competitive performance in English. We release Sherkala-Chat (8B) as an open-weight instruction-tuned model and provide a detailed overview of its training, fine-tuning, safety alignment, and evaluation, aiming to advance research and support diverse real-world applications.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.01493

Country:

Asia (1.00)
Europe > Middle East > Malta (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (0.81)
Overview (0.74)

Industry: Education > Curriculum > Subject-Specific Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Foundational Large Language Models for Materials Research

Mishra, Vaibhav, Singh, Somaditya, Ahlawat, Dhruv, Zaki, Mohd, Bihani, Vaibhav, Grover, Hargun Singh, Mishra, Biswajit, Miret, Santiago, Mausam, null, Krishnan, N. M. Anoop

arXiv.org Artificial IntelligenceDec-12-2024

Materials discovery and development are critical for addressing global challenges. Yet, the exponential growth in materials science literature comprising vast amounts of textual data has created significant bottlenecks in knowledge extraction, synthesis, and scientific reasoning. Large Language Models (LLMs) offer unprecedented opportunities to accelerate materials research through automated analysis and prediction. Still, their effective deployment requires domain-specific adaptation for understanding and solving domain-relevant tasks. Here, we present LLaMat, a family of foundational models for materials science developed through continued pretraining of LLaMA models on an extensive corpus of materials literature and crystallographic data. Through systematic evaluation, we demonstrate that LLaMat excels in materials-specific NLP and structured information extraction while maintaining general linguistic capabilities. The specialized LLaMat-CIF variant demonstrates unprecedented capabilities in crystal structure generation, predicting stable crystals with high coverage across the periodic table. Intriguingly, despite LLaMA-3's superior performance in comparison to LLaMA-2, we observe that LLaMat-2 demonstrates unexpectedly enhanced domain-specific performance across diverse materials science tasks, including structured information extraction from text and tables, more particularly in crystal structure generation, a potential adaptation rigidity in overtrained LLMs. Altogether, the present work demonstrates the effectiveness of domain adaptation towards developing practically deployable LLM copilots for materials research. Beyond materials science, our findings reveal important considerations for domain adaptation of LLMs, such as model selection, training methodology, and domain-specific performance, which may influence the development of specialized scientific AI systems.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.0956

Country:

Asia (0.67)
North America > Canada (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Oil & Gas (0.67)
Education > Curriculum > Subject-Specific Education (0.67)
Materials > Chemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bilingual Adaptation of Monolingual Foundation Models

Gosal, Gurpreet, Xu, Yishi, Ramakrishnan, Gokul, Joshi, Rituraj, Sheinin, Avraham, Zhiming, null, Chen, null, Mishra, Biswajit, Vassilieva, Natalia, Hestness, Joel, Sengupta, Neha, Sahu, Sunil Kumar, Jia, Bokang, Katipomu, Satheesh, Pandit, Onkar, Kamboj, Samta, Pal, Rahul, Mullah, Parvez, Doraiswamy, Soundar, Chami, Mohamed El Karim

arXiv.org Artificial IntelligenceJul-13-2024

We present an efficient method for adapting a monolingual Large Language Model (LLM) to another language, addressing challenges of catastrophic forgetting and tokenizer limitations. We focus this study on adapting Llama 2 to Arabic. Our two-stage approach begins with expanding the vocabulary and training only the embeddings matrix, followed by full model continual pretraining on a bilingual corpus. By continually pretraining on a mix of Arabic and English corpora, the model retains its proficiency in English while acquiring capabilities in Arabic. Our approach results in significant improvements in Arabic and slight enhancements in English, demonstrating cost-effective cross-lingual transfer. We also perform extensive ablations on embedding initialization techniques, data mix ratios, and learning rates and release a detailed training recipe.

large language model, llama 2, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2407.12869

Country:

Europe > Italy (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback