Satheesh, Sanjeev
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Akter, Syeda Nahida, Prabhumoye, Shrimai, Kamalu, John, Satheesh, Sanjeev, Nyberg, Eric, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan
The utility of synthetic data to enhance pretraining data quality and hence to improve downstream task accuracy has been widely explored in recent large language models (LLMs). Yet, these approaches fall short on complex, multi-hop and mathematical reasoning tasks, as the synthetic data typically fails to add complementary knowledge to the existing raw corpus. In this work, we propose a novel large-scale and diverse Math Informed syNthetic Dialogue (MIND) generation method that improves the mathematical reasoning ability of LLMs. Specifically, using MIND, we generate synthetic conversations based on OpenWebMath (OWM), resulting in a new math corpus, MIND-OWM. Our experiments with different conversational settings reveal that incorporating knowledge gaps between dialogue participants is essential for generating high-quality math data. We further identify an effective way to format and integrate synthetic and raw data during pretraining to maximize the gain in mathematical reasoning, emphasizing the need to restructure raw data rather than use it as-is. Compared to pretraining just on raw data, a model pretrained on MIND-OWM shows a significant boost in mathematical reasoning (GSM8K: +13.42%, MATH: +2.30%), including superior performance in specialized knowledge (MMLU: +4.55%, MMLU-STEM: +4.28%) and general-purpose reasoning tasks (GENERAL REASONING: +2.51%).
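A minimal sketch, not the authors' released pipeline, of how a raw OpenWebMath passage might be rewritten as a knowledge-gap dialogue with an instruction-following LLM; the `generate` callable and the prompt wording are assumptions for illustration only.

```python
# Hypothetical sketch: rewrite a raw math passage as a teacher-student dialogue.
# `generate(prompt)` stands in for any instruction-following LLM call and is an
# assumption, not part of the MIND release.

PROMPT_TEMPLATE = (
    "Rewrite the following math text as a conversation between a TEACHER who "
    "fully understands the material and a STUDENT who does not (a deliberate "
    "knowledge gap). The student asks questions and makes mistakes; the teacher "
    "corrects them step by step.\n\nTEXT:\n{passage}\n\nCONVERSATION:"
)

def passage_to_dialogue(passage: str, generate) -> str:
    """Convert one raw-corpus passage into a synthetic dialogue."""
    return generate(PROMPT_TEMPLATE.format(passage=passage))

def build_synthetic_corpus(raw_passages, generate):
    # Keep both the restructured dialogue and the original passage, since the
    # abstract stresses that how raw and synthetic data are combined matters.
    for passage in raw_passages:
        yield {"raw": passage, "dialogue": passage_to_dialogue(passage, generate)}
```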
Nemotron-4 340B Technical Report
NVIDIA, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, Zhu, Chen
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of the data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
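A minimal sketch, assuming a generate-then-filter workflow, of how an instruct model and a reward model could be paired to produce synthetic alignment data; the `instruct_model.generate` and `reward_model.score` interfaces below are assumptions, not the released pipeline.

```python
# Hypothetical sketch of a generate-then-filter loop for synthetic alignment data.
# `instruct_model.generate` and `reward_model.score` stand in for calls to an
# instruct model and a reward model; the interfaces are assumed for illustration.

def synthesize_pairs(prompts, instruct_model, reward_model,
                     n_candidates: int = 4, min_score: float = 0.0):
    """For each prompt, sample several responses and keep the best-scoring one."""
    dataset = []
    for prompt in prompts:
        candidates = [instruct_model.generate(prompt) for _ in range(n_candidates)]
        scored = [(reward_model.score(prompt, c), c) for c in candidates]
        best_score, best_response = max(scored)
        if best_score >= min_score:          # drop low-quality generations
            dataset.append({"prompt": prompt, "response": best_response})
    return dataset
```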
Active Learning for Speech Recognition: the Power of Gradients
Huang, Jiaji, Child, Rewon, Rao, Vinay, Liu, Hairong, Satheesh, Sanjeev, Coates, Adam
In training speech recognition systems, labeling audio clips can be expensive, and not all data is equally valuable. Active learning aims to label only the most informative samples to reduce cost. For speech recognition, confidence scores and other likelihood-based active learning methods have been shown to be effective. Gradient-based active learning methods, however, are still not well-understood. This work investigates the Expected Gradient Length (EGL) approach in active learning for end-to-end speech recognition. We justify EGL from a variance reduction perspective, and observe that EGL's measure of informativeness picks novel samples uncorrelated with confidence scores. Experimentally, we show that EGL can reduce word errors by 11%, or alternatively, reduce the number of samples to label by 50%, when compared to random sampling.
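A minimal sketch of Expected Gradient Length scoring: since the true transcription of an unlabeled clip is unknown, the gradient norm is averaged over hypothesized labels weighted by their posterior. The `model`, `loss_fn`, and `nbest` interfaces below are assumptions for illustration, not the paper's implementation.

```python
import torch

# Hypothetical sketch of Expected Gradient Length (EGL) scoring for active learning.
# `loss_fn(model, x, y)` returns a differentiable loss, and `nbest(model, x)` yields
# (hypothesis, posterior probability) pairs, e.g. from beam search; both are assumed.

def egl_score(model, x, loss_fn, nbest):
    """Expected norm of the parameter gradient over hypothesized transcriptions."""
    score = 0.0
    for hyp, prob in nbest(model, x):
        model.zero_grad()
        loss = loss_fn(model, x, hyp)        # loss as if `hyp` were the true label
        loss.backward()
        grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                                   for p in model.parameters() if p.grad is not None))
        score += prob * grad_norm.item()     # weight by the hypothesis posterior
    return score

# Active learning step: send the unlabeled clips with the largest EGL for labeling.
# selected = sorted(pool, key=lambda x: egl_score(model, x, loss_fn, nbest), reverse=True)[:k]
```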
Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Deng, Jia, Satheesh, Sanjeev, Berg, Alexander C., Fei-Fei, Li
We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine-grained control over the efficiency vs. accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al.
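A minimal sketch of why a label tree helps at this scale: each internal node routes an example to one child, so prediction cost grows with tree depth rather than with the number of classes (and a more balanced tree keeps that depth small). The node and classifier interfaces below are assumptions for illustration, not the paper's learning procedure.

```python
# Hypothetical sketch of label-tree inference. `score_children(x)` returns one
# score per child and is an assumed interface, not the paper's implementation.

class LabelTreeNode:
    def __init__(self, classifier=None, children=None, label=None):
        self.classifier = classifier   # routes examples at internal nodes
        self.children = children or []
        self.label = label             # class id at a leaf

def predict(node: LabelTreeNode, x):
    """Route `x` down the tree until a leaf is reached, then return its class."""
    while node.children:
        scores = node.classifier.score_children(x)            # one score per child
        node = node.children[max(range(len(scores)), key=scores.__getitem__)]
    return node.label
```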