Suneja, Sahil
Accelerating Production LLMs with Combined Token/Embedding Speculators
Wertheimer, Davis, Rosenkranz, Joshua, Parnell, Thomas, Suneja, Sahil, Ranganathan, Pavithra, Ganti, Raghu, Srivatsa, Mudhakar
This technical report describes the design and training of novel speculative decoding draft models, for accelerating the inference speeds of large language models in a production environment. By conditioning draft predictions on both context vectors and sampled tokens, we can train our speculators to efficiently predict high-quality n-grams, which the base model then accepts or rejects. This allows us to effectively predict multiple tokens per inference forward pass, accelerating wall-clock inference speeds of highly optimized base model implementations by a factor of 2-3x. We explore these initial results and describe next steps for further improvement.

One approach to squaring this circle is speculative decoding, where a smaller draft model or speculator is trained to predict multiple tokens given a sequence of input. These speculative tokens are produced with low cost, and lower accuracy than the base LLM. However, we can leverage GPU parallelism during the LLM forward pass to evaluate the output for each of these new tokens with minimal additional overhead. Then, by comparing the outputs to the speculated inputs, we can accept all the predicted tokens that match the output of the base model, while rejecting all those that don't. In this way we can predict multiple tokens per LLM forward pass at minimal extra cost. A deeper explanation of speculative decoding can be found in [3, 6].
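The accept/reject step described above can be sketched as a simple greedy-decoding check. This is a toy illustration, not the authors' implementation; `draft_tokens` and `base_outputs` are hypothetical token-id lists, with the base model's outputs assumed to come from one parallel forward pass over the drafted sequence:

```python
def accept_speculated(draft_tokens, base_outputs):
    """Keep the longest prefix of draft tokens that the base model agrees
    with; the base model's own prediction replaces the first mismatch.

    draft_tokens: token ids proposed by the speculator.
    base_outputs: the base model's next-token choice at each draft position.
    """
    accepted = []
    for drafted, verified in zip(draft_tokens, base_outputs):
        if drafted == verified:
            accepted.append(drafted)   # draft matched: this token cost no extra pass
        else:
            accepted.append(verified)  # first mismatch: take the base model's token, stop
            break
    return accepted
```

Every matched token beyond the first is generated without an additional base-model pass, which is where the 2-3x wall-clock speedup comes from.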
Automated Code generation for Information Technology Tasks in YAML through Large Language Models
Pujar, Saurabh, Buratti, Luca, Guo, Xiaojie, Dupuis, Nicolas, Lewis, Burn, Suneja, Sahil, Sood, Atin, Nalawade, Ganesh, Jones, Matthew, Morari, Alessandro, Puri, Ruchir
The recent improvement in code generation capabilities due to the use of large language models has mainly benefited general-purpose programming languages. Domain-specific languages, such as the ones used for IT Automation, have received far less attention, despite involving many active developers and being an essential component of modern cloud platforms. This work focuses on the generation of Ansible-YAML, a widely used markup language for IT Automation. We present Ansible Wisdom, a natural-language to Ansible-YAML code generation tool, aimed at improving IT automation productivity. Ansible Wisdom is a transformer-based model, extended by training with a new dataset containing Ansible-YAML. We also develop two novel performance metrics for YAML and Ansible to capture the specific characteristics of this domain. Results show that Ansible Wisdom can accurately generate Ansible scripts from natural language prompts with performance comparable to or better than existing state-of-the-art code generation models. In few-shot settings we assess the impact of training with Ansible and YAML data and compare with different baselines, including Codex-Davinci-002. We also show that after fine-tuning, our Ansible-specific model (BLEU: 66.67) can outperform the much larger Codex-Davinci-002 model (BLEU: 50.4), which was evaluated in few-shot settings.
Study of Distractors in Neural Models of Code
Rabin, Md Rafiqul Islam, Hussain, Aftab, Suneja, Sahil, Alipour, Mohammad Amin
Finding important features that contribute to the prediction of neural models is an active area of research in explainable AI. Neural models are opaque, and finding such features sheds light on a better understanding of their predictions. In contrast, in this work we present an inverse perspective: distractor features, i.e. features that cast doubt on the prediction by affecting the model's confidence in it. Understanding distractors provides a complementary view of the features' relevance in the predictions of neural models. In this paper, we apply a reduction-based technique to find distractors and provide our preliminary results on their impacts and types. Our experiments across various tasks, models, and datasets of code reveal that the removal of tokens can have a significant impact on the confidence of models in their predictions, and that the categories of tokens can also play a vital role in the model's confidence. Our study aims to enhance the transparency of models by emphasizing those tokens that significantly influence the confidence of the models.
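The reduction idea can be illustrated with a hedged sketch (not the paper's exact procedure): delete one token at a time and re-query the model's confidence. Here `confidence_fn` is a hypothetical stand-in for a model's confidence score on a token sequence:

```python
def confidence_deltas(tokens, confidence_fn):
    """Measure how model confidence changes when each token is removed.
    A strongly negative delta marks a supporting token (removing it hurts
    confidence); a positive delta marks a distractor candidate (its
    presence was lowering confidence). Returns [(token, delta), ...]."""
    base = confidence_fn(tokens)
    deltas = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # sequence with token i removed
        deltas.append((tok, confidence_fn(reduced) - base))
    return deltas
```

A single-token-deletion probe like this is only first-order; interactions between tokens would need the full reduction procedure the paper applies.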
Data-Driven AI Model Signal-Awareness Enhancement and Introspection
Suneja, Sahil, Zhuang, Yufan, Zheng, Yunhui, Laredo, Jim, Morari, Alessandro
AI modeling for source code understanding tasks has been making significant progress, and is being adopted in production development pipelines. However, reliability concerns are being raised, especially over whether the models are actually learning task-related aspects of source code. While recent model-probing approaches have observed a lack of signal awareness in many AI-for-code models, i.e. models not capturing task-relevant signals, they do not offer solutions to rectify this problem. In this paper, we explore data-driven approaches to enhance models' signal awareness: 1) we combine the SE concept of code complexity with the AI technique of curriculum learning; 2) we incorporate SE assistance into AI models by customizing Delta Debugging to generate simplified, signal-preserving programs, which we add to the training dataset. With our techniques, we achieve up to 4.8x improvement in model signal awareness. Using the notion of code complexity, we further present a novel model learning introspection approach from the perspective of the dataset.
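The first technique pairs code complexity with curriculum learning. A minimal sketch of the scheduling idea (with a hypothetical `complexity` callable standing in for a real SE metric such as cyclomatic complexity):

```python
def curriculum_schedule(samples, complexity, num_stages=3):
    """Sort training samples by a complexity score, then split them into
    stages so training can proceed easy-to-hard, as in curriculum
    learning. Returns a list of stages, each a list of samples."""
    ordered = sorted(samples, key=complexity)
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]
```

How the stages are consumed (e.g. growing the training set stage by stage versus training on each stage in turn) is a design choice the sketch leaves open.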
Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation
Zhuang, Yufan, Suneja, Sahil, Thost, Veronika, Domeniconi, Giacomo, Morari, Alessandro, Laredo, Jim
Identifying vulnerable code is a precautionary measure to counter software security breaches. Tedious expert effort has been spent building static analyzers, yet insecure patterns are far from fully enumerated. This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures through parsing, we develop a novel graph neural network (GNN) that exploits both the semantic context and the structural regularity of a program to improve prediction performance. Compared with a generic GNN, our enhancements include a synthesis of multiple representations learned from the several parsed graphs of a program, and a new training loss that leverages the fine granularity of the labeling. Our model outperforms multiple text-, image-, and graph-based approaches across two real-world datasets.
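The synthesis of per-graph representations can be illustrated with a toy fusion step. This is a hypothetical, pure-Python sketch; the paper's model learns the synthesis end-to-end with a GNN. Each parse of the program (e.g. AST, control-flow graph, data-flow graph) is assumed to yield one embedding vector:

```python
def fuse_representations(per_graph_embeddings):
    """Combine one embedding per parsed graph into a single program-level
    vector: concatenate the graph embeddings and append their element-wise
    mean. per_graph_embeddings: list of equal-length float lists."""
    dim = len(per_graph_embeddings[0])
    mean = [sum(vec[j] for vec in per_graph_embeddings) / len(per_graph_embeddings)
            for j in range(dim)]
    fused = []
    for vec in per_graph_embeddings:
        fused.extend(vec)  # preserve each graph view
    fused.extend(mean)     # plus a pooled summary across views
    return fused
```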
Probing Model Signal-Awareness via Prediction-Preserving Input Minimization
Zheng, Yunhui, Suneja, Sahil, Zhuang, Yufan, Morari, Alessandro, Laredo, Jim
This work explores the signal awareness of AI models for source code understanding. Using a software vulnerability detection use-case, we evaluate the models' ability to capture the correct vulnerability signals to produce their predictions. Our prediction-preserving input minimization (P2IM) approach systematically reduces the original source code to a minimal snippet that a model needs in order to maintain its prediction. The model's reliance on incorrect signals is then uncovered when a vulnerability in the original code is missing from the minimal snippet, both of which the model nevertheless predicts as vulnerable. We apply P2IM to three state-of-the-art neural network models across multiple datasets, and measure their signal awareness using a new metric we propose: Signal-aware Recall (SAR). The results show a sharp drop in the models' Recall from the high 90s to sub-60s under the new metric, highlighting that the models are presumably picking up a lot of noise or dataset nuances while learning their vulnerability detection logic.
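One illustrative way to compute such a metric, as a sketch rather than the paper's exact formulation: assume each truly vulnerable sample is labeled with whether its P2IM-minimal snippet retained the vulnerability, and count a true positive toward SAR only when it did.

```python
def recall_and_sar(results):
    """results: one (predicted_vulnerable, signal_retained) tuple per
    truly vulnerable sample, where signal_retained is True when the
    P2IM-minimized snippet still contains the vulnerability.
    Recall credits every correct prediction; SAR credits only the
    signal-aware ones, so SAR <= Recall by construction."""
    total = len(results)
    tp = sum(1 for pred, _ in results if pred)
    tp_aware = sum(1 for pred, retained in results if pred and retained)
    recall = tp / total if total else 0.0
    sar = tp_aware / total if total else 0.0
    return recall, sar
```

The gap between the two numbers is exactly the fraction of "correct" predictions made without the real vulnerability signal, which is what drives the reported drop from the high 90s to sub-60s.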