AITopics | Singh, Vaibhav

Collaborating Authors

Singh, Vaibhav

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

Singh, Vaibhav, Janson, Paul, Mehrbod, Paria, Ibrahim, Adam, Rish, Irina, Belilovsky, Eugene, Thérien, Benjamin

arXiv.org Artificial IntelligenceMar-5-2025

The ever-growing availability of unlabeled data presents both opportunities and challenges for training artificial intelligence systems. While self-supervised learning (SSL) has emerged as a powerful paradigm for extracting meaningful representations from vast amounts of unlabeled data, existing methods still struggle to adapt to the non-stationary, non-IID nature of real-world data streams without forgetting previously learned knowledge. Recent works have adopted a repeated cosine annealing schedule for large-scale continual pre-training; however, these schedules (1) inherently cause forgetting during the re-warming phase and (2) have not been systematically compared to existing continual SSL methods. In this work, we systematically compare the widely used cosine schedule with the recently proposed infinite learning rate schedule and empirically find the latter to be a more effective alternative. Our extensive empirical evaluation across diverse image and language datasets demonstrates that the infinite learning rate schedule consistently enhances continual pre-training performance compared to a repeated cosine decay without being restricted to a fixed iteration budget. For instance, in a small-scale MAE pre-training setup, it outperforms several strong baselines from the literature. Our results show that the infinite learning rate schedule remains effective at scale, surpassing repeated cosine decay for both MAE pre-training and zero-shot LM benchmarks. These models are known for their massive parameter counts and extensive training on vast amounts of data, often developing impressive general-purpose capabilities unexpectedly during pre-training (Brown et al., 2020; Wei et al., 2022). While foundation models have demonstrated remarkable success on static tasks, adapting them to evolving data--such as the continuous influx of new textual information (Soldaini et al., 2024; Li et al., 2024; Abadji et al., 2022; Kocetkov et al., 2022) and the emergence of novel visual concepts (Prabhu et al., 2023; Seo et al., 2024)--remains a major challenge.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.02844

Country:

North America > Canada > Quebec (0.14)
North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (0.87)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ARISE: Iterative Rule Induction and Synthetic Data Generation for Text Classification

M., Yashwanth, Singh, Vaibhav, Maheshwari, Ayush, Krishna, Amrith, Ramakrishnan, Ganesh

arXiv.org Artificial IntelligenceFeb-9-2025

We propose ARISE, a framework that iteratively induces rules and generates synthetic data for text classification. We combine synthetic data generation and automatic rule induction, via bootstrapping, to iteratively filter the generated rules and data. We induce rules via inductive generalisation of syntactic n-grams, enabling us to capture a complementary source of supervision. These rules alone lead to performance gains in both, in-context learning (ICL) and fine-tuning (FT) settings. Similarly, use of augmented data from ARISE alone improves the performance for a model, outperforming configurations that rely on complex methods like contrastive learning. Further, our extensive experiments on various datasets covering three full-shot, eight few-shot and seven multilingual variant settings demonstrate that the rules and data we generate lead to performance improvements across these diverse domains and languages.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.05923

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Massachusetts (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
(2 more...)

Add feedback

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Ghosal, Soumya Suvra, Chakraborty, Souradip, Singh, Vaibhav, Guan, Tianrui, Wang, Mengdi, Beirami, Ahmad, Huang, Furong, Velasquez, Alvaro, Manocha, Dinesh, Bedi, Amrit Singh

arXiv.org Artificial IntelligenceDec-20-2024

With the widespread deployment of Multimodal Large Language Models (MLLMs) for visual-reasoning tasks, improving their safety has become crucial. Recent research indicates that despite training-time safety alignment, these models remain vulnerable to jailbreak attacks. In this work, we first highlight an important safety gap to describe that alignment achieved solely through safety training may be insufficient against jailbreak attacks. To address this vulnerability, we propose Immune, an inference-time defense framework that leverages a safe reward model through controlled decoding to defend against jailbreak attacks. Additionally, we provide a mathematical characterization of Immune, offering provable guarantees against jailbreaks. Extensive evaluations on diverse jailbreak benchmarks using recent MLLMs reveal that Immune effectively enhances model safety while preserving the model's original capabilities. For instance, against text-based jailbreak attacks on LLaVA-1.6, Immune reduces the attack success rate by 57.82% and 16.78% compared to the base MLLM and state-of-the-art defense strategy, respectively.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2411.18688

Country: North America > United States (0.46)

Genre:

Overview (1.00)
Research Report > New Finding (0.65)

Industry:

Information Technology > Security & Privacy (0.93)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Three-Pronged Approach to Cross-Lingual Adaptation with Multilingual LLMs

Singh, Vaibhav, Krishna, Amrith, NJ, Karthika, Ramakrishnan, Ganesh

arXiv.org Artificial IntelligenceJun-25-2024

Low-resource languages, by its very definition, tend to be under represented in the pre-training corpora of Large Language Models. In this work, we investigate three low-resource cross-lingual approaches that enable an LLM adapt to tasks in previously unseen languages. Llama-2 is an LLM where Indic languages, among many other language families, contribute to less than $0.005\%$ of the total $2$ trillion token pre-training corpora. In this work, we experiment with the English-dominated Llama-2 for cross-lingual transfer to three Indic languages, Bengali, Hindi, and Tamil as target languages. We study three approaches for cross-lingual transfer, under ICL and fine-tuning. One, we find that adding additional supervisory signals via a dominant language in the LLM, leads to improvements, both under in-context learning and fine-tuning. Two, adapting the target languages to word reordering may be beneficial under ICL, but its impact diminishes with fine tuning. Finally, continued pre-training in one low-resource language can improve model performance for other related low-resource languages.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2406.17377

Country:

Asia (0.28)
North America > Canada (0.14)
Europe > Denmark (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

A prototype hybrid prediction market for estimating replicability of published work

Chakravorti, Tatiana, Fraleigh, Robert, Fritton, Timothy, McLaughlin, Michael, Singh, Vaibhav, Griffin, Christopher, Kwasnica, Anthony, Pennock, David, Giles, C. Lee, Rajtmajer, Sarah

arXiv.org Artificial IntelligenceMar-1-2023

We present a prototype hybrid prediction market and demonstrate the avenue it represents for meaningful human-AI collaboration. We build on prior work proposing artificial prediction markets as a novel machine-learning algorithm. In an artificial prediction market, trained AI agents buy and sell outcomes of future events. Classification decisions can be framed as outcomes of future events, and accordingly, the price of an asset corresponding to a given classification outcome can be taken as a proxy for the confidence of the system in that decision. By embedding human participants in these markets alongside bot traders, we can bring together insights from both. In this paper, we detail pilot studies with prototype hybrid markets for the prediction of replication study outcomes. We highlight challenges and opportunities, share insights from semi-structured interviews with hybrid market participants, and outline a vision for ongoing and future work.

artificial intelligence, machine learning, participant, (19 more...)

arXiv.org Artificial Intelligence

2303.00866

Country: North America > United States (0.47)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.69)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.34)

Add feedback

Deep Gaussian Processes with Convolutional Kernels

Kumar, Vinayak, Singh, Vaibhav, Srijith, P. K., Damianou, Andreas

arXiv.org Machine LearningJun-5-2018

Deep Gaussian processes (DGPs) provide a Bayesian non-parametric alternative to standard parametric deep learning models. A DGP is formed by stacking multiple GPs resulting in a well-regularized composition of functions. The Bayesian framework that equips the model with attractive properties, such as implicit capacity control and predictive uncertainty, makes it at the same time challenging to combine with a convolutional structure. This has hindered the application of DGPs in computer vision tasks, an area where deep parametric models (i.e. CNNs) have made breakthroughs. Standard kernels used in DGPs such as radial basis functions (RBFs) are insufficient for handling pixel variability in raw images. In this paper, we build on the recent convolutional GP to develop Convolutional DGP (CDGP) models which effectively capture image level features through the use of convolution kernels, therefore opening up the way for applying DGPs to computer vision tasks. Our model learns local spatial influence and outperforms strong GP based baselines on multi-class image classification. We also consider various constructions of convolution kernel over the image patches, analyze the computational trade-offs and provide an efficient framework for convolutional DGP models. The experimental results on image data such as MNIST, rectangles-image, CIFAR10 and Caltech101 demonstrate the effectiveness of the proposed approaches.

deep learning, kernel, neural network, (19 more...)

arXiv.org Machine Learning

1806.01655

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback