
Collaborating Authors: Srivastava, Gaurav


Towards Reasoning Ability of Small Language Models

arXiv.org Artificial Intelligence

Reasoning has long been viewed as an emergent property of large language models (LLMs), appearing at or above a certain scale ($\sim$100B parameters). However, recent studies challenge this assumption, showing that small language models (SLMs) can also achieve competitive reasoning performance. SLMs are increasingly favored for their efficiency and deployability, yet there has been no systematic study of the reasoning abilities of diverse SLMs, including those trained from scratch or derived from LLMs through quantization, pruning, and distillation. This raises a critical question: Can SLMs achieve reasoning abilities comparable to LLMs? In this work, we systematically survey, benchmark, and analyze 72 SLMs from six model families across 14 reasoning benchmarks. For reliable evaluation, we examine four evaluation methods and compare four LLM judges against human evaluations on 800 data points. We repeat all experiments three times to ensure a robust performance assessment. Additionally, we analyze the impact of different prompting strategies on small models. Beyond accuracy, we also evaluate model robustness under adversarial conditions and the quality of intermediate reasoning steps. Our findings challenge the assumption that scaling is the only way to achieve strong reasoning. Instead, we foresee a future in which SLMs with strong reasoning capabilities are developed through structured training or post-training compression, serving as efficient alternatives to LLMs for reasoning-intensive tasks.
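The evaluation protocol above rests on two measurements: accuracy averaged over repeated runs, and agreement between LLM judges and human annotators. A minimal sketch of both, on hypothetical toy verdicts (the data and function names are illustrative, not from the paper):

```python
# Minimal sketch of the evaluation protocol: per-run accuracy aggregated over
# three independent runs, and judge-human agreement via Cohen's kappa.
# All verdict data below is hypothetical.
from statistics import mean, stdev

def run_accuracy(verdicts):
    """Fraction of questions answered correctly in one run (1 = correct)."""
    return sum(verdicts) / len(verdicts)

def cohen_kappa(a, b):
    """Agreement between two binary raters, corrected for chance."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    p_a1, p_b1 = sum(a) / n, sum(b) / n                # marginal "correct" rates
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)        # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Three repeated runs of one (hypothetical) SLM on one benchmark.
runs = [[1, 0, 1, 1, 0, 1], [1, 1, 1, 0, 0, 1], [1, 0, 1, 1, 1, 1]]
accs = [run_accuracy(r) for r in runs]
print(f"accuracy: {mean(accs):.3f} +/- {stdev(accs):.3f}")

# LLM-judge verdicts vs. human labels on the same items.
judge = [1, 0, 1, 1, 0, 1]
human = [1, 0, 1, 0, 0, 1]
print(f"judge-human kappa: {cohen_kappa(judge, human):.3f}")
```

Cohen's kappa is used here as one common chance-corrected agreement statistic; the paper's own judge-versus-human comparison may use a different metric.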


From Features to Transformers: Redefining Ranking for Scalable Impact

arXiv.org Artificial Intelligence

We present LiGR, a large-scale ranking framework developed at LinkedIn that brings state-of-the-art transformer-based modeling architectures into production. We introduce a modified transformer architecture that incorporates learned normalization and simultaneous set-wise attention to user history and ranked items. This architecture enables several breakthrough achievements, including: (1) the deprecation of most manually designed feature engineering, outperforming the prior state-of-the-art system using only a few features (compared to hundreds in the baseline); (2) validation of the scaling law for ranking systems, showing improved performance with larger models, more training data, and longer context sequences; and (3) simultaneous joint scoring of items in a set-wise manner, leading to automated improvements in diversity. To enable efficient serving of large ranking models, we describe techniques to scale inference effectively using single-pass processing of user history and set-wise attention. We also summarize key insights from various ablation studies and A/B tests, highlighting the most impactful technical approaches.
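A minimal sketch of the set-wise attention idea, assuming hypothetical dimensions and module names; it illustrates joint scoring of a slate conditioned on user history in one pass, not LinkedIn's production LiGR architecture:

```python
# Candidate items attend to each other and to the user's history in a single
# transformer pass, so every item's score is conditioned on the rest of the
# slate. Shapes and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class SetWiseScorer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)    # pre-norm with learned LayerNorm
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)         # per-item relevance score

    def forward(self, history, items):
        # history: (B, H, d) encoded user-history events
        # items:   (B, N, d) encoded candidate items ranked jointly
        x = torch.cat([history, items], dim=1)    # one joint attention pass
        x = self.encoder(x)
        item_states = x[:, history.size(1):, :]   # keep only item positions
        return self.head(item_states).squeeze(-1) # (B, N) set-wise scores

scores = SetWiseScorer()(torch.randn(2, 8, 64), torch.randn(2, 5, 64))
print(scores.shape)  # torch.Size([2, 5])
```

Because all candidates sit in one attention pass, each item's score can react to the rest of the slate, which is what makes the automated diversity improvements described above possible.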


LiMAML: Personalization of Deep Recommender Models via Meta Learning

arXiv.org Artificial Intelligence

In the realm of recommender systems, the ubiquitous adoption of deep neural networks has emerged as a dominant paradigm for modeling diverse business objectives. As user bases continue to expand, personalization and frequent model updates have assumed paramount significance in delivering relevant and refreshed experiences to a diverse array of members. In this work, we introduce an innovative meta-learning solution tailored to the personalization of models for individual members and other entities, coupled with frequent updates based on the latest user interaction signals. Specifically, we leverage the Model-Agnostic Meta Learning (MAML) algorithm to adapt per-task sub-networks using recent user interaction data. Because productionizing the original MAML-based models in online recommendation systems is nearly infeasible, we propose an efficient strategy to operationalize meta-learned sub-networks in production: transforming them into fixed-sized vectors, termed meta embeddings, thereby enabling the seamless deployment of models with hundreds of billions of parameters for online serving. Through extensive experimentation on production data drawn from various applications at LinkedIn, we demonstrate that the proposed solution consistently outperforms the baseline models of those applications, including strong baselines such as a wide-and-deep ID-based personalization approach. Our approach has enabled the deployment of a range of highly personalized AI models across diverse LinkedIn applications, leading to substantial improvements in business metrics as well as a refreshed experience for our members.
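A minimal sketch of the serving idea, with hypothetical shapes and toy data: a small per-member sub-network is adapted with a few gradient steps on recent interactions (the MAML inner loop), and its adapted weights are then flattened into a fixed-size vector, the "meta embedding", that a frozen global model consumes at serving time. This is an illustration of the mechanism, not LiMAML's production implementation:

```python
import torch
import torch.nn.functional as F

d, k = 16, 2                                     # feature dim, sub-network width (hypothetical)
global_w = torch.randn(d + d * k, 1)             # frozen global model over [features; meta embedding]

def inner_adapt(sub_w, x, y, steps=3, lr=0.1):
    """MAML inner loop: adapt the per-member sub-network on recent data."""
    w = sub_w.clone().requires_grad_(True)
    for _ in range(steps):
        logits = (x @ w).sum(dim=1)              # tiny sub-network: linear map + sum
        loss = F.binary_cross_entropy_with_logits(logits, y)
        (g,) = torch.autograd.grad(loss, w)
        w = (w - lr * g).detach().requires_grad_(True)
    return w.detach()

# Recent interactions for one member (toy data).
x, y = torch.randn(32, d), torch.randint(0, 2, (32,)).float()
adapted = inner_adapt(torch.zeros(d, k), x, y)

meta_embedding = adapted.flatten()               # fixed-size vector (d * k), computed offline
request_features = torch.randn(d)                # member/item features at serving time
score = torch.cat([request_features, meta_embedding]) @ global_w
print(score.shape)  # torch.Size([1])
```

Precomputing the meta embedding offline means online serving only ever sees a fixed-size vector per member, which is what makes deployment at this scale tractable.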


Sample-Efficient Personalization: Modeling User Parameters as Low Rank Plus Sparse Components

arXiv.org Machine Learning

Personalization of machine learning (ML) predictions for individual users/domains/enterprises is critical for practical recommendation systems. Standard personalization approaches involve learning a user/domain-specific embedding that is fed into a fixed global model, which can be limiting. On the other hand, personalizing/fine-tuning the model itself for each user/domain -- a.k.a. meta-learning -- has high storage/infrastructure cost. Moreover, rigorous theoretical studies of scalable personalization approaches have been very limited. To address the above issues, we propose a novel meta-learning-style approach that models network weights as a sum of low-rank and sparse components. The low-rank part captures information common across multiple individuals/users, while the sparse part captures user-specific idiosyncrasies. We then study the framework in the linear setting, where the problem reduces to estimating the sum of a rank-$r$ and a $k$-column-sparse matrix from a small number of linear measurements. We propose a computationally efficient alternating minimization method with iterative hard thresholding -- AMHT-LRS -- to learn the low-rank and sparse parts. Theoretically, for the realizable Gaussian data setting, we show that AMHT-LRS solves the problem efficiently with nearly optimal sample complexity. Finally, a significant challenge in personalization is ensuring the privacy of each user's sensitive data. We address this by proposing a differentially private variant of our method that is also equipped with strong generalization guarantees.
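A minimal sketch of the alternating-minimization structure, in a deliberately simplified setting where the matrix M = L* + S* is observed directly rather than through linear measurements (an assumption made here to keep the example short). The low-rank part is recovered by rank-r SVD truncation and the column-sparse part by hard thresholding, loosely mirroring AMHT-LRS:

```python
import numpy as np

rng = np.random.default_rng(0)

def svd_truncate(M, r):
    """Best rank-r approximation via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def hard_threshold_columns(M, k):
    """Keep only the k columns with the largest Euclidean norm."""
    S = np.zeros_like(M)
    keep = np.argsort(np.linalg.norm(M, axis=0))[-k:]
    S[:, keep] = M[:, keep]
    return S

# Ground truth: rank-r plus k-column-sparse (hypothetical sizes).
n, m, r, k = 60, 40, 3, 4
L_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
S_star = np.zeros((n, m))
S_star[:, rng.choice(m, k, replace=False)] = 5 * rng.standard_normal((n, k))
M = L_star + S_star

L, S = np.zeros_like(M), np.zeros_like(M)
for _ in range(20):                       # alternate between the two components
    L = svd_truncate(M - S, r)            # low-rank update
    S = hard_threshold_columns(M - L, k)  # sparse update (hard thresholding)

print(np.linalg.norm(M - L - S) / np.linalg.norm(M))  # relative residual shrinks
```

The full AMHT-LRS algorithm works from a small number of linear measurements per user and comes with the sample-complexity guarantees stated in the abstract; this sketch only shows the alternating low-rank/sparse decomposition at its core.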


Multi-view Sparse Laplacian Eigenmaps for nonlinear Spectral Feature Selection

arXiv.org Artificial Intelligence

The complexity of high-dimensional datasets presents significant challenges for machine learning models, including overfitting, computational cost, and difficulty in interpreting results. To address these challenges, it is essential to identify an informative subset of features that captures the essential structure of the data. In this study, we propose Multi-view Sparse Laplacian Eigenmaps (MSLE) for feature selection, which effectively combines multiple views of the data, enforces sparsity constraints, and employs a scalable optimization algorithm to identify a subset of features that captures the fundamental data structure. MSLE is a graph-based approach that leverages multiple views of the data to construct a more robust and informative representation of high-dimensional data. The method applies sparse eigendecomposition to reduce the dimensionality of the data, yielding a reduced feature set. The optimization problem is solved with an iterative algorithm that alternates between updating the sparse coefficients and the graph Laplacian matrix: the sparse coefficients are updated using a soft-thresholding operator, while the graph Laplacian matrix is updated using the normalized graph Laplacian. To evaluate the performance of MSLE, we conducted experiments on the UCI-HAR dataset, which comprises 561 features, reducing the feature space by 10% to 90%. Our results demonstrate that even after reducing the feature space by 90%, a Support Vector Machine (SVM) maintains an error rate of 2.72%. Moreover, we observe that the SVM achieves an accuracy of 96.69% with an 80% reduction in the overall feature space.
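A minimal sketch of the two alternating updates described above, on hypothetical single-view data: a soft-thresholding step for the sparse coefficients and a normalized graph Laplacian built from a Gaussian affinity matrix. This illustrates the building blocks only, not the full multi-view MSLE objective:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm: shrink each entry toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric affinity matrix W."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))                 # one view: 30 samples, 10 features (toy)

# Gaussian affinity between samples, then its normalized Laplacian.
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
W = np.exp(-sq / sq.mean())
L = normalized_laplacian(W)

# One soft-thresholding pass over (hypothetical) feature coefficients:
# small coefficients are zeroed, effectively deselecting those features.
coeffs = rng.standard_normal(10)
sparse_coeffs = soft_threshold(coeffs, lam=0.8)
print("selected features:", np.flatnonzero(sparse_coeffs))
```

In the full method these two updates alternate until convergence, with the surviving nonzero coefficients marking the selected features across views.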


An Analysis of Random Projections in Cancelable Biometrics

arXiv.org Machine Learning

With increasing concerns about security, the need for highly secure physical biometrics-based authentication systems utilizing \emph{cancelable biometric} technologies is on the rise. Because cancelable template generation involves a trade-off between template security and matching performance, many state-of-the-art algorithms that succeed in generating high-quality cancelable biometrics include random projection as one of their early processing steps. This paper therefore presents a formal analysis of why random projection is an essential step in cancelable biometrics. By formally defining the notion of an \textit{Independent Subspace Structure} for datasets, it is shown that random projection preserves the subspace structure of data vectors generated from a union of independent linear subspaces. A bound on the minimum number of random vectors required for this to hold is also derived and shown to depend logarithmically on the number of data samples, not only in the independent-subspace setting but in the disjoint-subspace setting as well. The theoretical analysis is supported in detail by empirical results on real-world face recognition datasets.
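A minimal numerical sketch of the preservation claim: data spanning two independent linear subspaces keeps its subspace structure after a Gaussian random projection. The dimensions and the projected dimension m below are hypothetical; the paper's bound says m need only grow logarithmically with the number of samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def smallest_principal_angle(A, B):
    """Smallest principal angle (radians) between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s.max(), -1.0, 1.0))

n, d1, d2, m = 200, 5, 5, 40          # ambient dim, subspace dims, projected dim (toy)
B1 = rng.standard_normal((n, d1))     # bases of two subspaces; random bases are
B2 = rng.standard_normal((n, d2))     # independent with high probability

P = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian random projection

print("angle before projection:", smallest_principal_angle(B1, B2))
print("angle after projection: ", smallest_principal_angle(P @ B1, P @ B2))
```

If the projected subspaces collapsed onto each other, the second angle would drop toward zero; instead it stays bounded away from zero, which is the subspace-structure preservation the paper formalizes.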