AITopics | Varshney, Prateek

Collaborating Authors

Varshney, Prateek

Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD

Das, Aniket, Nagaraj, Dheeraj, Pal, Soumyabrata, Suggala, Arun, Varshney, Prateek

arXiv.org Machine Learning, Oct-26-2024

We consider the problem of high-dimensional heavy-tailed statistical estimation in the streaming setting, which is much harder than the traditional batch setting due to memory constraints. We cast this problem as stochastic convex optimization with heavy-tailed stochastic gradients, and prove that the widely used Clipped-SGD algorithm attains near-optimal sub-Gaussian statistical rates whenever the second moment of the stochastic gradient noise is finite. More precisely, with $T$ samples, we show that Clipped-SGD, for smooth and strongly convex objectives, achieves an error of $\sqrt{\frac{\mathsf{Tr}(\Sigma)+\sqrt{\mathsf{Tr}(\Sigma)\|\Sigma\|_2}\log(\frac{\log(T)}{\delta})}{T}}$ with probability $1-\delta$, where $\Sigma$ is the covariance of the clipped gradient. Note that the fluctuations (depending on $\frac{1}{\delta}$) are of lower order than the term $\mathsf{Tr}(\Sigma)$. This improves upon the current best rate of $\sqrt{\frac{\mathsf{Tr}(\Sigma)\log(\frac{1}{\delta})}{T}}$ for Clipped-SGD, known only for smooth and strongly convex objectives. Our results also extend to smooth convex and Lipschitz convex objectives. Key to our result is a novel iterative refinement strategy for martingale concentration, improving upon the PAC-Bayes approach of Catoni and Giulini.
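As a rough illustration of the Clipped-SGD update described in this abstract, the sketch below runs one streaming pass on the mean-estimation objective $f(x) = \frac{1}{2}\mathbb{E}\|x - z\|^2$, whose stochastic gradient at $x$ is $x - z$. The objective, the clipping threshold, the $1/t$ step size, and the names `clip` and `clipped_sgd_mean` are assumptions made for this example; the abstract does not state the paper's parameter choices.

```python
import math
import random


def clip(vec, threshold):
    """Rescale vec so that its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= threshold or norm == 0.0:
        return list(vec)
    scale = threshold / norm
    return [v * scale for v in vec]


def clipped_sgd_mean(stream, dim, threshold, step=lambda t: 1.0 / t):
    """One pass of clipped SGD on f(x) = 0.5 * E||x - z||^2.

    Only the current iterate is stored, so memory is O(dim) regardless
    of how many samples the stream yields.
    """
    x = [0.0] * dim
    for t, z in enumerate(stream, start=1):
        grad = [xi - zi for xi, zi in zip(x, z)]  # stochastic gradient at x
        g = clip(grad, threshold)                 # bound the effect of outliers
        eta = step(t)                             # illustrative 1/t schedule
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    random.seed(0)
    true_mean = [1.0, -2.0, 0.5]

    def sample():
        # Symmetric Pareto noise with shape 2.5: heavy-tailed, finite variance.
        return [m + random.choice((-1, 1)) * random.paretovariate(2.5)
                for m in true_mean]

    stream = (sample() for _ in range(20000))
    print(clipped_sgd_mean(stream, dim=3, threshold=50.0))
```

With a very large threshold and the $1/t$ step size, the iterate reduces to the running sample mean; a finite threshold caps how far any single sample can move the estimate.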

artificial intelligence, inequality, machine learning, (19 more...)

arXiv:2410.20135

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization

Varshney, Prateek, Pilanci, Mert

arXiv.org Artificial Intelligence, Oct-9-2024

Deploying large and complex deep neural networks on resource-constrained edge devices poses significant challenges due to their computational demands and the complexities of non-convex optimization. Traditional compression methods such as distillation and pruning often retain non-convexity that complicates fine-tuning in real time on such devices. Moreover, these methods often necessitate extensive end-to-end network fine-tuning after compression to preserve model performance, which is not only time-consuming but also requires fully annotated datasets, thus potentially negating the benefits of efficient network compression. In this paper, we introduce a novel distillation technique that efficiently compresses the model via convex optimization -- eliminating intermediate non-convex activation functions and using only intermediate activations from the original model. Our approach enables distillation in a label-free data setting and achieves performance comparable to the original model without requiring any post-compression fine-tuning. We demonstrate the effectiveness of our method for image classification models on multiple standard datasets, and further show that in the data-limited regime, our method can outperform standard non-convex distillation approaches. Our method promises significant advantages for deploying high-efficiency, low-footprint models on edge devices, making it a practical choice for real-world applications. We show that convex neural networks, when provided with rich feature representations from a large pre-trained non-convex model, can achieve performance comparable to their non-convex counterparts, opening up avenues for future research at the intersection of convex optimization and deep learning.
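The label-free idea in this abstract can be illustrated with a deliberately simplified stand-in: a linear student fit by gradient descent on a squared loss (a convex problem) so that it matches a frozen teacher's outputs from the teacher's own intermediate activations. This is not the convex network architecture of the paper, which the abstract does not specify; `teacher_features`, `teacher_output`, and `fit_linear_student` are hypothetical names, and the toy teacher is invented for the example.

```python
import math


def teacher_features(x):
    """Hypothetical stand-in for a frozen teacher's intermediate activations."""
    return [math.tanh(x), math.tanh(2.0 * x - 1.0)]


def teacher_output(feats):
    """Hypothetical stand-in for the teacher's remaining non-convex layers."""
    return math.tanh(1.5 * feats[0] - 0.5 * feats[1])


def predict(w, b, feats):
    """Linear student: no activation function, so the fit below is convex."""
    return sum(wi * fi for wi, fi in zip(w, feats)) + b


def mse(w, b, features, targets):
    """Mean squared error between student predictions and teacher targets."""
    return sum((predict(w, b, f) - t) ** 2
               for f, t in zip(features, targets)) / len(targets)


def fit_linear_student(features, targets, lr=0.5, epochs=200):
    """Full-batch gradient descent on the squared loss (assumed settings)."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, t in zip(features, targets):
            err = predict(w, b, f) - t
            for j in range(dim):
                grad_w[j] += err * f[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    xs = [i / 10.0 for i in range(-20, 21)]      # unlabeled inputs only
    feats = [teacher_features(x) for x in xs]
    targets = [teacher_output(f) for f in feats]  # teacher signal, no labels
    w, b = fit_linear_student(feats, targets)
    print(w, b, mse(w, b, feats, targets))
```

No ground-truth labels appear anywhere in the fit: the only supervision is the teacher's own activations and outputs on unlabeled inputs.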

artificial intelligence, machine learning, neural network, (17 more...)

arXiv:2410.06567

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > Promising Solution (0.54)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sample-Efficient Personalization: Modeling User Parameters as Low Rank Plus Sparse Components

Pal, Soumyabrata, Varshney, Prateek, Jain, Prateek, Thakurta, Abhradeep Guha, Madan, Gagan, Aggarwal, Gaurav, Shenoy, Pradeep, Srivastava, Gaurav

arXiv.org Machine Learning, Sep-5-2023

Personalization of machine learning (ML) predictions for individual users/domains/enterprises is critical for practical recommendation systems. Standard personalization approaches involve learning a user/domain-specific embedding that is fed into a fixed global model, which can be limiting. On the other hand, personalizing/fine-tuning the model itself for each user/domain -- a.k.a. meta-learning -- has a high storage/infrastructure cost. Moreover, rigorous theoretical studies of scalable personalization approaches have been very limited. To address the above issues, we propose a novel meta-learning style approach that models network weights as a sum of low-rank and sparse components. This captures common information from multiple individuals/users together in the low-rank part, while the sparse part captures user-specific idiosyncrasies. We then study the framework in the linear setting, where the problem reduces to that of estimating the sum of a rank-$r$ and a $k$-column sparse matrix using a small number of linear measurements. We propose a computationally efficient alternating minimization method with iterative hard thresholding -- AMHT-LRS -- to learn the low-rank and sparse parts. Theoretically, for the realizable Gaussian data setting, we show that AMHT-LRS solves the problem efficiently with nearly optimal sample complexity. Finally, a significant challenge in personalization is ensuring privacy of each user's sensitive data. We alleviate this problem by proposing a differentially private variant of our method that is also equipped with strong generalization guarantees.
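To make the low-rank plus sparse structure concrete, the sketch below models one user's weights as a shared part $Uw$ plus a sparse correction $b$, and recovers $b$ by iterative hard thresholding while the shared part is held fixed. It covers only that one ingredient, not the full AMHT-LRS alternating procedure, whose details the abstract does not give. It reads "$k$-column sparse" as each user's column having at most $k$ non-zero entries, which is an assumption, and the function names and step size are hypothetical.

```python
def hard_threshold(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    if k <= 0:
        return [0.0] * len(vec)
    order = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)
    keep = set(order[:k])
    return [v if j in keep else 0.0 for j, v in enumerate(vec)]


def user_weights(U, w, b):
    """Per-user weights: shared low-rank part U @ w plus sparse part b."""
    return [sum(U[j][c] * w[c] for c in range(len(w))) + b[j]
            for j in range(len(b))]


def iht_sparse_part(X, y, shared, k, lr, steps):
    """Estimate a k-sparse b with y ~= X @ (shared + b), shared held fixed.

    Each step takes a gradient step on the least-squares loss and then
    projects back onto k-sparse vectors by hard thresholding.
    """
    d, n = len(shared), len(X)
    b = [0.0] * d
    for _ in range(steps):
        theta = [s + bj for s, bj in zip(shared, b)]
        resid = [sum(xj * tj for xj, tj in zip(x, theta)) - yi
                 for x, yi in zip(X, y)]
        grad = [sum(resid[i] * X[i][j] for i in range(n)) / n
                for j in range(d)]
        b = hard_threshold([bj - lr * g for bj, g in zip(b, grad)], k)
    return b


if __name__ == "__main__":
    X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    shared = [1.0, 1.0, 1.0, 1.0]   # stands in for U @ w
    y = [1.0, 4.0, 1.0, 1.0]        # generated with one non-zero offset
    print(iht_sparse_part(X, y, shared, k=1, lr=4.0, steps=5))
```

In a full alternating scheme, a step like `iht_sparse_part` would be interleaved with updates to the shared factors; the step size `lr` here is chosen for the toy identity design and would need tuning for real measurements.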

artificial intelligence, machine learning, (16 more...)

arXiv:2210.03505

Genre: Research Report (0.81)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
