Collaborating Authors: Gupta, Vineet


A Computationally Efficient Sparsified Online Newton Method

arXiv.org Artificial Intelligence

Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. There is thus a need for scalable second-order methods that can efficiently train large models. In this paper, we introduce the Sparsified Online Newton (SONew) method, a memory-efficient second-order algorithm that yields a sparsified yet effective preconditioner. The algorithm emerges from a novel use of the LogDet matrix divergence measure; we combine it with sparsity constraints to minimize regret in the online convex optimization framework. Empirically, we test our method on large-scale benchmarks of up to 1B parameters. We achieve up to 30% faster convergence, 3.4% relative improvement in validation performance, and 80% relative improvement in training loss, in comparison to memory-efficient optimizers, including first-order methods. Powering the method is a surprising fact: imposing structured sparsity patterns, such as tridiagonal and banded structure, requires little to no overhead, making the method as efficient and parallelizable as first-order methods. In wall-clock time, tridiagonal SONew is only about 3% slower per step than first-order methods but gives overall gains due to much faster convergence. In contrast, Shampoo, a state-of-the-art memory-intensive second-order method, is unable to scale to large benchmarks and requires significant engineering effort to deploy, whereas SONew offers a more straightforward implementation, increasing its practical appeal. SONew code is available at: https://github.com/devvrit/SONew
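
To make the tridiagonal case concrete, here is a minimal NumPy sketch of a tridiagonally preconditioned gradient step. It is only an illustration of why memory and per-step work stay linear in the parameter count; it is not the paper's exact update (SONew derives its preconditioner from a LogDet-divergence subproblem, and a practical version needs damping to keep the band positive definite). All names and constants below are ours.

    import numpy as np

    def tridiag_solve(diag, off, rhs):
        # Thomas algorithm for a symmetric tridiagonal system T x = rhs,
        # where T has main diagonal `diag` and sub/super diagonal `off`.
        # Assumes the (damped) band is positive definite.
        n = diag.size
        if n == 1:
            return rhs / diag
        w = np.empty(n - 1)
        g = np.empty(n)
        w[0] = off[0] / diag[0]
        g[0] = rhs[0] / diag[0]
        for i in range(1, n):
            m = diag[i] - off[i - 1] * w[i - 1]
            if i < n - 1:
                w[i] = off[i] / m
            g[i] = (rhs[i] - off[i - 1] * g[i - 1]) / m
        x = np.empty(n)
        x[-1] = g[-1]
        for i in range(n - 2, -1, -1):
            x[i] = g[i] - w[i] * x[i + 1]
        return x

    class TridiagPreconditioner:
        # Keeps only the tridiagonal band of the gradient second-moment
        # statistics, so memory and per-step cost are O(n): the property
        # the abstract highlights. The exact SONew preconditioner comes
        # from a LogDet-divergence subproblem and differs from this sketch.
        def __init__(self, n, damping=1e-4):
            self.diag = np.full(n, damping)  # running sum of g_i^2
            self.off = np.zeros(n - 1)       # running sum of g_i * g_{i+1}

        def step(self, grad, lr=0.1):
            self.diag += grad * grad
            self.off += grad[:-1] * grad[1:]
            # Preconditioned direction: solve the banded system instead of
            # inverting a dense n x n matrix.
            return -lr * tridiag_solve(self.diag, self.off, grad)

For a toy 5-dimensional problem, opt = TridiagPreconditioner(5) followed by opt.step(np.random.randn(5)) returns the preconditioned update in O(n) time.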


Using Foundation Models to Detect Policy Violations with Minimal Supervision

arXiv.org Artificial Intelligence

Foundation models, i.e., large neural networks pre-trained on large text corpora, have revolutionized NLP. They can be instructed directly with a textual prompt, a technique called hard prompting (arXiv:2005.14165), and they can be tuned using very little data, a technique called soft prompting (arXiv:2104.08691). We seek to leverage these capabilities to detect policy violations. Our contributions are as follows. We identify a hard prompt that adapts chain-of-thought prompting to policy violation tasks; this prompt produces policy violation classifications along with extractive explanations that justify the classification. We compose the hard prompt with soft prompt tuning to produce a classifier that attains high accuracy with very little supervision; the same classifier also produces explanations. Though the supervision acts only on the classifications, we find that the modified explanations remain consistent with the tuned model's responses. Along the way, we identify several unintuitive aspects of foundation models: for instance, adding an example from a specific class can actually reduce predictions of that class, and, separately, tokenization choices can affect scoring. Based on our technical results, we identify a simple workflow that product teams can use to quickly develop effective policy violation detectors.
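
As an illustration of the hard-prompt idea, the sketch below builds a chain-of-thought-style prompt that asks the model to first quote the relevant span (the extractive explanation) and then emit a label. The prompt wording, the `complete` callable, and the label parsing are our own placeholders, not the paper's prompt; soft prompt tuning (learned prefix embeddings) is omitted.

    def build_prompt(policy, post):
        # Chain-of-thought-style hard prompt: ask for an extractive quote
        # first, then a final label. Wording is illustrative only.
        return "\n".join([
            f"Policy: {policy}",
            f"Post: {post}",
            "Think step by step. Quote the part of the post most relevant",
            "to the policy, then answer 'violation' or 'no violation'.",
            "Relevant quote:",
        ])

    def classify(policy, post, complete):
        # `complete` stands in for any text-completion endpoint
        # (hypothetical); it maps a prompt string to a continuation.
        out = complete(build_prompt(policy, post))
        # Check 'no violation' first, since 'violation' is a substring of it.
        label = "no violation" if "no violation" in out.lower() else "violation"
        return label, out  # the continuation carries the extractive explanation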


Memory-Efficient Adaptive Optimization for Large-Scale Learning

arXiv.org Machine Learning

Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. We describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of per-parameter adaptivity while allowing for larger models and mini-batches. We give convergence guarantees for our method and demonstrate its effectiveness in training very large deep models.
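
For a 2-D parameter, the sublinear-memory idea can be sketched as follows: keep one accumulator per row and one per column (O(m + n) memory instead of O(mn)) and estimate each entry's second-moment statistic as the minimum over its covering accumulators. This is a minimal sketch of that cover-based construction under our reading of the abstract, not a drop-in for the paper's optimizer; the function and variable names are ours.

    import numpy as np

    def sublinear_adagrad_step(w, g, row_acc, col_acc, lr=0.1, eps=1e-8):
        # Per-entry second-moment estimate: the min over the two
        # accumulators covering entry (i, j), plus the fresh squared gradient.
        nu = np.minimum(row_acc[:, None], col_acc[None, :]) + g * g
        w = w - lr * g / (np.sqrt(nu) + eps)
        # Each accumulator tracks the max statistic over its cover set,
        # so memory stays O(rows + cols) instead of O(rows * cols).
        row_acc = np.maximum(row_acc, nu.max(axis=1))
        col_acc = np.maximum(col_acc, nu.max(axis=0))
        return w, row_acc, col_acc

    # Toy usage: a 4 x 3 weight matrix needs 4 + 3 accumulators, not 12.
    w = np.zeros((4, 3))
    row_acc, col_acc = np.zeros(4), np.zeros(3)
    w, row_acc, col_acc = sublinear_adagrad_step(
        w, np.random.randn(4, 3), row_acc, col_acc)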


The Singular Values of Convolutional Layers

arXiv.org Artificial Intelligence

We characterize the singular values of the linear transformation associated with a convolution applied to a two-dimensional feature map with multiple channels. Our characterization enables efficient computation of the singular values of convolutional layers used in popular deep neural network architectures. It also leads to an algorithm for projecting a convolutional layer onto the set of layers obeying a bound on the operator norm of the layer. We show that this is an effective regularizer; periodically applying these projections during training improves the test error of a residual network on CIFAR-10 from 6.2% to 5.3%.
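
The characterization admits a compact implementation for circular (wrap-around) convolutions: take the 2-D DFT of the zero-padded kernel and collect the singular values of the small per-frequency channel matrices. The sketch below follows that construction; the kernel layout (k, k, c_in, c_out) and square n x n inputs are our assumptions.

    import numpy as np

    def conv_singular_values(kernel, n):
        # kernel: (k, k, c_in, c_out); n: spatial size of the square input.
        # 2-D DFT of the kernel, zero-padded to the input's spatial size.
        transforms = np.fft.fft2(kernel, s=(n, n), axes=(0, 1))
        # One small SVD per spatial frequency, over the channel dimensions;
        # the union of all values is the layer's full set of singular values.
        return np.linalg.svd(transforms, compute_uv=False)

    # The operator norm of the layer is the largest of these values:
    kernel = np.random.randn(3, 3, 16, 32)
    print(conv_singular_values(kernel, 32).max())

Projecting onto an operator-norm ball then amounts to clipping these values and inverting the transform, which we omit here.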


Shampoo: Preconditioned Stochastic Tensor Optimization

arXiv.org Machine Learning

Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Although it involves a more complex update rule, Shampoo's runtime per step is comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam.
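
For a single matrix-shaped parameter, the update described here can be written in a few lines: accumulate left and right statistics G G^T and G^T G, and precondition the gradient with their inverse fourth roots. This is a minimal sketch for the 2-D case (the paper handles general tensors, and practical implementations recompute the roots only periodically); names and constants are ours.

    import numpy as np

    def inv_fourth_root(M, eps=1e-6):
        # M^{-1/4} for a symmetric PSD matrix, via eigendecomposition.
        vals, vecs = np.linalg.eigh(M)
        return (vecs * np.power(np.maximum(vals, eps), -0.25)) @ vecs.T

    class ShampooMatrix:
        # Shampoo for one m x n parameter: one preconditioner per dimension.
        def __init__(self, m, n, eps=1e-6):
            self.L = eps * np.eye(m)  # left statistics, accumulates G @ G.T
            self.R = eps * np.eye(n)  # right statistics, accumulates G.T @ G

        def step(self, G, lr=0.1):
            self.L += G @ G.T
            self.R += G.T @ G
            # Preconditioned update: L^{-1/4} @ G @ R^{-1/4}.
            return -lr * inv_fourth_root(self.L) @ G @ inv_fourth_root(self.R)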


A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization

arXiv.org Machine Learning

We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning. Such algorithms have been proven useful in stochastic optimization by reshaping the gradients according to the geometry of the data. Our framework captures and unifies much of the existing literature on adaptive online methods, including the AdaGrad and Online Newton Step algorithms as well as their diagonal versions. As a result, we obtain new convergence proofs for these algorithms that are substantially simpler than previous analyses. Our framework also exposes the rationale for the different preconditioned updates used in common stochastic optimization methods.
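
As one concrete instance of such data-dependent preconditioning, diagonal AdaGrad can be written explicitly as x_{t+1} = x_t - lr * H_t^{-1} g_t with H_t = diag(sum of squared gradients)^(1/2). The sketch below spells out that diagonal special case; it is a toy illustration with our own names, not the framework's general analysis.

    import numpy as np

    def diagonal_adagrad(x0, grads, lr=1.0, eps=1e-8):
        # Diagonal AdaGrad as an explicitly preconditioned update:
        # x_{t+1} = x_t - lr * H_t^{-1} g_t,  H_t = diag(sum_s g_s^2)^(1/2).
        x = np.array(x0, dtype=float)
        s = np.zeros_like(x)  # running sum of squared gradients
        for g in grads:
            s += g * g
            x -= lr * g / (np.sqrt(s) + eps)  # apply H_t^{-1}
        return x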


Communicating Semantics: Reference by Description

arXiv.org Artificial Intelligence

Messages often refer to entities such as people, places and events. Correct identification of the intended reference is an essential part of communication. Lack of shared unique names often complicates entity reference. Shared knowledge can be used to construct uniquely identifying descriptive references for entities with ambiguous names. We introduce a mathematical model for 'Reference by Description', derive results on the conditions under which, with high probability, programs can construct unambiguous references to most entities in the domain of discourse, and provide empirical validation of these results.


Identifying Purchase Intent from Social Posts

AAAI Conferences

Social forums such as Quora and Yahoo! Answers constitute powerful media through which people discuss a variety of topics and express their intentions and thoughts. Here they often reveal a potential intent to purchase, termed 'Purchase Intent' (PI). A purchase intent is a text expression showing a desire to purchase a product or service in the future. Extracting posts with PI from a user's social posts opens significant opportunities for web personalization, targeted marketing, and improved community-monitoring systems. In this paper, we explore the novel problem of detecting PIs in social posts and classifying them. We find that using linguistic features along with statistical features of PI expressions achieves a significant improvement in PI classification over the 'bag-of-words' features used in many present-day social-media classification tasks. Our approach accounts for the specifics of social posts, such as limited context, incorrect grammar, and language ambiguities, by extracting features at two levels of text granularity: word- and phrase-based features, and grammatical-dependency-based features. In addition, the patterns observed in PI posts help us identify further class-specific features.
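
To illustrate the two levels of feature granularity, here is a toy scikit-learn pipeline that unions a word/phrase n-gram channel with a second channel standing in for dependency-based features. It is an illustrative sketch, not the paper's feature set or classifier: a real dependency channel would emit parser triples such as ('buy', 'dobj', 'laptop'), and the toy data is ours.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import FeatureUnion, Pipeline

    # Channel 1: word- and phrase-level features (unigrams through trigrams).
    ngrams = CountVectorizer(ngram_range=(1, 3))
    # Channel 2: stand-in for grammatical-dependency features; a real system
    # would run a dependency parser and emit (head, relation, dependent)
    # triples. Plain bigrams keep this placeholder runnable.
    deps = CountVectorizer(ngram_range=(2, 2))

    clf = Pipeline([
        ("features", FeatureUnion([("ngrams", ngrams), ("deps", deps)])),
        ("model", LogisticRegression(max_iter=1000)),
    ])

    posts = ["i want to buy a new phone soon", "great weather out today"]
    labels = [1, 0]  # 1 = purchase intent, 0 = none (toy data)
    clf.fit(posts, labels)
    print(clf.predict(["looking to purchase a laptop this month"]))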