AITopics | Kusupati, Aditya

Collaborating Authors

Kusupati, Aditya

Matryoshka Quantization

Nair, Pranav, Datta, Puranjay, Dean, Jeff, Jain, Prateek, Kusupati, Aditya

arXiv.org Artificial Intelligence Feb-10-2025

Quantizing model weights is critical for reducing the communication and inference costs of large models. However, quantizing models -- especially to low precisions like int4 or int2 -- requires a trade-off in model quality; int2, in particular, is known to severely degrade model quality. Consequently, practitioners are often forced to maintain multiple models with different quantization levels or serve a single model that best satisfies the quality-latency trade-off. On the other hand, integer data types, such as int8, inherently possess a nested (Matryoshka) structure where smaller bit-width integers, like int4 or int2, are nested within the most significant bits. This paper proposes Matryoshka Quantization (MatQuant), a novel multi-scale quantization technique that addresses the challenge of needing multiple quantized models. It allows training and maintaining just one model, which can then be served at different precision levels. Furthermore, due to the co-training and co-distillation regularization provided by MatQuant, the int2 precision models extracted by MatQuant can be up to $10\%$ more accurate than standard int2 quantization (using techniques like QAT or OmniQuant). This represents significant progress in model quantization, demonstrated by the fact that, with the same recipe, an int2 FFN-quantized Gemma-2 9B model is more accurate than an int8 FFN-quantized Gemma-2 2B model.
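The nesting mentioned in the abstract can be illustrated with plain bit arithmetic. The sketch below is not the paper's method: it assumes unsigned 8-bit codes and simple truncation, and the helper names `slice_msbs` and `expand_to_int8_range` are hypothetical. The rounding, clipping, and scaling used by MatQuant itself may differ.

```python
def slice_msbs(code8: int, bits: int) -> int:
    """Return the `bits` most significant bits of an unsigned 8-bit code."""
    if not 0 <= code8 <= 255:
        raise ValueError("code8 must fit in 8 unsigned bits")
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    return code8 >> (8 - bits)


def expand_to_int8_range(code: int, bits: int) -> int:
    """Shift a sliced code back onto the 8-bit grid so values are comparable."""
    return code << (8 - bits)


code8 = 0b10110110            # an example unsigned 8-bit weight code
code4 = slice_msbs(code8, 4)  # keeps the top four bits, 0b1011
code2 = slice_msbs(code8, 2)  # keeps the top two bits, 0b10

print(code8, code4, code2)
print(expand_to_int8_range(code4, 4), expand_to_int8_range(code2, 2))
```

Under these assumptions, one stored 8-bit code yields both a 4-bit and a 2-bit code without any extra storage, which is the property that lets a single model be served at several precisions.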

large language model, machine learning, matquant, (17 more...)

arXiv.org Artificial Intelligence

2502.06786

Country:

North America > United States (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos

Wallingford, Matthew, Bhattad, Anand, Kusupati, Aditya, Ramanujan, Vivek, Deitke, Matt, Kakade, Sham, Kembhavi, Aniruddha, Mottaghi, Roozbeh, Ma, Wei-Chiu, Farhadi, Ali

arXiv.org Artificial Intelligence Dec-10-2024

Three-dimensional (3D) understanding of objects and scenes plays a key role in humans' ability to interact with the world and has been an active area of research in computer vision, graphics, and robotics. Large-scale synthetic and object-centric 3D datasets have been shown to be effective in training models that have 3D understanding of objects. However, applying a similar approach to real-world objects and scenes is difficult due to a lack of large-scale data. Videos are a potential source for real-world 3D data, but finding diverse yet corresponding views of the same content has proven difficult at scale. Furthermore, standard videos come with fixed viewpoints, determined at the time of capture. This restricts the ability to access scenes from a variety of more diverse and potentially useful perspectives. We argue that large-scale 360 videos can address these limitations by providing scalable corresponding frames from diverse views. In this paper, we introduce 360-1M, a 360 video dataset, and a process for efficiently finding corresponding frames from diverse viewpoints at scale. We train our diffusion-based model, Odin, on 360-1M. Empowered by the largest real-world, multi-view dataset to date, Odin is able to freely generate novel views of real-world scenes. Unlike previous methods, Odin can move the camera through the environment, enabling the model to infer the geometry and layout of the scene. Additionally, we show improved performance on standard novel view synthesis and 3D reconstruction benchmarks.

artificial intelligence, machine learning, video, (13 more...)

arXiv.org Artificial Intelligence

2412.0777

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.89)

MatMamba: A Matryoshka State Space Model

Shukla, Abhinav, Vemprala, Sai, Kusupati, Aditya, Kapoor, Ashish

arXiv.org Artificial Intelligence Oct-9-2024

State Space Models (SSMs) like Mamba2 are a promising alternative to Transformers, with faster theoretical training and inference times -- especially for long context lengths. Recent work on Matryoshka Representation Learning -- and its application to Transformer backbones in works like MatFormer -- showed how to introduce nested granularities of smaller submodels in one universal elastic model. In this work, we present MatMamba: a state space model which combines Matryoshka-style learning with Mamba2, by modifying the block to contain nested dimensions to enable joint training and adaptive inference. MatMamba allows for efficient and adaptive deployment across various model sizes. We train a single large MatMamba model and are able to get a number of smaller nested models for free -- while maintaining or improving upon the performance of a baseline smaller model trained from scratch. We train language and image models at a variety of parameter sizes from 35M to 1.4B. Our results on ImageNet and FineWeb show that MatMamba models scale comparably to Transformers, while having more efficient inference characteristics. This makes MatMamba a practically viable option for deploying large-scale models in an elastic way based on the available inference compute. Code and models are open-sourced at https://github.com/ScaledFoundations/MatMamba.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.06718

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

Shen, Ethan, Fan, Alan, Pratt, Sarah M., Park, Jae Sung, Wallingford, Matthew, Kakade, Sham M., Holtzman, Ari, Krishna, Ranjay, Farhadi, Ali, Kusupati, Aditya

arXiv.org Artificial Intelligence Jun-24-2024

Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To alleviate the computation cost of running $k$ inference passes, we propose Superposed Decoding, a new decoding algorithm that generates $k$ drafts at the computation cost of one autoregressive inference pass. We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model. At every inference step we combine the $k$ drafts with the top-$k$ tokens to get $k^2$ new drafts and cache the $k$ most likely options, using an n-gram interpolation with minimal compute overhead to filter out incoherent generations. Our experiments show that $k$ drafts from Superposed Decoding are at least as coherent and factual as Nucleus Sampling and Greedy Decoding, respectively, while being at least $2.44\times$ faster for $k\ge3$. In a compute-normalized setting, user evaluations demonstrably favor text generated by Superposed Decoding over Nucleus Sampling. Code and more examples are open-sourced at https://github.com/RAIVNLab/SuperposedDecoding.
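One decoding step as described in the abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names `superpose` and `expand_and_prune` are hypothetical, the language model call that would turn the superposed embedding into `top_tokens` is not shown, and the n-gram interpolation used to filter incoherent drafts is omitted, so each candidate is scored only by the product of the draft probability and the token probability.

```python
from typing import List, Sequence, Tuple

Draft = Tuple[Tuple[str, ...], float]  # (tokens so far, draft probability)


def superpose(embeddings: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Weighted sum of the k most recent token embeddings, one per draft."""
    dim = len(embeddings[0])
    return [sum(w * e[i] for e, w in zip(embeddings, weights))
            for i in range(dim)]


def expand_and_prune(drafts: Sequence[Draft],
                     top_tokens: Sequence[Tuple[str, float]],
                     k: int) -> List[Draft]:
    """Combine k drafts with the top-k tokens (k^2 candidates), keep the best k."""
    candidates = [(tokens + (tok,), p * q)
                  for tokens, p in drafts
                  for tok, q in top_tokens]
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:k]


# Toy example with k = 2 and 2-dimensional embeddings.
drafts = [(("a",), 0.6), (("b",), 0.4)]
mixed = superpose([[1.0, 0.0], [0.0, 1.0]], [p for _, p in drafts])
top_tokens = [("x", 0.7), ("y", 0.3)]  # assumed model output for `mixed`
print(mixed)
print(expand_and_prune(drafts, top_tokens, k=2))
```

The point of the sketch is the cost structure: the model is queried once per step with a single mixed input, and the per-draft bookkeeping is cheap list manipulation.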

large language model, machine learning, superposed decoding, (17 more...)

arXiv.org Artificial Intelligence

2405.184

Country:

North America > United States > California (0.14)
North America > Mexico > Veracruz (0.14)
Asia > India > Maharashtra (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.69)
Education (0.69)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Gecko: Versatile Text Embeddings Distilled from Large Language Models

Lee, Jinhyuk, Dai, Zhuyun, Ren, Xiaoqi, Chen, Blair, Cer, Daniel, Cole, Jeremy R., Hui, Kai, Boratko, Michael, Kapadia, Rajvi, Ding, Wen, Luan, Yi, Duddu, Sai Meher Karthik, Abrego, Gustavo Hernandez, Shi, Weiqiang, Gupta, Nithi, Kusupati, Aditya, Jain, Prateek, Jonnalagadda, Siddhartha Reddy, Chang, Ming-Wei, Naim, Iftekhar

arXiv.org Artificial Intelligence Mar-29-2024

Text embedding models represent natural language as dense vectors, positioning semantically similar text near each other within the embedding space (Gao et al., 2021; Le and Mikolov, 2014; Reimers and Gurevych, 2019). These embeddings are commonly used for a wide range of downstream tasks including document retrieval, sentence similarity, classification, and clustering (Muennighoff et al., 2023). Instead of building separate embedding models for each downstream task, recent efforts seek to create a single embedding model supporting many tasks. The recent development of general-purpose text embedding models presents a challenge: these models require large amounts of training data to comprehensively cover desired domains and skills. Recent embedding efforts have focused on using extensive collections of training examples (Li et al., 2023; Wang et al., 2022).

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2403.20327

Country: Asia > Middle East > UAE (0.14)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Sports > Olympic Games (0.68)
Media > Film (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Priming for Sample-Efficient Adaptation

Wallingford, Matthew, Ramanujan, Vivek, Fang, Alex, Kusupati, Aditya, Mottaghi, Roozbeh, Kembhavi, Aniruddha, Schmidt, Ludwig, Farhadi, Ali

arXiv.org Artificial Intelligence Dec-4-2023

We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks given few or no labeled examples. Presented with class names or unlabeled test samples, Neural Priming enables the model to recall and condition its parameters on relevant data seen throughout pretraining, thereby priming it for the test distribution. Neural Priming can be performed at test time, even for pretraining datasets as large as LAION-2B. Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks. Concretely, in the zero-shot setting, we see a 2.45% improvement in accuracy on ImageNet and a 3.81% accuracy improvement on average across standard transfer learning benchmarks. Further, using Neural Priming at inference to adapt to distribution shift, we see a 1.41% accuracy improvement on ImageNetV2. These results demonstrate the effectiveness of Neural Priming in addressing the challenge of limited labeled data and changing distributions. Code is available at github.com/RAIVNLab/neural-priming.

large language model, machine learning, neural priming, (18 more...)

arXiv.org Artificial Intelligence

2306.10191

Country:

North America > United States > Maryland (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.55)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Are "Hierarchical" Visual Representations Hierarchical?

Shen, Ethan, Farhadi, Ali, Kusupati, Aditya

arXiv.org Artificial Intelligence Nov-23-2023

Learned visual representations often capture large amounts of semantic information for accurate downstream applications. Human understanding of the world is fundamentally grounded in hierarchy. To mimic this and further improve representation capabilities, the community has explored "hierarchical" visual representations that aim at modeling the underlying hierarchy of the visual world. In this work, we set out to investigate if hierarchical visual representations truly capture the human perceived hierarchy better than standard learned representations. To this end, we create HierNet, a suite of 12 datasets spanning 3 kinds of hierarchy from the BREEDs subset of ImageNet. After extensive evaluation of Hyperbolic and Matryoshka Representations across training setups, we conclude that they do not capture hierarchy any better than the standard representations but can assist in other aspects like search efficiency and interpretability. Our benchmark and the datasets are open-sourced at https://github.com/ethanlshen/HierNet.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2311.05784

Country: North America (0.14)

Genre: Research Report (0.40)

Industry: Transportation > Ground (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks

Salehi, Mohammadreza, Mehta, Sachin, Kusupati, Aditya, Farhadi, Ali, Hajishirzi, Hannaneh

arXiv.org Artificial Intelligence Oct-18-2023

We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms of accuracy vs. FLOPs; (2) SHARCS generalizes across different architectures and can even be applied to compressed and efficient transformer encoders to further improve their efficiency; (3) SHARCS can provide a 2x inference speedup with an insignificant drop in accuracy.

artificial intelligence, efficient transformer, sub-network, (2 more...)

arXiv.org Artificial Intelligence

2310.12126

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (0.53)

AdANNS: A Framework for Adaptive Semantic Search

Rege, Aniket, Kusupati, Aditya, S, Sharan Ranjit, Fan, Alan, Cao, Qingqing, Kakade, Sham, Jain, Prateek, Farhadi, Ali

arXiv.org Artificial Intelligence Oct-18-2023

Web-scale search systems learn an encoder to embed a given query which is then hooked into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. To accurately capture tail queries and data points, learned representations typically are rigid, high-dimensional vectors that are generally used as-is in the entire ANNS pipeline and can lead to computationally expensive retrieval. In this paper, we argue that instead of rigid representations, different stages of ANNS can leverage adaptive representations of varying capacities to achieve significantly better accuracy-compute trade-offs, i.e., stages of ANNS that can get away with more approximate computation should use a lower-capacity representation of the same data point. To this end, we introduce AdANNS, a novel ANNS design framework that explicitly leverages the flexibility of Matryoshka Representations. We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than the rigid representations-based IVF at the same compute budget; and matches accuracy while being up to 90x faster in wall-clock time. For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline constructed using rigid representations -- same accuracy at half the cost! We further show that the gains from AdANNS translate to modern-day composite ANNS indices that combine search structures and quantization. Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations. Code is open-sourced at https://github.com/RAIVNLab/AdANNS.
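The idea of giving cheaper stages a lower-capacity view of the same vector can be sketched with a two-stage brute-force search. This is a simplified stand-in, not AdANNS-IVF or AdANNS-OPQ: it assumes Matryoshka-style embeddings whose leading coordinates form a usable low-dimensional representation, and the function name `two_stage_search` and its parameters are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(query: Sequence[float],
                     database: Sequence[Sequence[float]],
                     d_coarse: int,
                     shortlist: int,
                     top_k: int) -> List[int]:
    """Shortlist with the first d_coarse dimensions, re-rank with all of them."""
    # Stage 1: cheap scoring over the whole database using a vector prefix.
    coarse = sorted(
        range(len(database)),
        key=lambda i: dot(query[:d_coarse], database[i][:d_coarse]),
        reverse=True,
    )[:shortlist]
    # Stage 2: full-dimensional scoring over the shortlist only.
    return sorted(coarse, key=lambda i: dot(query, database[i]),
                  reverse=True)[:top_k]


database = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.9],
    [0.0, 1.0, 0.0, 0.0],
    [0.8, 0.1, 0.0, 0.0],
]
query = [1.0, 0.0, 0.0, 1.0]
print(two_stage_search(query, database, d_coarse=2, shortlist=2, top_k=1))
```

In this toy setup the prefix is only used to discard clearly irrelevant items, while the final ordering comes from the full vector, which mirrors the accuracy-compute trade-off the abstract describes.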

adaptive semantic search, artificial intelligence, framework, (1 more...)

arXiv.org Artificial Intelligence

2305.19435

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.53)
Information Technology > Information Management > Search (0.40)

EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval

Kumar, Ramnath, Mittal, Anshul, Gupta, Nilesh, Kusupati, Aditya, Dhillon, Inderjit, Jain, Prateek

arXiv.org Artificial Intelligence Oct-13-2023

Dense embedding-based retrieval is now the industry standard for semantic search and ranking problems, like obtaining relevant web documents for a given query. Such techniques use a two-stage process: (a) contrastive learning to train a dual encoder to embed both the query and documents and (b) approximate nearest neighbor search (ANNS) for finding similar documents for a given query. These two stages are disjoint; the learned embeddings might be ill-suited for the ANNS method and vice versa, leading to suboptimal performance. In this work, we propose End-to-end Hierarchical Indexing -- EHI -- that jointly learns both the embeddings and the ANNS structure to optimize retrieval performance. EHI uses a standard dual encoder model for embedding queries and documents while learning an inverted file index (IVF) style tree structure for efficient ANNS. To ensure stable and efficient learning of the discrete tree-based ANNS structure, EHI introduces the notion of dense path embedding that captures the position of a query/document in the tree. We demonstrate the effectiveness of EHI on several benchmarks, including the de facto industry-standard MS MARCO (Dev set and TREC DL19) datasets. For example, with the same compute budget, EHI outperforms the state of the art (SOTA) by 0.6% (MRR@10) on the MS MARCO dev set and by 4.2% (nDCG@10) on the TREC DL19 benchmark.
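For readers unfamiliar with the MRR@10 metric quoted above, the following sketch shows the standard definition: the reciprocal rank of the first relevant document within the top 10 results, averaged over queries, with a score of zero when none appears. The function name `mrr_at_k` and the toy document ids are illustrative and are not taken from the paper or its evaluation code.

```python
from typing import Sequence, Set


def mrr_at_k(rankings: Sequence[Sequence[str]],
             relevant: Sequence[Set[str]],
             k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant document in the top k."""
    if len(rankings) != len(relevant):
        raise ValueError("need one relevance set per ranked list")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


# Query 1 finds its relevant document at rank 2; query 2 finds none.
rankings = [["d1", "d2"], ["d3", "d4", "d5"]]
relevant = [{"d2"}, {"d9"}]
print(mrr_at_k(rankings, relevant))
```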

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2310.08891

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
