AITopics | Mitra, Subrata

Collaborating Authors

Mitra, Subrata

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation

Agarwal, Shubham, Sundaresan, Sai, Mitra, Subrata, Mahapatra, Debabrata, Gupta, Archit, Sharma, Rounak, Kapu, Nirmal Joshua, Yu, Tong, Saini, Shiv

arXiv.org Artificial IntelligenceFeb-5-2025

Retrieval-Augmented Generation (RAG) is often used with Large Language Models (LLMs) to infuse domain knowledge or user-specific information. In RAG, given a user query, a retriever extracts chunks of relevant text from a knowledge base. These chunks are sent to an LLM as part of the input prompt. Typically, any given chunk is repeatedly retrieved across user questions. However, currently, for every question, attention-layers in LLMs fully compute the key values (KVs) repeatedly for the input chunks, as state-of-the-art methods cannot reuse KV-caches when chunks appear at arbitrary locations with arbitrary contexts. Naive reuse leads to output quality degradation. This leads to potentially redundant computations on expensive GPUs and increases latency. In this work, we propose Cache-Craft, a system for managing and reusing precomputed KVs corresponding to the text chunks (we call chunk-caches) in RAG-based systems. We present how to identify chunk-caches that are reusable, how to efficiently perform a small fraction of recomputation to fix the cache to maintain output quality, and how to efficiently store and evict chunk-caches in the hardware for maximizing reuse while masking any overheads. With real production workloads as well as synthetic datasets, we show that Cache-Craft reduces redundant computation by 51% over SOTA prefix-caching and 75% over full recomputation. Additionally, with continuous batching on a real production workload, we get a 1.6X speed up in throughput and a 2X reduction in end-to-end response latency over prefix-caching while maintaining quality, for both the LLaMA-3-8B and LLaMA-3-70B models.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.15734

Country:

North America > United States (0.46)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Prompt-Aware Scheduling for Efficient Text-to-Image Inferencing System

Agarwal, Shubham, Iqbal, Saud, Mitra, Subrata

arXiv.org Artificial IntelligenceJan-28-2025

Traditional ML models utilize controlled approximations during high loads, employing faster, but less accurate models in a process called accuracy scaling. However, this method is less effective for generative text-to-image models due to their sensitivity to input prompts and performance degradation caused by large model loading overheads. This work introduces a novel text-to-image inference system that optimally matches prompts across multiple instances of the same model operating at various approximation levels to deliver high-quality images under high loads and fixed budgets.

artificial intelligence, machine learning, violation, (14 more...)

arXiv.org Artificial Intelligence

2502.06798

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.71)

Add feedback

Personalized Multimodal Large Language Models: A Survey

Wu, Junda, Lyu, Hanjia, Xia, Yu, Zhang, Zhehao, Barrow, Joe, Kumar, Ishita, Mirtaheri, Mehrnoosh, Chen, Hongjie, Rossi, Ryan A., Dernoncourt, Franck, Yu, Tong, Zhang, Ruiyi, Gu, Jiuxiang, Ahmed, Nesreen K., Wang, Yu, Chen, Xiang, Deilamsalehy, Hanieh, Park, Namyong, Kim, Sungchul, Yang, Huanrui, Mitra, Subrata, Hu, Zhengmian, Lipka, Nedim, Nguyen, Dang, Zhao, Yue, Luo, Jiebo, McAuley, Julian

arXiv.org Artificial IntelligenceDec-2-2024

Multimodal Large Language Models (MLLMs) have become increasingly important due to their state-of-the-art performance and ability to integrate multiple data modalities, such as text, images, and audio, to perform complex tasks with high accuracy. This paper presents a comprehensive survey on personalized multimodal large language models, focusing on their architecture, training methods, and applications. We propose an intuitive taxonomy for categorizing the techniques used to personalize MLLMs to individual users, and discuss the techniques accordingly. Furthermore, we discuss how such techniques can be combined or adapted when appropriate, highlighting their advantages and underlying rationale. We also provide a succinct summary of personalization tasks investigated in existing research, along with the evaluation metrics commonly used. Additionally, we summarize the datasets that are useful for benchmarking personalized MLLMs. Finally, we outline critical open challenges. This survey aims to serve as a valuable resource for researchers and practitioners seeking to understand and advance the development of personalized multimodal large language models.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.02142

Country: North America > United States > California (0.28)

Genre: Overview (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

ScaleViz: Scaling Visualization Recommendation Models on Large Data

Ahmad, Ghazi Shazan, Agarwal, Shubham, Mitra, Subrata, Rossi, Ryan, Doshi, Manav, Porwal, Vibhor, Paila, Syam Manoj Kumar

arXiv.org Machine LearningNov-27-2024

Automated visualization recommendations (vis-rec) help users to derive crucial insights from new datasets. Typically, such automated vis-rec models first calculate a large number of statistics from the datasets and then use machine-learning models to score or classify multiple visualizations choices to recommend the most effective ones, as per the statistics. However, state-of-the art models rely on very large number of expensive statistics and therefore using such models on large datasets become infeasible due to prohibitively large computational time, limiting the effectiveness of such techniques to most real world complex and large datasets. In this paper, we propose a novel reinforcement-learning (RL) based framework that takes a given vis-rec model and a time-budget from the user and identifies the best set of input statistics that would be most effective while generating the visual insights within a given time budget, using the given model. Using two state-of-the-art vis-rec models applied on three large real-world datasets, we show the effectiveness of our technique in significantly reducing time-to visualize with very small amount of introduced error. Our approach is about 10X times faster compared to the baseline approaches that introduce similar amounts of error.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1007/978-981-97-2262-4_8

2411.18657

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Personalization of Large Language Models: A Survey

Zhang, Zhehao, Rossi, Ryan A., Kveton, Branislav, Shao, Yijia, Yang, Diyi, Zamani, Hamed, Dernoncourt, Franck, Barrow, Joe, Yu, Tong, Kim, Sungchul, Zhang, Ruiyi, Gu, Jiuxiang, Derr, Tyler, Chen, Hongjie, Wu, Junda, Chen, Xiang, Wang, Zichao, Mitra, Subrata, Lipka, Nedim, Ahmed, Nesreen, Wang, Yu

arXiv.org Artificial IntelligenceOct-29-2024

Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. In this work, we bridge the gap between these two separate main directions for the first time by introducing a taxonomy for personalized LLM usage and summarizing the key differences and challenges. We provide a formalization of the foundations of personalized LLMs that consolidates and expands notions of personalization of LLMs, defining and discussing novel facets of personalization, usage, and desiderata of personalized LLMs. We then unify the literature across these diverse fields and usage scenarios by proposing systematic taxonomies for the granularity of personalization, personalization techniques, datasets, evaluation methods, and applications of personalized LLMs. Finally, we highlight challenges and important open problems that remain to be addressed. By unifying and surveying recent research using the proposed taxonomies, we aim to provide a clear guide to the existing literature and different facets of personalization in LLMs, empowering both researchers and practitioners.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2411.00027

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.27)
North America > United States > Massachusetts (0.27)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.45)
Research Report > Promising Solution (0.45)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Visual Prompting in Multimodal Large Language Models: A Survey

Wu, Junda, Zhang, Zhehao, Xia, Yu, Li, Xintong, Xia, Zhaoyang, Chang, Aaron, Yu, Tong, Kim, Sungchul, Rossi, Ryan A., Zhang, Ruiyi, Mitra, Subrata, Metaxas, Dimitris N., Yao, Lina, Shang, Jingbo, McAuley, Julian

arXiv.org Artificial IntelligenceSep-5-2024

Multimodal large language models (MLLMs) equip pre-trained large-language models (LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied, visual prompting has emerged for more fine-grained and free-form visual instructions. This paper presents the first comprehensive survey on visual prompting methods in MLLMs, focusing on visual prompting, prompt generation, compositional reasoning, and prompt learning. We categorize existing visual prompts and discuss generative methods for automatic prompt annotations on the images. We also examine visual prompting methods that enable better alignment between visual encoders and backbone LLMs, concerning MLLM's visual grounding, object referring, and compositional reasoning abilities. In addition, we provide a summary of model training and in-context learning methods to improve MLLM's perception and understanding of visual prompts. This paper examines visual prompting methods developed in MLLMs and provides a vision of the future of these methods.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2409.1531

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Analysis of Distributed Deep Learning in the Cloud

Sharma, Aakash, Bhasi, Vivek M., Singh, Sonali, Jain, Rishabh, Gunasekaran, Jashwant Raj, Mitra, Subrata, Kandemir, Mahmut Taylan, Kesidis, George, Das, Chita R.

arXiv.org Artificial IntelligenceDec-22-2022

We aim to resolve this problem by introducing a comprehensive distributed deep learning (DDL) profiler, which can determine the various execution "stalls" that DDL suffers from while running on a public cloud. We have implemented the profiler by extending prior work to additionally estimate two types of communication stalls - interconnect and network stalls. We train popular DNN models using the profiler to characterize various AWS GPU instances and list their advantages and shortcomings for users to make an informed decision. We observe that the more expensive GPU instances may not be the most performant for all DNN models and AWS may sub-optimally allocate hardware interconnect resources. Specifically, the intra-machine interconnect can introduce communication overheads up to 90% of DNN training time and network-connected instances can suffer from up to 5x slowdown compared to training on a single instance. Further, we model the impact of DNN macroscopic features such as the number of layers and the number of gradients on communication stalls. Finally, we propose a measurement-based recommendation model for users to lower their public cloud monetary costs for DDL, given a time budget.

cloud computing, machine learning, stall, (15 more...)

arXiv.org Artificial Intelligence

2208.14344

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.93)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reinforced Approximate Exploratory Data Analysis

Garg, Shaddy, Mitra, Subrata, Yu, Tong, Gadhia, Yash, Kashettiwar, Arjun

arXiv.org Artificial IntelligenceDec-12-2022

Exploratory data analytics (EDA) is a sequential decision making process where analysts choose subsequent queries that might lead to some interesting insights based on the previous queries and corresponding results. Data processing systems often execute the queries on samples to produce results with low latency. Different downsampling strategy preserves different statistics of the data and have different magnitude of latency reductions. The optimum choice of sampling strategy often depends on the particular context of the analysis flow and the hidden intent of the analyst. In this paper, we are the first to consider the impact of sampling in interactive data exploration settings as they introduce approximation errors. We propose a Deep Reinforcement Learning (DRL) based framework which can optimize the sample selection in order to keep the analysis and insight generation flow intact. Evaluations with 3 real datasets show that our technique can preserve the original insight generation flow while improving the interaction latency, compared to baseline methods.

data mining, machine learning, reinforcement learning, (22 more...)

arXiv.org Artificial Intelligence

2212.06225

Genre: Research Report (0.64)

Industry: Information Technology (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Electra: Conditional Generative Model based Predicate-Aware Query Approximation

Sheoran, Nikhil, Mitra, Subrata, Porwal, Vibhor, Ghetia, Siddharth, Varshney, Jatin, Mai, Tung, Rao, Anup, Maddukuri, Vikas

arXiv.org Artificial IntelligenceJan-28-2022

The goal of Approximate Query Processing (AQP) is to provide very fast but "accurate enough" results for costly aggregate queries thereby improving user experience in interactive exploration of large datasets. Recently proposed Machine-Learning based AQP techniques can provide very low latency as query execution only involves model inference as compared to traditional query processing on database clusters. However, with increase in the number of filtering predicates(WHERE clauses), the approximation error significantly increases for these methods. Analysts often use queries with a large number of predicates for insights discovery. Thus, maintaining low approximation error is important to prevent analysts from drawing misleading conclusions. In this paper, we propose ELECTRA, a predicate-aware AQP system that can answer analytics-style queries with a large number of predicates with much smaller approximation errors. ELECTRA uses a conditional generative model that learns the conditional distribution of the data and at runtime generates a small (~1000 rows) but representative sample, on which the query is executed to compute the approximate result. Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error for large number of predicates compared to baselines.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1609/aaai.v36i8.20800

2201.1242

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.62)

Add feedback

DeepPlace: Learning to Place Applications in Multi-Tenant Clusters

Mitra, Subrata, Mondal, Shanka Subhra, Sheoran, Nikhil, Dhake, Neeraj, Nehra, Ravinder, Simha, Ramanuja

arXiv.org Machine LearningJul-30-2019

Large multi-tenant production clusters often have to handle a variety of jobs and applications with a variety of complex resource usage characteristics. It is non-trivial and non-optimal to manually create placement rules for scheduling that would decide which applications should co-locate. In this paper, we present DeepPlace, a scheduler that learns to exploits various temporal resource usage patterns of applications using Deep Reinforcement Learning (Deep RL) to reduce resource competition across jobs running in the same machine while at the same time optimizing for overall cluster utilization.

application, artificial intelligence, neural network, (17 more...)

arXiv.org Machine Learning

doi: 10.1145/3343737.3343741

1907.12916

Country:

Asia > China (0.16)
North America > United States (0.14)
Asia > India (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback