Mukherjee, Koyel
From Selection to Generation: A Survey of LLM-based Active Learning
Xia, Yu, Mukherjee, Subhojyoti, Xie, Zhouhang, Wu, Junda, Li, Xintong, Aponte, Ryan, Lyu, Hanjia, Barrow, Joe, Chen, Hongjie, Dernoncourt, Franck, Kveton, Branislav, Yu, Tong, Zhang, Ruiyi, Gu, Jiuxiang, Ahmed, Nesreen K., Wang, Yu, Chen, Xiang, Deilamsalehy, Hanieh, Kim, Sungchul, Hu, Zhengmian, Zhao, Yue, Lipka, Nedim, Yoon, Seunghyun, Huang, Ting-Hao Kenneth, Wang, Zichao, Mathur, Puneet, Pal, Soumyabrata, Mukherjee, Koyel, Zhang, Zhehao, Park, Namyong, Nguyen, Thien Huu, Luo, Jiebo, Rossi, Ryan A., McAuley, Julian
Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications.
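As a rough illustration of the selection-and-annotation loop described in this abstract, the following minimal sketch shows one iteration in which an LLM both scores unlabeled candidates for informativeness and annotates the selected ones. The function names llm_score_informativeness and llm_annotate are hypothetical placeholders, not APIs from the survey, and the scoring criterion is an assumption for exposition.

# Minimal sketch of one LLM-based active learning iteration (illustrative only).
def active_learning_step(unlabeled_pool, labeled_set, budget,
                         llm_score_informativeness, llm_annotate):
    """Select the `budget` most informative unlabeled examples and label them with an LLM."""
    # Score every candidate (e.g., by predicted uncertainty or expected utility).
    scored = [(llm_score_informativeness(x, labeled_set), x) for x in unlabeled_pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Annotate the top-scoring candidates with the LLM instead of a human labeler.
    selected = [x for _, x in scored[:budget]]
    new_labels = [(x, llm_annotate(x)) for x in selected]

    # Grow the labeled set and shrink the pool for the next iteration.
    labeled_set = labeled_set + new_labels
    remaining = [x for x in unlabeled_pool if x not in selected]
    return labeled_set, remaining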
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
Jain, Akriti, Sharma, Saransh, Mukherjee, Koyel, Pal, Soumyabrata
Auto-regressive Large Language Models (LLMs) demonstrate remarkable performance across different domains such as vision and language processing. However, due to sequential processing through a stack of transformer layers, auto-regressive decoding faces significant computation/latency challenges, particularly in resource-constrained environments like mobile and edge devices. Existing approaches in the literature that aim to improve latency by skipping layers come in two distinct flavors: 1) early exit, and 2) input-agnostic heuristics where tokens exit at pre-determined layers irrespective of the input sequence. Both strategies have limitations: the former cannot handle the KV caching necessary for speed-ups in modern frameworks, and the latter does not capture the variation in layer importance across tasks or, more generally, across input sequences. To address both limitations, we propose FiRST, an algorithm that reduces inference latency by using layer-specific routers to select a subset of transformer layers adaptively for each input sequence: the prompt (during the prefill stage) decides which layers will be skipped during decoding. FiRST preserves compatibility with KV caching, enabling faster inference while being quality-aware. FiRST is model-agnostic and can be easily enabled on any pre-trained LLM. Our approach reveals that input adaptivity is critical: different task-specific middle layers play a crucial role in evolving hidden representations depending on the task. Extensive experiments show that FiRST significantly reduces latency while outperforming other layer selection strategies on quality metrics. It retains performance competitive with the base model (without layer skipping) and in some cases even improves upon it. FiRST is thus a promising and efficient solution for LLM deployment in low-resource environments.
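The following sketch illustrates the general idea of prompt-conditioned layer selection described above: small per-layer routers look at the prefill hidden states and fix which layers run for the rest of decoding. The router architecture, pooling, and threshold are assumptions made for this sketch, not the paper's exact design.

# Illustrative sketch of input-adaptive layer skipping in the spirit of FiRST.
import torch
import torch.nn as nn

class LayerRouter(nn.Module):
    """Per-layer router that decides, from the prompt representation, whether to keep a layer."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, 1)

    def forward(self, prompt_hidden):
        # Pool the prompt (prefill) hidden states and emit a keep probability.
        pooled = prompt_hidden.mean(dim=1)       # (batch, hidden_dim)
        return torch.sigmoid(self.gate(pooled))  # (batch, 1)

def select_layers(prompt_hidden_per_layer, routers, threshold=0.5):
    """During prefill (batch size 1 here), fix the subset of layers to run for the whole decode."""
    keep = []
    for hidden, router in zip(prompt_hidden_per_layer, routers):
        keep.append(bool(router(hidden).item() >= threshold))
    # Layers marked False are skipped for every decoded token, so their KV cache is never needed.
    return keep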
PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks
Ghosal, Soumya Suvra, Pal, Soumyabrata, Mukherjee, Koyel, Manocha, Dinesh
Large Language Models (LLMs) have recently demonstrated impressive few-shot learning capabilities through in-context learning (ICL). However, ICL performance is highly dependent on the choice of few-shot demonstrations, making the selection of optimal examples a persistent research challenge. This issue is further amplified in low-resource Indic languages, where the scarcity of ground-truth data complicates the selection process. In this work, we propose PromptRefine, a novel Alternating Minimization approach for example selection that improves ICL performance on low-resource Indic languages. PromptRefine leverages auxiliary example banks from related high-resource Indic languages and employs multi-task learning techniques to align language-specific retrievers, enabling effective cross-language retrieval. Additionally, we incorporate diversity into the selected examples to enhance generalization and reduce bias. Through comprehensive evaluations on four text generation tasks -- Cross-Lingual Question Answering, Multilingual Question Answering, Machine Translation, and Cross-Lingual Summarization -- using state-of-the-art LLMs such as LLAMA-3.1-8B, LLAMA-2-7B, Qwen-2-7B, and Qwen-2.5-7B, we demonstrate that PromptRefine significantly outperforms existing example-retrieval frameworks.
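To make the "relevance plus diversity" retrieval step concrete, here is a small sketch using an MMR-style heuristic over example-bank embeddings. It illustrates only the selection of diverse, query-relevant examples; PromptRefine's alternating minimization and retriever alignment across related languages are not shown, and the trade-off parameter lam is an assumption.

# Minimal sketch: pick in-context examples balancing relevance and mutual diversity.
import numpy as np

def select_examples(query_emb, bank_embs, k=4, lam=0.7):
    """Return indices of k examples from the (possibly cross-lingual) bank."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    sims = [cos(e, query_emb) for e in bank_embs]      # relevance to the query
    selected, candidates = [], list(range(len(bank_embs)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((cos(bank_embs[i], bank_embs[j]) for j in selected),
                             default=0.0)               # similarity to already chosen examples
            return lam * sims[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected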
Towards Optimizing the Costs of LLM Usage
Shekhar, Shivanshu, Dubey, Tanishq, Mukherjee, Koyel, Saxena, Apoorv, Tyagi, Atharv, Kotla, Nishanth
Generative AI, and LLMs in particular, is heavily used nowadays for various document processing tasks such as question answering and summarization. However, different LLMs come with different capabilities for different tasks, as well as different costs, tokenization, and latency. In fact, enterprises are already incurring huge costs operating or using LLMs for their respective use cases. In this work, we propose optimizing the usage costs of LLMs by estimating their output quality (without actually invoking the LLMs), and then solving an optimization routine for LLM selection to either keep costs under a budget or minimize them, in a quality- and latency-aware manner. We propose a model to predict the output quality of LLMs on document processing tasks like summarization, followed by an LP rounding algorithm to optimize the selection of LLMs. We study the resulting optimization problems, which trade off quality and cost, both theoretically and empirically. We further propose a sentence simplification model for reducing the number of tokens in a controlled manner. Additionally, we propose several deterministic heuristics for reducing tokens in a quality-aware manner, and study the related optimization problem of applying the heuristics to optimize the quality-cost trade-off. We perform extensive empirical validation of our methods not only on enterprise datasets but also on open-source datasets annotated by us, and show that we perform much better than the closest baselines. Our methods reduce costs by 40%-90% while improving quality by 4%-7%. We will release the annotated open-source datasets to the community for further research and exploration.
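The sketch below illustrates the general flavor of budget-constrained LLM selection via an LP relaxation followed by rounding. The predicted quality and cost matrices are assumed inputs (e.g., from a quality-prediction model), and the naive argmax rounding is an assumption for exposition; it is not the paper's exact formulation or rounding scheme.

# Hedged sketch: assign each document/task to one LLM under a total cost budget.
import numpy as np
from scipy.optimize import linprog

def select_llms(quality, cost, budget):
    """quality, cost: (n_tasks, n_models) arrays. Returns one model index per task."""
    n, m = quality.shape
    c = -quality.ravel()                       # maximize total quality -> minimize its negative
    A_ub = cost.ravel()[None, :]               # total cost of the (fractional) assignment
    b_ub = np.array([budget])
    A_eq = np.zeros((n, n * m))                # each task assigned to exactly one model
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    b_eq = np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    x = res.x.reshape(n, m)
    # Naive rounding: take the largest fraction per task; a real rounding step
    # would also repair any resulting budget violation.
    return x.argmax(axis=1)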
A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Mukherjee, Koyel, Khare, Alind, Verma, Ashish
Training neural networks on image datasets generally requires extensive experimentation to find the optimal learning rate regime. In particular, for adversarial training or for training a newly synthesized model, the best learning rate regime is not known beforehand. We propose an automated algorithm for determining the learning rate trajectory that works across datasets and models for both natural and adversarial training, without requiring any dataset- or model-specific tuning. It is a stand-alone, parameterless, adaptive approach with no computational overhead. We theoretically discuss the algorithm's convergence behavior and empirically validate it extensively. Our results show that our proposed approach \emph{consistently} achieves top-level accuracy compared to SOTA baselines in the literature in both natural and adversarial training.
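As a generic illustration of loss-driven learning rate adaptation (the specific multiplicative rule and factors below are assumptions for exposition, not the paper's update rule), a training loop might adjust the learning rate as follows.

# Illustrative sketch of adapting the learning rate from observed loss changes.
def adapt_learning_rate(lr, prev_loss, curr_loss, up=1.1, down=0.5):
    """Raise the LR while the loss keeps falling; cut it sharply when the loss rises."""
    if curr_loss < prev_loss:
        return lr * up    # progress: probe a slightly larger step
    return lr * down      # regression: back off aggressively

# Usage inside a standard training loop (hypothetical train_one_epoch helper):
# for epoch in range(num_epochs):
#     curr_loss = train_one_epoch(model, optimizer)
#     lr = adapt_learning_rate(lr, prev_loss, curr_loss)
#     for group in optimizer.param_groups:
#         group["lr"] = lr
#     prev_loss = curr_loss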
Layer Dynamics of Linearised Neural Nets
Basu, Saurav, Mukherjee, Koyel, Vasudevan, Shrihari
Despite the phenomenal success of deep learning in recent years, there remains a gap in understanding the fundamental mechanics of neural nets. Much research is focussed on handcrafting complex and larger networks, and the design decisions are often ad hoc and based on intuition. Some recent research has aimed to demystify the learning dynamics in neural nets by attempting to build a theory from first principles, for example by characterising the non-linear dynamics of specialised \textit{linear} deep neural nets (such as orthogonal networks). In this work, we expand on and derive properties of the learning dynamics respected by general multi-layer linear neural nets. Although an over-parameterisation of a single-layer linear network, linear multi-layer neural nets offer interesting insights that explain how learning dynamics proceed in small pockets of the data space. We show in particular that multiple layers in linear nets grow at approximately the same rate, and that there are distinct phases of learning with markedly different layer growth. We then apply a linearisation process to a general ReLU neural net and show how nonlinearity breaks down the growth symmetry observed in linear neural nets. Overall, our work can be viewed as an initial step towards building a first-principles theory of how layer design affects learning dynamics.
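A small numerical experiment can make the layer-growth claim tangible: train a deep linear net with plain gradient descent and track per-layer weight norms, which typically end up of comparable magnitude. The dimensions, step size, and teacher setup below are assumptions chosen for a quick sketch, not the paper's experimental setting.

# Toy sketch: gradient descent on a 3-layer linear net, watching layer norms grow together.
import numpy as np

rng = np.random.default_rng(0)
d, L, N, lr, steps = 10, 3, 200, 0.01, 1000
X = rng.normal(size=(N, d))
Y = X @ rng.normal(size=(d, d))                      # targets from a linear "teacher"
W = [rng.normal(scale=0.1, size=(d, d)) for _ in range(L)]

for _ in range(steps):
    acts = [X]
    for Wk in W:
        acts.append(acts[-1] @ Wk)                   # acts[k] is the input to layer k
    E = acts[-1] - Y                                 # residual of the end-to-end map
    grads = []
    for k in range(L):
        post = np.eye(d)
        for Wj in W[k + 1:]:
            post = post @ Wj                         # product of the layers after k
        grads.append(acts[k].T @ E @ post.T / N)     # gradient of the mean squared loss
    for k in range(L):
        W[k] -= lr * grads[k]

print([round(float(np.linalg.norm(Wk)), 3) for Wk in W])  # Frobenius norms, typically comparable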
PISCES: Participatory Incentive Strategies for Effective Community Engagement in Smart Cities
Biswas, Arpita (Xerox Research Centre India) | Chander, Deepthi (Xerox Research Centre India) | Dasgupta, Koustuv (Xerox Research Centre India) | Mukherjee, Koyel (Xerox Research Centre India) | Singh, Mridula (Xerox Research Centre India) | Mukherjee, Tridib (Xerox Research Centre India)
A key challenge in participatory sensing systems has been the design of incentive mechanisms that motivate individuals to contribute data to consuming applications. Emerging trends in urban development and smart city planning indicate the use of citizen reports to gather insights and identify areas for transformation. Consumers of these reports (e.g. city agencies) typically associate non-uniform utility (or value) with different reports based on their spatio-temporal context. For example, a report indicating traffic congestion near an airport in the early morning hours would tend to have much higher utility than a similar report from a sparse residential area. In such cases, the design of an incentive mechanism must motivate participants, via appropriate rewards (or payments), to provide higher-utility reports when compared to less valued ones. The main challenge in designing such an incentive scheme is two-fold: (i) lack of prior knowledge of participants in terms of their availability (i.e. who is in the vicinity) and reporting behaviour (i.e. what rewards they expect); and (ii) minimizing payments to the reporters while ensuring that the desired number of reports is collected. In this paper, we propose STOC-PISCES, an algorithm that guarantees a stochastic optimal solution in the generalized setting of an unknown set of participants, with non-deterministic availabilities and stochastically rational reporting behaviour. The superior performance of STOC-PISCES in experimental settings based on real-world data endorses its adoption as an incentive strategy in participatory sensing applications like smart city management.
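For intuition only, the following toy sketch shows an offline, fully-informed version of the underlying trade-off: collect enough report utility while keeping payments low, using a greedy utility-per-payment rule. This is not STOC-PISCES, which addresses the far harder setting of unknown participants with stochastic availability and reporting behaviour; the data layout and greedy rule here are assumptions for illustration.

# Offline toy sketch: reach a target total utility at low payment (not the paper's algorithm).
def greedy_report_selection(reports, utility_target):
    """reports: list of (report_id, utility, asked_payment). Returns (selected ids, total payment)."""
    ranked = sorted(reports, key=lambda r: r[1] / r[2], reverse=True)  # best utility per unit payment first
    chosen, total_utility, total_payment = [], 0.0, 0.0
    for rid, u, p in ranked:
        if total_utility >= utility_target:
            break
        chosen.append(rid)
        total_utility += u
        total_payment += p
    return chosen, total_payment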