Kumar, Abhishek
Wanda++: Pruning Large Language Models via Regional Gradients
Yang, Yifan, Zhen, Kai, Ganesh, Bhavana, Galstyan, Aram, Huybrechts, Goeric, Müller, Markus, Kübler, Jonas M., Swaminathan, Rupak Vignesh, Mouchtaris, Athanasios, Bodapati, Sravan Babu, Susanj, Nathan, Zhang, Zheng, FitzGerald, Jack, Kumar, Abhishek
Large language model (LLM) pruning seeks to remove unimportant weights to speed up inference with minimal performance impact. However, existing methods often suffer performance loss without full-model, sparsity-aware fine-tuning. This paper presents Wanda++, a novel pruning framework that outperforms state-of-the-art methods by utilizing decoder-block-level regional gradients. Specifically, Wanda++ is the first to improve the pruning score with regional gradients, and it proposes an efficient regional optimization method to minimize pruning-induced discrepancies between dense and sparse decoder outputs. Notably, Wanda++ improves perplexity by up to 32% over Wanda on language modeling and generalizes effectively to downstream tasks. Further experiments indicate that the proposed method is orthogonal to sparsity-aware fine-tuning: Wanda++ can be combined with LoRA fine-tuning to achieve a perplexity improvement similar to that of the Wanda method. The proposed method is lightweight, pruning a 7B LLaMA model in under 10 minutes on a single NVIDIA H100 GPU.
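To make the scoring idea concrete, here is a minimal sketch of a Wanda-style pruning score augmented with a regional-gradient term. The additive combination and the balancing factor `alpha` are assumptions for illustration; the paper's exact formula may differ.

```python
import torch

def wanda_pp_score(W, X, G, alpha=100.0):
    """Hedged sketch of a Wanda-style score augmented with regional gradients.
    W: (out, in) weight matrix; X: (tokens, in) calibration activations;
    G: (out, in) gradient of a decoder-block-level (regional) reconstruction
    loss w.r.t. W. `alpha` is an assumed balancing factor."""
    act_norm = X.norm(p=2, dim=0)             # per-input-channel L2 norm, shape (in,)
    wanda = W.abs() * act_norm                # original Wanda score: |W| * ||X||_2
    return wanda + alpha * G.abs() * W.abs()  # add regional-gradient sensitivity

def prune_rows(W, scores, sparsity=0.5):
    """Zero out the lowest-scoring weights per output row (unstructured
    pruning with a per-row comparison group, as in Wanda)."""
    k = int(W.shape[1] * sparsity)
    idx = scores.topk(k, dim=1, largest=False).indices
    mask = torch.ones_like(W, dtype=torch.bool)
    mask.scatter_(1, idx, False)
    return W * mask
```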
FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance
Kang, Mintong, Kumar, Vinayshekhar Bannihatti, Roy, Shamik, Kumar, Abhishek, Khosla, Sopan, Narayanaswamy, Balakrishnan Murali, Gangadharaiah, Rashmi
Text-to-image diffusion models often exhibit biases toward specific demographic groups, such as generating more males than females when prompted to generate images of engineers, raising ethical concerns and limiting their adoption. In this paper, we tackle the challenge of mitigating generation bias towards any target attribute value (e.g., "male" for "gender") in diffusion models while preserving generation quality. We propose FairGen, an adaptive latent guidance mechanism that controls the generation distribution during inference. In FairGen, a latent guidance module dynamically adjusts the diffusion process to enforce specific attributes, while a memory module tracks the generation statistics and steers latent guidance to align with the targeted fair distribution of the attribute values. Further, given the limitations of existing datasets in comprehensively assessing bias in diffusion models, we introduce HBE, a holistic bias evaluation benchmark covering diverse domains and incorporating complex prompts across various applications. Extensive evaluations on the HBE and Stable Bias datasets demonstrate that FairGen outperforms existing bias mitigation approaches, achieving substantial bias reduction (e.g., 68.5% gender bias reduction on Stable Diffusion 2). Ablation studies highlight FairGen's ability to flexibly and precisely control the generation distribution at any user-specified granularity, ensuring adaptive and targeted bias mitigation.
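As a rough illustration of how a memory module can steer generations toward a target distribution, consider the sketch below. The class name, interface, and deficit-based selection rule are illustrative assumptions, not FairGen's actual API; the latent-guidance mechanics inside the diffusion sampler are not shown.

```python
from collections import Counter

class FairnessMemory:
    """Hedged sketch of a FairGen-style memory module: tracks the empirical
    distribution of a sensitive attribute over past generations and proposes
    which value to enforce next so the stream converges to a target
    distribution."""
    def __init__(self, target):                # e.g. {"male": 0.5, "female": 0.5}
        self.target = target
        self.counts = Counter()

    def next_attribute(self):
        total = sum(self.counts.values())
        if total == 0:
            return max(self.target, key=self.target.get)
        # pick the value with the largest deficit relative to its target share
        deficit = {v: self.target[v] - self.counts[v] / total for v in self.target}
        return max(deficit, key=deficit.get)

    def record(self, value):                   # call with the attribute detected
        self.counts[value] += 1                # in the finished generation

# Illustrative loop: enforce the proposed attribute via latent guidance,
# then feed the observed outcome back into the memory.
mem = FairnessMemory({"male": 0.5, "female": 0.5})
attr = mem.next_attribute()    # pass to the (not shown) guidance module
mem.record(attr)
```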
Through the Prism of Culture: Evaluating LLMs' Understanding of Indian Subcultures and Traditions
Chhikara, Garima, Kumar, Abhishek, Chakraborty, Abhijnan
Large Language Models (LLMs) have shown remarkable advancements but also raise concerns about cultural bias, often reflecting dominant narratives at the expense of under-represented subcultures. In this study, we evaluate the capacity of LLMs to recognize and accurately respond to the Little Traditions within Indian society, encompassing localized cultural practices and subcultures such as caste, kinship, marriage, and religion. Through a series of case studies, we assess whether LLMs can balance the interplay between dominant Great Traditions and localized Little Traditions. We explore various prompting strategies and further investigate whether using prompts in regional languages enhances the models' cultural sensitivity and response quality. Our findings reveal that while LLMs demonstrate an ability to articulate cultural nuances, they often struggle to apply this understanding in practical, context-specific scenarios. To the best of our knowledge, this is the first study to analyze LLMs' engagement with Indian subcultures, offering critical insights into the challenges of embedding cultural diversity in AI systems.
Large Language Model Based Multi-Agent System Augmented Complex Event Processing Pipeline for Internet of Multimedia Things
Zeeshan, Talha, Kumar, Abhishek, Pirttikangas, Susanna, Tarkoma, Sasu
The rapid advancement of artificial intelligence (AI) technologies has revolutionized the way we process and analyze data, particularly in complex event processing (CEP) tasks such as video query analysis. Traditional CEP systems often struggle with the dynamic demands of modern applications, such as real-time or near-real-time video analytics that must integrate diverse data sources (for example, thousands of surveillance cameras deployed across a city), limiting their performance and applicability. Modern CEP pipelines are domain-specific and often fail to adapt to dynamic changes in the environment in a timely manner. State-of-the-art applications (such as live video streaming on TikTok, YouTube, etc.) generate an increasing volume of diverse, complex data that must be handled appropriately depending on the use case. Large Language Models (LLMs), also known as foundation models, inherently possess the ability to handle and analyze dynamic forms of data, and therefore provide the foundation upon which a dynamic CEP pipeline supporting a diverse range of domains can be built.
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Hsieh, Cheng-Yu, Chuang, Yung-Sung, Li, Chun-Liang, Wang, Zifeng, Le, Long T., Kumar, Abhishek, Glass, James, Ratner, Alexander, Lee, Chen-Yu, Krishna, Ranjay, Pfister, Tomas
Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias in which the tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing approaches.
[Figure 1: (a) Lost-in-the-middle refers to models' U-shaped RAG performance as the relevant context's (e.g., a gold document containing the answer to a query) position varies within the input; (b) models exhibit U-shaped attention weights favoring leading and ending contexts, regardless of their actual contents; (c) models do attend to relevant contexts even when placed in the middle, but are eventually distracted by leading/ending contexts; (d) the proposed calibration mechanism, found-in-the-middle, disentangles the effect of the U-shaped attention bias and allows models to attend to relevant contexts regardless of their positions.]
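The calibration idea lends itself to a small numerical sketch. The estimator below illustrates the general recipe (divide out a position-only bias estimate and renormalize), not the paper's exact procedure; `raw_attn` and `positional_bias` are assumed inputs.

```python
import numpy as np

def calibrate_attention(raw_attn, positional_bias, eps=1e-8):
    """Hedged sketch of found-in-the-middle-style calibration: divide out an
    estimate of position-only attention bias so that the remaining attention
    reflects content relevance. raw_attn[i] is the model's aggregate attention
    to the document at position i; positional_bias[i] is the average attention
    the model gives *any* (e.g., randomly shuffled) document at position i."""
    calibrated = np.asarray(raw_attn) / (np.asarray(positional_bias) + eps)
    return calibrated / calibrated.sum()       # renormalize to a distribution

# Example: U-shaped raw attention hides that the middle document is relevant.
raw = [0.40, 0.35, 0.25]        # attention with the relevant doc in the middle
bias = [0.45, 0.20, 0.35]       # position-only bias measured on random docs
print(calibrate_attention(raw, bias))   # the middle doc now ranks highest
```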
Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models
Kumar, Abhishek, Morabito, Robert, Umbet, Sanzhar, Kabbara, Jad, Emami, Ali
As the use of Large Language Models (LLMs) becomes more widespread, understanding their self-evaluation of confidence in generated responses becomes increasingly important, as it is integral to the reliability of these models' outputs. We introduce the concept of Confidence-Probability Alignment, which connects an LLM's internal confidence, quantified by token probabilities, with the confidence conveyed in the model's response when it is explicitly asked about its certainty. Using various datasets and prompting techniques that encourage model introspection, we probe the alignment between models' internal and expressed confidence. These techniques encompass using structured evaluation scales to rate confidence, including answer options when prompting, and eliciting the model's confidence level for outputs it does not recognize as its own. Notably, among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment, with an average Spearman's $\hat{\rho}$ of 0.42 across a wide range of tasks. Our work contributes to the ongoing efforts to facilitate risk assessment in the application of LLMs and to further our understanding of model trustworthiness.
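The alignment measurement itself is straightforward to reproduce in spirit: pair internal, token-probability-based confidence with verbalized confidence per question, then compute a rank correlation. The values below are invented for illustration, not taken from the paper.

```python
from scipy.stats import spearmanr

# Hedged sketch: for each question, pair the model's internal confidence
# (probability assigned to its answer tokens) with the confidence it
# verbalizes when asked to rate its own certainty, then compute
# Spearman's rank correlation over the dataset.
internal_conf  = [0.92, 0.40, 0.75, 0.55, 0.98]   # token-probability-based
expressed_conf = [9, 4, 8, 6, 10]                 # self-rated on a 1-10 scale
rho, pval = spearmanr(internal_conf, expressed_conf)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```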
Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models
Kumar, Abhishek, Yunusov, Sarfaroz, Emami, Ali
Research on Large Language Models (LLMs) has often neglected subtle biases that, although less apparent, can significantly influence the models' outputs toward particular social narratives. This study addresses two such biases within LLMs: representative bias, which denotes a tendency of LLMs to generate outputs that mirror the experiences of certain identity groups, and affinity bias, reflecting the models' evaluative preferences for specific narratives or viewpoints. We introduce two novel metrics to measure these biases: the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), and present the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks such as short story writing and poetry composition, designed with customized rubrics to detect these subtle biases. Our analysis uncovers marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and male. Furthermore, our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to 'bias fingerprints'. This trend is also seen in human evaluators, highlighting a complex interplay between human and machine bias perceptions.
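While the paper's exact metric definitions are not reproduced here, a deviation-from-uniform formulation conveys the flavor of a representative-bias score. The function below is an illustrative stand-in, not the published RBS.

```python
from collections import Counter

def representative_bias_score(identity_labels, groups):
    """Hedged sketch in the spirit of an RBS-style metric: how far the
    identities implied by unprompted generations deviate from a uniform mix
    over `groups`. 0 means perfectly balanced; 1 means a single group
    dominates entirely. Illustrative, not the paper's exact definition."""
    counts = Counter(identity_labels)
    n, k = len(identity_labels), len(groups)
    # total variation distance from the uniform distribution, rescaled to [0, 1]
    tv = 0.5 * sum(abs(counts[g] / n - 1 / k) for g in groups)
    return tv / (1 - 1 / k)

labels = ["white", "white", "white", "asian"]   # identities inferred from stories
print(representative_bias_score(labels, ["white", "black", "asian", "hispanic"]))
```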
RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control
Rout, Litu, Chen, Yujia, Ruiz, Nataniel, Kumar, Abhishek, Caramanis, Constantine, Shakkottai, Sanjay, Chu, Wen-Sheng
We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from dependence on external adapters or ControlNets.
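For readers who want the control-theoretic framing spelled out, here is a minimal sketch of a stochastic optimal control problem of the kind the abstract describes; all symbols ($f$, $u$, $\sigma$, $\Phi$, $\psi$) are assumed notation for illustration rather than the paper's.

```latex
% Control cost plus a terminal cost that compares style descriptors:
\min_{u}\; \mathbb{E}\!\left[\int_{0}^{T} \tfrac{1}{2}\,\|u(X_t,t)\|^{2}\,\mathrm{d}t
  + \psi\bigl(\Phi(X_T),\, \Phi(x^{\mathrm{ref}})\bigr)\right]
\quad \text{s.t.} \quad
\mathrm{d}X_t = \bigl(f(X_t,t) + u(X_t,t)\bigr)\,\mathrm{d}t + \sigma(t)\,\mathrm{d}W_t
```

Here $f$ plays the role of the reverse-diffusion drift, $u$ is the control, $\Phi$ a style descriptor, and $\psi$ a terminal cost penalizing mismatch with the reference image $x^{\mathrm{ref}}$; solving for $u$ yields the modified drift the abstract refers to.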
Score-based Causal Representation Learning: Linear and General Transformations
Varıcı, Burak, Acartürk, Emre, Shanmugam, Karthikeyan, Kumar, Abhishek, Tajer, Ali
This paper addresses intervention-based causal representation learning (CRL) under a general nonparametric latent causal model and an unknown transformation that maps the latent variables to the observed variables. Linear and general transformations are investigated. The paper addresses both the identifiability and achievability aspects. Identifiability refers to determining algorithm-agnostic conditions that ensure recovering the true latent causal variables and the latent causal graph underlying them. Achievability refers to the algorithmic aspects and addresses designing algorithms that achieve identifiability guarantees. By drawing novel connections between score functions (i.e., the gradients of the logarithm of density functions) and CRL, this paper designs a score-based class of algorithms that ensures both identifiability and achievability. First, the paper focuses on linear transformations and shows that one stochastic hard intervention per node suffices to guarantee identifiability. It also provides partial identifiability guarantees for soft interventions, including identifiability up to ancestors for general causal models and perfect latent graph recovery for sufficiently non-linear causal models. Second, it focuses on general transformations and shows that two stochastic hard interventions per node suffice for identifiability. Notably, one does not need to know which pair of interventional environments have the same node intervened.
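For readers unfamiliar with the terminology, the score function referenced throughout is, as the abstract notes, the gradient of the log density; the quantity the algorithms compare across environments is a score difference. A brief statement in LaTeX (notation assumed):

```latex
% Score function of a density p, and its change between the observational
% environment and an interventional environment with density p^{m}:
s(x) \;=\; \nabla_{x} \log p(x),
\qquad
\Delta s^{m}(x) \;=\; \nabla_{x} \log p^{m}(x) \;-\; \nabla_{x} \log p(x)
```

Loosely, the latent-space score difference induced by an intervention on a single node is sparse, concentrating on that node (and, for soft interventions, its parents); the paper's algorithms exploit this sparsity, pulled back through the unknown transformation, to recover both the latent variables and the graph.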
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Singh, Avi, Co-Reyes, John D., Agarwal, Rishabh, Anand, Ankesh, Patil, Piyush, Garcia, Xavier, Liu, Peter J., Harrison, James, Lee, Jaehoon, Xu, Kelvin, Parisi, Aaron, Kumar, Abhishek, Alemi, Alex, Rizkowsky, Alex, Nova, Azade, Adlam, Ben, Bohnet, Bernd, Elsayed, Gamaleldin, Sedghi, Hanie, Mordatch, Igor, Simpson, Isabelle, Gur, Izzeddin, Snoek, Jasper, Pennington, Jeffrey, Hron, Jiri, Kenealy, Kathleen, Swersky, Kevin, Mahajan, Kshiteej, Culp, Laura, Xiao, Lechao, Bileschi, Maxwell L., Constant, Noah, Novak, Roman, Liu, Rosanne, Warkentin, Tris, Qian, Yundi, Bansal, Yamini, Dyer, Ethan, Neyshabur, Behnam, Sohl-Dickstein, Jascha, Fiedel, Noah
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
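The three-step loop maps naturally onto pseudocode. The sketch below assumes hypothetical `model.sample` and `model.finetune` interfaces and a binary `verify` oracle; it mirrors the abstract's description rather than the paper's exact training recipe.

```python
def rest_em(model, prompts, verify, n_samples=32, iterations=3):
    """Hedged sketch of the ReST^EM loop described in the abstract:
    (1) sample solutions from the model, (2) keep those that pass binary
    feedback (e.g., an answer checker or unit tests), (3) fine-tune on the
    kept samples, and repeat. `model.sample` / `model.finetune` are assumed
    interfaces, not a real API."""
    for _ in range(iterations):
        dataset = []
        for prompt in prompts:
            for solution in model.sample(prompt, n=n_samples):  # E-step: generate
                if verify(prompt, solution):                    # filter on binary reward
                    dataset.append((prompt, solution))
        model = model.finetune(dataset)                         # M-step: fit filtered data
    return model
```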