AITopics | Budhathoki, Kailash

Collaborating Authors

Budhathoki, Kailash

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Proximal Operator for Inducing 2:4-Sparsity

Kübler, Jonas M, Wang, Yu-Xiang, Sabach, Shoham, Ansari, Navid, Kleindessner, Matthäus, Budhathoki, Kailash, Cevher, Volkan, Karypis, George

arXiv.org Artificial IntelligenceJan-29-2025

Recent hardware advancements in AI Accelerators and GPUs allow to efficiently compute sparse matrix multiplications, especially when 2 out of 4 consecutive weights are set to zero. However, this so-called 2:4 sparsity usually comes at a decreased accuracy of the model. We derive a regularizer that exploits the local correlation of features to find better sparsity masks in trained models. We minimize the regularizer jointly with a local squared loss by deriving the proximal operator for which we show that it has an efficient solution in the 2:4-sparse case. After optimizing the mask, we use maskedgradient updates to further minimize the local squared loss. We illustrate our method on toy problems and apply it to pruning entire large language models up to 70B parameters. On models up to 13B we improve over previous state of the art algorithms, whilst on 70B models we match their performance.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2501.18015

Country:

Europe (0.93)
Asia > Middle East > Israel (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)

Add feedback

LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models

Hoffmann, David, Budhathoki, Kailash, Kleindessner, Matthaeus

arXiv.org Artificial IntelligenceNov-29-2024

The evolving capabilities of large language models are accompanied by growing sizes and deployment costs, necessitating effective inference optimisation techniques. We propose a novel pruning method utilising centrality measures from graph theory, reducing both the computational requirements and the memory footprint of these models. Specifically, we devise a method for creating a weighted directed acyclical graph representation of multilayer perceptrons to which we apply a modified version of the weighted PageRank centrality measure to compute node importance scores. In combination with uniform pruning this leads to structured sparsity. We call this pruning method MLPRank. Furthermore we introduce an extension to decoder-only transformer models and call it LLMRank. For both variants we demonstrate a strong performance. With MLPRank on average leading to 6.09 % higher accuracy retention than three popular baselines and 13.42 % with LLMRank compared to two popular baselines. Code is available at https://github.com/amazon-science/llm-rank-pruning.

large language model, machine learning, pruning, (18 more...)

arXiv.org Artificial Intelligence

2410.13299

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)

Add feedback

Inference Optimization of Foundation Models on AI Accelerators

Park, Youngsuk, Budhathoki, Kailash, Chen, Liangfu, Kübler, Jonas, Huang, Jiaji, Kleindessner, Matthäus, Huan, Jun, Cevher, Volkan, Wang, Yida, Karypis, George

arXiv.org Artificial IntelligenceJul-12-2024

Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and research community have witnessed a large number of new applications, based on those foundation models. Such applications include question and answer, customer services, image and video generation, and code completions, among others. However, as the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. As a result, the demand for cost-effective and fast inference using AI accelerators is ever more higher. To this end, our tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators. Beginning with an overview of basic Transformer architectures and deep learning system frameworks, we deep dive into system optimization techniques for fast and memory-efficient attention computations and discuss how they can be implemented efficiently on AI accelerators. Next, we describe architectural elements that are key for fast transformer inference. Finally, we examine various model compression and fast decoding strategies in the same context.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2407.09111

Country:

Europe > Spain (0.16)
North America > United States (0.14)
Europe > Germany (0.14)
Europe > Switzerland (0.14)

Genre:

Overview (0.68)
Research Report (0.50)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evaluating the Fairness of Discriminative Foundation Models in Computer Vision

Ali, Junaid, Kleindessner, Matthaeus, Wenzel, Florian, Budhathoki, Kailash, Cevher, Volkan, Russell, Chris

arXiv.org Artificial IntelligenceOct-18-2023

We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Pretraining (CLIP), that are used for labeling tasks. We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning. We categorize desired behaviors based around three axes: (i) if the task concerns humans; (ii) how subjective the task is (i.e., how likely it is that people from a diverse range of backgrounds would agree on a labeling); and (iii) the intended purpose of the task and if fairness is better served by impartiality (i.e., making decisions independent of the protected attributes) or representation (i.e., making decisions to maximize diversity). Finally, we provide quantitative fairness evaluations for both binary-valued and multi-valued protected attributes over ten diverse datasets. We find that fair PCA, a post-processing method for fair representations, works very well for debiasing in most of the aforementioned tasks while incurring only minor loss of performance. However, different debiasing approaches vary in their effectiveness depending on the task. Hence, one should choose the debiasing approach depending on the specific use case.

discriminative foundation model, large language model, natural language, (3 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3600211.3604720

2310.11867

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models

Blöbaum, Patrick, Götz, Peter, Budhathoki, Kailash, Mastakouri, Atalanti A., Janzing, Dominik

arXiv.org Machine LearningJun-14-2022

We introduce DoWhy-GCM, an extension of the DoWhy Python library, that leverages graphical causal models. Unlike existing causality libraries, which mainly focus on effect estimation questions, with DoWhy-GCM, users can ask a wide range of additional causal questions, such as identifying the root causes of outliers and distributional changes, causal structure learning, attributing causal influences, and diagnosis of causal structures. To this end, DoWhy-GCM users first model cause-effect relations between variables in a system under study through a graphical causal model, fit the causal mechanisms of variables next, and then ask the causal question. All these steps take only a few lines of code in DoWhy-GCM. The library is available at https://github.com/py-why/dowhy.

artificial intelligence, causal inference, graphical causal model, (2 more...)

arXiv.org Machine Learning

2206.06821

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.80)

Add feedback

Why did the distribution change?

Budhathoki, Kailash, Janzing, Dominik, Bloebaum, Patrick, Ng, Hoiyi

arXiv.org Artificial IntelligenceFeb-26-2021

We describe a formal approach based on graphical causal models to identify the "root causes" of the change in the probability distribution of variables. After factorizing the joint distribution into conditional distributions of each variable, given its parents (the "causal mechanisms"), we attribute the change to changes of these causal mechanisms. This attribution analysis accounts for the fact that mechanisms often change independently and sometimes only some of them change. Through simulations, we study the performance of our distribution change attribution method. We then present a real-world case study identifying the drivers of the difference in the income distribution between men and women.

artificial intelligence, causal mechanism, health & medicine, (19 more...)

arXiv.org Artificial Intelligence

2102.13384

Country: North America > United States (0.93)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

Discovering Reliable Causal Rules

Budhathoki, Kailash, Boley, Mario, Vreeken, Jilles

arXiv.org Artificial IntelligenceSep-8-2020

We study the problem of deriving policies, or rules, that when enacted on a complex system, cause a desired outcome. Absent the ability to perform controlled experiments, such rules have to be inferred from past observations of the system's behaviour. This is a challenging problem for two reasons: First, observational effects are often unrepresentative of the underlying causal effect because they are skewed by the presence of confounding factors. Second, naive empirical estimations of a rule's effect have a high variance, and, hence, their maximisation can lead to random results. To address these issues, first we measure the causal effect of a rule from observational data---adjusting for the effect of potential confounders. Importantly, we provide a graphical criteria under which causal rule discovery is possible. Moreover, to discover reliable causal rules from a sample, we propose a conservative and consistent estimator of the causal effect, and derive an efficient and exact algorithm that maximises the estimator. On synthetic data, the proposed estimator converges faster to the ground truth than the naive estimator and recovers relevant causal rules even at small sample sizes. Extensive experiments on a variety of real-world datasets show that the proposed algorithm is efficient and discovers meaningful rules.

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2009.02728

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.86)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Association Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.48)

Add feedback

Causal structure based root cause analysis of outliers

Janzing, Dominik, Budhathoki, Kailash, Minorics, Lenon, Blöbaum, Patrick

arXiv.org Machine LearningDec-5-2019

We describe a formal approach to identify 'root causes' of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of 'conditional outlier score' which measures whether a value of some variable is unexpected *given the value of its parents* in the DAG, if one were to assume that the causal structure and the corresponding conditional distributions are also valid for the anomaly. Finally, we quantify to what extent the high outlier score of some target variable can be attributed to outliers of its ancestors. This quantification is defined via Shapley values from cooperative game theory.

artificial intelligence, game theory, outlier score, (19 more...)

arXiv.org Machine Learning

1912.02724

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Game Theory (0.88)
Information Technology > Data Science > Data Mining (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback