AITopics | Mahajan, Divyat

Collaborating Authors

Mahajan, Divyat

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning to Defer for Causal Discovery with Imperfect Experts

Clivio, Oscar, Mahajan, Divyat, Taslakian, Perouz, Magliacane, Sara, Mitliagkas, Ioannis, Zantedeschi, Valentina, Drouin, Alexandre

arXiv.org Machine LearningFeb-18-2025

Integrating expert knowledge, e.g. from large language models, into causal discovery algorithms can be challenging when the knowledge is not guaranteed to be correct. Expert recommendations may contradict data-driven results, and their reliability can vary significantly depending on the domain or specific query. Existing methods based on soft constraints or inconsistencies in predicted causal relationships fail to account for these variations in expertise. To remedy this, we propose L2D-CD, a method for gauging the correctness of expert recommendations and optimally combining them with data-driven causal discovery results. By adapting learning-to-defer (L2D) algorithms for pairwise causal discovery (CD), we learn a deferral function that selects whether to rely on classical causal discovery methods using numerical data or expert recommendations based on textual meta-data. We evaluate L2D-CD on the canonical T\"ubingen pairs dataset and demonstrate its superior performance compared to both the causal discovery method and the expert used in isolation. Moreover, our approach identifies domains where the expert's performance is strong or weak. Finally, we outline a strategy for generalizing this approach to causal discovery on graphs with more than two variables, paving the way for further research in this area.

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2502.13132

Country:

Europe (0.47)
Oceania > Australia (0.46)
North America > Canada (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Add feedback

Compositional Risk Minimization

Mahajan, Divyat, Pezeshki, Mohammad, Mitliagkas, Ioannis, Ahuja, Kartik, Vincent, Pascal

arXiv.org Artificial IntelligenceOct-8-2024

In this work, we tackle a challenging and extreme form of subpopulation shift, which is termed compositional shift. Under compositional shifts, some combinations of attributes are totally absent from the training distribution but present in the test distribution. We model the data with flexible additive energy distributions, where each energy term represents an attribute, and derive a simple alternative to empirical risk minimization termed compositional risk minimization (CRM). We first train an additive energy classifier to predict the multiple attributes and then adjust this classifier to tackle compositional shifts. We provide an extensive theoretical analysis of CRM, where we show that our proposal extrapolates to special affine hulls of seen attribute combinations. Empirical evaluations on benchmark datasets confirms the improved robustness of CRM compared to other methods from the literature designed to tackle various forms of subpopulation shifts.

affine hull, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.06303

Country: North America > Canada (0.14)

Genre:

Research Report (1.00)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Zero-Shot Learning of Causal Models

Mahajan, Divyat, Gladrow, Jannes, Hilmkil, Agrin, Zhang, Cheng, Scetbon, Meyer

arXiv.org Machine LearningOct-8-2024

With the increasing acquisition of datasets over time, we now have access to precise and varied descriptions of the world, capturing all sorts of phenomena. These datasets can be seen as empirical observations of unknown causal generative processes, which can commonly be described by Structural Causal Models (SCMs). Recovering these causal generative processes from observations poses formidable challenges, and often require to learn a specific generative model for each dataset. In this work, we propose to learn a \emph{single} model capable of inferring in a zero-shot manner the causal generative processes of datasets. Rather than learning a specific SCM for each dataset, we enable the Fixed-Point Approach (FiP) proposed in~\cite{scetbon2024fip}, to infer the generative SCMs conditionally on their empirical representations. More specifically, we propose to amortize the learning of a conditional version of FiP to infer generative SCMs from observations and causal structures on synthetically generated datasets. We show that our model is capable of predicting in zero-shot the true generative SCMs, and as a by-product, of (i) generating new dataset samples, and (ii) inferring intervened ones. Our experiments demonstrate that our amortized procedure achieves performances on par with SoTA methods trained specifically for each dataset on both in and out-of-distribution problems. To the best of our knowledge, this is the first time that SCMs are inferred in a zero-shot manner from observations, paving the way for a paradigmatic shift towards the assimilation of causal knowledge across datasets.

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2410.06128

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Evaluating Interventional Reasoning Capabilities of Large Language Models

Kasetty, Tejas, Mahajan, Divyat, Dziugaite, Gintare Karolina, Drouin, Alexandre, Sridhar, Dhanya

arXiv.org Artificial IntelligenceApr-8-2024

Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consider using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how LLMs reason about interventions. Motivated by the role that interventions play in causal inference, in this paper, we conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning. These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from their ability to memorize facts or find other shortcuts. Our analysis on four LLMs highlights that while GPT- 4 models show promising accuracy at predicting the intervention effects, they remain sensitive to distracting factors in the prompts.

intervention, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2404.05545

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Law (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

Add feedback

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

Lachapelle, Sébastien, Mahajan, Divyat, Mitliagkas, Ioannis, Lacoste-Julien, Simon

arXiv.org Machine LearningNov-2-2023

We tackle the problems of latent variables identification and ``out-of-support'' image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.

artificial intelligence, decoder, machine learning, (16 more...)

arXiv.org Machine Learning

2307.02598

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation

Mahajan, Divyat, Mitliagkas, Ioannis, Neal, Brady, Syrgkanis, Vasilis

arXiv.org Artificial IntelligenceJun-12-2023

We study the problem of model selection in causal inference, specifically for the case of conditional average treatment effect (CATE) estimation under binary treatments. Unlike model selection in machine learning, there is no perfect analogue of cross-validation as we do not observe the counterfactual potential outcome for any data point. Towards this, there have been a variety of proxy metrics proposed in the literature, that depend on auxiliary nuisance models estimated from the observed data (propensity score model, outcome regression model). However, the effectiveness of these metrics has only been studied on synthetic datasets as we can access the counterfactual data for them. We conduct an extensive empirical analysis to judge the performance of these metrics introduced in the literature, and novel ones introduced in this work, where we utilize the latest advances in generative modeling to incorporate multiple realistic datasets. Our analysis suggests novel model selection strategies based on careful hyperparameter tuning of CATE estimators and causal ensembling.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2211.01939

Country: North America (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning

Lachapelle, Sébastien, Deleu, Tristan, Mahajan, Divyat, Mitliagkas, Ioannis, Bengio, Yoshua, Lacoste-Julien, Simon, Bertrand, Quentin

arXiv.org Artificial IntelligenceJun-6-2023

Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem. Finally, we explore a meta-learning version of this algorithm based on group Lasso multiclass SVM base-predictors, for which we derive a tractable dual formulation. It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations.

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Artificial Intelligence

2211.14666

Country:

North America > Canada (0.28)
North America > United States (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Interventional Causal Representation Learning

Ahuja, Kartik, Mahajan, Divyat, Wang, Yixin, Bengio, Yoshua

arXiv.org Artificial IntelligenceFeb-10-2023

Causal representation learning seeks to extract high-level latent factors from low-level sensory data. Most existing methods rely on observational data and structural assumptions (e.g., conditional independence) to identify the latent factors. However, interventional data is prevalent across applications. Can interventional data facilitate causal representation learning? We explore this question in this paper. The key observation is that interventional data often carries geometric signatures of the latent factors' support (i.e. what values each latent can possibly take). For example, when the latent factors are causally connected, interventions can break the dependency between the intervened latents' support and their ancestors'. Leveraging this fact, we prove that the latent causal factors can be identified up to permutation and scaling given data from perfect $do$ interventions. Moreover, we can achieve block affine identification, namely the estimated latent factors are only entangled with a few other latents if we have access to data from imperfect interventions. These results highlight the unique power of interventional data in causal representation learning; they can enable provable identification of latent factors without any assumptions about their distributions or dependency structure.

artificial intelligence, intervention, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2209.11924

Country: North America (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Connection between Out-of-Distribution Generalization and Privacy of ML Models

Mahajan, Divyat, Tople, Shruti, Sharma, Amit

arXiv.org Artificial IntelligenceOct-7-2021

With the goal of generalizing to out-of-distribution (OOD) data, recent domain generalization methods aim to learn "stable" feature representations whose effect on the output remains invariant across domains. Given the theoretical connection between generalization and privacy, we ask whether better OOD generalization leads to better privacy for machine learning models, where privacy is measured through robustness to membership inference (MI) attacks. In general, we find that the relationship does not hold. Through extensive evaluation on a synthetic dataset and image datasets like MNIST, Fashion-MNIST, and Chest X-rays, we show that a lower OOD generalization gap does not imply better robustness to MI attacks. Instead, privacy benefits are based on the extent to which a model captures the stable features. A model that captures stable features is more robust to MI attacks than models that exhibit better OOD generalization but do not learn stable features. Further, for the same provable differential privacy guarantees, a model that learns stable features provides higher utility as compared to others. Our results offer the first extensive empirical study connecting stable features and privacy, and also have a takeaway for the domain generalization community; MI attack can be used as a complementary metric to measure model quality.

diagnostic medicine, machine learning, teaching method, (23 more...)

arXiv.org Artificial Intelligence

2110.03369

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.88)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Domain Generalization using Causal Matching

Mahajan, Divyat, Tople, Shruti, Sharma, Amit

arXiv.org Artificial IntelligenceJun-12-2020

Learning invariant representations has been proposed as a key technique for addressing the domain generalization problem. However, the question of identifying the right conditions for invariance remains unanswered. In this work, we propose a causal interpretation of domain generalization that defines domains as interventions under a data-generating process. Based on a general causal model for data from multiple domains, we show that prior methods for learning an invariant representation optimize for an incorrect objective. We highlight an alternative condition: inputs across domains should have the same representation if they are derived from the same base object. In practice, knowledge about generation of data or objects is not available. Hence we propose an iterative algorithm called MatchDG that approximates base object similarity by using a contrastive loss formulation adapted for multiple domains. We then match inputs that are similar under the resultant representation to build an invariant classifier. We evaluate MatchDG on rotated MNIST, Fashion-MNIST, and PACS datasets and find that it outperforms prior work on out-of-domain accuracy and learns matches that have over 25\% overlap with ground-truth object matches in MNIST and Fashion-MNIST. Code repository can be accessed here: \textit{https://github.com/microsoft/robustdg}

artificial intelligence, evolutionary algorithm, representation, (19 more...)

arXiv.org Artificial Intelligence

2006.075

Country: Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback