Kiyavash, Negar
s-ID: Causal Effect Identification in a Sub-Population
Abouei, Amir Mohammad, Mokhtarian, Ehsan, Kiyavash, Negar
Causal inference in a sub-population involves identifying the causal effect of an intervention on a specific subgroup, which is distinguished from the whole population through the influence of systematic biases in the sampling process. However, ignoring the subtleties introduced by sub-populations can either lead to erroneous inference or limit the applicability of existing methods. We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID), in which we merely have access to observational data of the targeted sub-population (as opposed to the entire population). Existing inference problems in sub-populations operate on the premise that the given data distributions originate from the entire population, thus, cannot tackle the s-ID problem. To address this gap, we provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population. Given these conditions, we present a sound and complete algorithm for the s-ID problem.
Modeling Systemic Risk: A Time-Varying Nonparametric Causal Inference Framework
Etesami, Jalal, Habibnia, Ali, Kiyavash, Negar
We propose a nonparametric and time-varying directed information graph (TV-DIG) framework to estimate the evolving causal structure in time series networks, thereby addressing the limitations of traditional econometric models in capturing high-dimensional, nonlinear, and time-varying interconnections among series. This framework employs an information-theoretic measure rooted in a generalized version of Granger-causality, which is applicable to both linear and nonlinear dynamics. Our framework offers advancements in measuring systemic risk and establishes meaningful connections with established econometric models, including vector autoregression and switching models. We evaluate the efficacy of our proposed model through simulation experiments and empirical analysis, reporting promising results in recovering simulated time-varying networks with nonlinear and multivariate structures. We apply this framework to identify and monitor the evolution of interconnectedness and systemic risk among major assets and industrial sectors within the financial network. We focus on cryptocurrencies' potential systemic risks to financial stability, including spillover effects on other sectors during crises like the COVID-19 pandemic and the Federal Reserve's 2020 emergency response. Our findings reveals significant, previously underrecognized pre-2020 influences of cryptocurrencies on certain financial sectors, highlighting their potential systemic risks and offering a systematic approach in tracking evolving cross-sector interactions within financial networks.
A Unified Experiment Design Approach for Cyclic and Acyclic Causal Models
Mokhtarian, Ehsan, Salehkaleybar, Saber, Ghassami, AmirEmad, Kiyavash, Negar
We study experiment design for unique identification of the causal graph of a simple SCM, where the graph may contain cycles. The presence of cycles in the structure introduces major challenges for experiment design as, unlike acyclic graphs, learning the skeleton of causal graphs with cycles may not be possible from merely the observational distribution. Furthermore, intervening on a variable in such graphs does not necessarily lead to orienting all the edges incident to it. In this paper, we propose an experiment design approach that can learn both cyclic and acyclic graphs and hence, unifies the task of experiment design for both types of graphs. We provide a lower bound on the number of experiments required to guarantee the unique identification of the causal graph in the worst case, showing that the proposed approach is order-optimal in terms of the number of experiments up to an additive logarithmic term. Moreover, we extend our result to the setting where the size of each experiment is bounded by a constant. For this case, we show that our approach is optimal in terms of the size of the largest experiment required for uniquely identifying the causal graph in the worst case.
Learning Unknown Intervention Targets in Structural Causal Models from Heterogeneous Data
Yang, Yuqin, Salehkaleybar, Saber, Kiyavash, Negar
We study the problem of identifying the unknown intervention targets in structural causal models where we have access to heterogeneous data collected from multiple environments. The unknown intervention targets are the set of endogenous variables whose corresponding exogenous noises change across the environments. We propose a two-phase approach which in the first phase recovers the exogenous noises corresponding to unknown intervention targets whose distributions have changed across environments. In the second phase, the recovered noises are matched with the corresponding endogenous variables. For the recovery phase, we provide sufficient conditions for learning these exogenous noises up to some component-wise invertible transformation. For the matching phase, under the causal sufficiency assumption, we show that the proposed method uniquely identifies the intervention targets. In the presence of latent confounders, the intervention targets among the observed variables cannot be determined uniquely. We provide a candidate intervention target set which is a superset of the true intervention targets. Our approach improves upon the state of the art as the returned candidate set is always a subset of the target set returned by previous work. Moreover, we do not require restrictive assumptions such as linearity of the causal model or performing invariance tests to learn whether a distribution is changing across environments which could be highly sample inefficient. Our experimental results show the effectiveness of our proposed algorithm in practice.
Momentum-Based Policy Gradient with Second-Order Information
Salehkaleybar, Saber, Khorasani, Sadegh, Kiyavash, Negar, He, Niao, Thiran, Patrick
Variance-reduced gradient estimators for policy gradient methods have been one of the main focus of research in the reinforcement learning in recent years as they allow acceleration of the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. SHARP algorithm is parameter-free, achieving $\epsilon$-approximate first-order stationary point with $O(\epsilon^{-3})$ number of trajectories, while using a batch size of $O(1)$ at each iteration. Unlike most previous work, our proposed algorithm does not require importance sampling which can compromise the advantage of variance reduction process. Moreover, the variance of estimation error decays with the fast rate of $O(1/t^{2/3})$ where $t$ is the number of iterations. Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.
Efficiently Escaping Saddle Points for Non-Convex Policy Optimization
Khorasani, Sadegh, Salehkaleybar, Saber, Kiyavash, Negar, He, Niao, Grossglauser, Matthias
Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance. In recent years, several variance-reduced PG methods have been proposed with a theoretical guarantee of converging to an approximate first-order stationary point (FOSP) with the sample complexity of $O(\epsilon^{-3})$. However, FOSPs could be bad local optima or saddle points. Moreover, these algorithms often use importance sampling (IS) weights which could impair the statistical effectiveness of variance reduction. In this paper, we propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of $\tilde{O}(\epsilon^{-3})$. This rate improves the best-known sample complexity for achieving approximate SOSPs by a factor of $O(\epsilon^{-0.5})$. Moreover, the proposed variance reduction technique bypasses IS weights by using HVP terms. Our experimental results show that the proposed algorithm outperforms the state of the art and is more robust to changes in random seeds.
Causal Effect Identification in Uncertain Causal Networks
Akbari, Sina, Jamshidi, Fateme, Mokhtarian, Ehsan, Vowels, Matthew J., Etesami, Jalal, Kiyavash, Negar
Causal identification is at the core of the causal inference literature, where complete algorithms have been proposed to identify causal queries of interest. The validity of these algorithms hinges on the restrictive assumption of having access to a correctly specified causal structure. In this work, we study the setting where a probabilistic model of the causal structure is available. Specifically, the edges in a causal graph exist with uncertainties which may, for example, represent degree of belief from domain experts. Alternatively, the uncertainty about an edge may reflect the confidence of a particular statistical test. The question that naturally arises in this setting is: Given such a probabilistic graph and a specific causal effect of interest, what is the subgraph which has the highest plausibility and for which the causal effect is identifiable? We show that answering this question reduces to solving an NP-complete combinatorial optimization problem which we call the edge ID problem. We propose efficient algorithms to approximate this problem and evaluate them against both real-world networks and randomly generated graphs.
On sample complexity of conditional independence testing with Von Mises estimator with application to causal discovery
Jamshidi, Fateme, Ganassali, Luca, Kiyavash, Negar
Motivated by conditional independence testing, an essential step in constraint-based causal discovery algorithms, we study the nonparametric Von Mises estimator for the entropy of multivariate distributions built on a kernel density estimator. We establish an exponential concentration inequality for this estimator. We design a test for conditional independence (CI) based on our estimator, called VM-CI, which achieves optimal parametric rates under smoothness assumptions. Leveraging the exponential concentration, we prove a tight upper bound for the overall error of VM-CI. This, in turn, allows us to characterize the sample complexity of any constraint-based causal discovery algorithm that uses VM-CI for CI tests. To the best of our knowledge, this is the first sample complexity guarantee for causal discovery for continuous variables. Furthermore, we empirically show that VM-CI outperforms other popular CI tests in terms of either time or sample complexity (or both), which translates to a better performance in structure learning as well.
Experimental Design for Causal Effect Identification
Akbari, Sina, Etesami, Jalal, Kiyavash, Negar
Pearl's do calculus is a complete axiomatic approach to learn the identifiable causal effects from observational data. When such an effect is not identifiable, it is necessary to perform a collection of often costly interventions in the system to learn the causal effect. In this work, we consider the problem of designing the collection of interventions with the minimum cost to identify the desired effect. First, we prove that this problem is NP-hard, and subsequently propose an algorithm that can either find the optimal solution or a logarithmic-factor approximation of it. This is done by establishing a connection between our problem and the minimum hitting set problem. Additionally, we propose several polynomial-time heuristic algorithms to tackle the computational complexity of the problem. Although these algorithms could potentially stumble on sub-optimal solutions, our simulations show that they achieve small regrets on random graphs.
Gaussian Database Alignment and Gaussian Planted Matching
Dai, Osman Emre, Cullina, Daniel, Kiyavash, Negar
Database alignment is a variant of the graph alignment problem: Given a pair of anonymized databases containing separate yet correlated features for a set of users, the problem is to identify the correspondence between the features and align the anonymized user sets based on correlation alone. This closely relates to planted matching, where given a bigraph with random weights, the goal is to identify the underlying matching that generated the given weights. We study an instance of the database alignment problem with multivariate Gaussian features and derive results that apply both for database alignment and for planted matching, demonstrating the connection between them. The performance thresholds for database alignment converge to that for planted matching when the dimensionality of the database features is \(\omega(\log n)\), where \(n\) is the size of the alignment, and no individual feature is too strong. The maximum likelihood algorithms for both planted matching and database alignment take the form of a linear program and we study relaxations to better understand the significance of various constraints under various conditions and present achievability and converse bounds. Our results show that the almost-exact alignment threshold for the relaxed algorithms coincide with that of maximum likelihood, while there is a gap between the exact alignment thresholds. Our analysis and results extend to the unbalanced case where one user set is not fully covered by the alignment.