Goto

Collaborating Authors

 Kiyavash, Negar


Nonparametric Hawkes Processes: Online Estimation and Generalization Bounds

arXiv.org Machine Learning

In this paper, we design a nonparametric online algorithm for estimating the triggering functions of multivariate Hawkes processes. Unlike parametric estimation, where evolutionary dynamics can be exploited for fast computation of the gradient, and unlike typical function learning, where representer theorem is readily applicable upon proper regularization of the objective function, nonparametric estimation faces the challenges of (i) inefficient evaluation of the gradient, (ii) lack of representer theorem, and (iii) computationally expensive projection necessary to guarantee positivity of the triggering functions. In this paper, we offer solutions to the above challenges, and design an online estimation algorithm named NPOLE-MHP that outputs estimations with a $\mathcal{O}(1/T)$ regret, and a $\mathcal{O}(1/T)$ stability. Furthermore, we design an algorithm, NPOLE-MMHP, for estimation of multivariate marked Hawkes processes. We test the performance of NPOLE-MHP on various synthetic and real datasets, and demonstrate, under different evaluation metrics, that NPOLE-MHP performs as good as the optimal maximum likelihood estimation (MLE), while having a run time as little as parametric online algorithms.


Fairness in Supervised Learning: An Information Theoretic Approach

arXiv.org Machine Learning

Automated decision making systems are increasingly being used in real-world applications. In these systems for the most part, the decision rules are derived by minimizing the training error on the available historical data. Therefore, if there is a bias related to a sensitive attribute such as gender, race, religion, etc. in the data, say, due to cultural/historical discriminatory practices against a certain demographic, the system could continue discrimination in decisions by including the said bias in its decision rule. We present an information theoretic framework for designing fair predictors from data, which aim to prevent discrimination against a specified sensitive attribute in a supervised learning setting. We use equalized odds as the criterion for discrimination, which demands that the prediction should be independent of the protected attribute conditioned on the actual label. To ensure fairness and generalization simultaneously, we compress the data to an auxiliary variable, which is used for the prediction task. This auxiliary variable is chosen such that it is decontaminated from the discriminatory attribute in the sense of equalized odds. The final predictor is obtained by applying a Bayesian decision rule to the auxiliary variable.



Learning Causal Structures Using Regression Invariance

Neural Information Processing Systems

We study causal discovery in a multi-environment setting, in which the functional relations for producing the variables from their direct causes remain the same across environments, while the distribution of exogenous noises may vary. We introduce the idea of using the invariance of the functional relations of the variables to their causes across a set of environments for structure learning. We define a notion of completeness for a causal inference algorithm in this setting and prove the existence of such algorithm by proposing the baseline algorithm. Additionally, we present an alternate algorithm that has significantly improved computational and sample complexity compared to the baseline algorithm. Experiment results show that the proposed algorithm outperforms the other existing algorithms.


Budgeted Experiment Design for Causal Structure Learning

arXiv.org Machine Learning

We study the problem of causal structure learning when the experimenter is limited to perform at most $k$ non-adaptive experiments of size $1$. We formulate the problem of finding the best intervention target set as an optimization problem, which aims to maximize the average number of edges whose directions are resolved. We prove that the objective function is submodular and a greedy algorithm is a $(1-\frac{1}{e})$-approximation algorithm for the problem. We further present an accelerated variant of the greedy algorithm, which can lead to orders of magnitude performance speedup. We validate our proposed approach on synthetic and real graphs. The results show that compared to the purely observational setting, our algorithm orients majority of the edges through only a small number of interventions.


A New Measure of Conditional Dependence

arXiv.org Machine Learning

Measuring conditional dependencies among the variables of a network is of great interest to many disciplines. This paper studies some shortcomings of the existing dependency measures in detecting direct causal influences or their lack of ability for group selection to capture strong dependencies and accordingly introduces a new statistical dependency measure to overcome them. This measure is inspired by Dobrushin's coefficients and based on the fact that there is no dependency between $X$ and $Y$ given another variable $Z$, if and only if the conditional distribution of $Y$ given $X=x$ and $Z=z$ does not change when $X$ takes another realization $x'$ while $Z$ takes the same realization $z$. We show the advantages of this measure over the related measures in the literature. Moreover, we establish the connection between our measure and the integral probability metric (IPM) that helps to develop estimators of the measure with lower complexity compared to other relevant information theoretic based measures. Finally, we show the performance of this measure through numerical simulations.


Learning Causal Structures Using Regression Invariance

arXiv.org Machine Learning

We study causal inference in a multi-environment setting, in which the functional relations for producing the variables from their direct causes remain the same across environments, while the distribution of exogenous noises may vary. We introduce the idea of using the invariance of the functional relations of the variables to their causes across a set of environments. We define a notion of completeness for a causal inference algorithm in this setting and prove the existence of such algorithm by proposing the baseline algorithm. Additionally, we present an alternate algorithm that has significantly improved computational and sample complexity compared to the baseline algorithm. The experiment results show that the proposed algorithm outperforms the other existing algorithms.


Optimal Experiment Design for Causal Discovery from Fixed Number of Experiments

arXiv.org Machine Learning

We study the problem of causal structure learning over a set of random variables when the experimenter is allowed to perform at most $M$ experiments in a non-adaptive manner. We consider the optimal learning strategy in terms of minimizing the portions of the structure that remains unknown given the limited number of experiments in both Bayesian and minimax setting. We characterize the theoretical optimal solution and propose an algorithm, which designs the experiments efficiently in terms of time complexity. We show that for bounded degree graphs, in the minimax case and in the Bayesian case with uniform priors, our proposed algorithm is a $\rho$-approximation algorithm, where $\rho$ is independent of the order of the underlying graph. Simulations on both synthetic and real data show that the performance of our algorithm is very close to the optimal solution.


Learning Network of Multivariate Hawkes Processes: A Time Series Approach

arXiv.org Machine Learning

Learning the influence structure of multiple time series data is of great interest to many disciplines. This paper studies the problem of recovering the causal structure in network of multivariate linear Hawkes processes. In such processes, the occurrence of an event in one process affects the probability of occurrence of new events in some other processes. Thus, a natural notion of causality exists between such processes captured by the support of the excitation matrix. We show that the resulting causal influence network is equivalent to the Directed Information graph (DIG) of the processes, which encodes the causal factorization of the joint distribution of the processes. Furthermore, we present an algorithm for learning the support of excitation matrix (or equivalently the DIG). The performance of the algorithm is evaluated on synthesized multivariate Hawkes networks as well as a stock market and MemeTracker real-world dataset.


Efficient Neighborhood Selection for Gaussian Graphical Models

arXiv.org Machine Learning

This paper addresses the problem of neighborhood selection for Gaussian graphical models. We present two heuristic algorithms: a forward-backward greedy algorithm for general Gaussian graphical models based on mutual information test, and a threshold-based algorithm for walk summable Gaussian graphical models. Both algorithms are shown to be structurally consistent, and efficient. Numerical results show that both algorithms work very well.