Goto

Collaborating Authors

 copula


Quasi-Bayes meets Vines

Neural Information Processing Systems

Recently developed quasi-Bayesian (QB) methods \cite{fong2023martingale} proposed a stimulating change of paradigm in Bayesian computation by directly constructing the Bayesian predictive distribution through recursion, removing the need for expensive computations involved in sampling the Bayesian posterior distribution. This has proved to be data-efficient for univariate predictions, however, existing constructions for higher dimensional densities are only possible by relying on restrictive assumptions on the model's multivariate structure. Here, we propose a wholly different approach to extend Quasi-Bayesian prediction to high dimensions through the use of Sklar's theorem, by decomposing the predictive distribution into one-dimensional predictive marginals and a high-dimensional copula. We use the efficient recursive QB construction for the one-dimensional marginals and model the dependence using highly expressive vine copulas. Further, we tune hyperparameters using robust divergences (eg.


Copula Multi-label Learning

Neural Information Processing Systems

A formidable challenge in multi-label learning is to model the interdependencies between labels and features. Unfortunately, the statistical properties of existing multi-label dependency modelings are still not well understood. Copulas are a powerful tool for modeling dependence of multivariate data, and achieve great success in a wide range of applications, such as finance, econometrics and systems neuroscience. This inspires us to develop a novel copula multi-label learning paradigm for modeling label and feature dependencies. The copula based paradigm enables to reveal new statistical insights in multi-label learning. In particular, the paper first leverages the kernel trick to construct continuous distribution in the output space, and then estimates our proposed model semiparametrically where the copula is modeled parametrically, while the marginal distributions are modeled nonparametrically. Theoretically, we show that our estimator is an unbiased and consistent estimator and follows asymptotically a normal distribution. Moreover, we bound the mean squared error of estimator. The experimental results from various domains validate the superiority of our proposed approach.


Deep Archimedean Copulas

Neural Information Processing Systems

A central problem in machine learning and statistics is to model joint densities of random variables from data. Copulas are joint cumulative distribution functions with uniform marginal distributions and are used to capture interdependencies in isolation from marginals. Copulas are widely used within statistics, but have not gained traction in the context of modern deep learning. In this paper, we introduce ACNet, a novel differentiable neural network architecture that enforces structural properties and enables one to learn an important class of copulas--Archimedean Copulas. Unlike Generative Adversarial Networks, Variational Autoencoders, or Normalizing Flow methods, which learn either densities or the generative process directly, ACNet learns a generator of the copula, which implicitly defines the cumulative distribution function of a joint distribution. We give a probabilistic interpretation of the network parameters of ACNet and use this to derive a simple but efficient sampling algorithm for the learned copula. Our experiments show that ACNet is able to both approximate common Archimedean Copulas and generate new copulas which may provide better fits to data.


Trunc-Opt vine building algorithms

Pfeifer, Dániel, Kovács, Edith Alice

arXiv.org Machine Learning

Vine copula models have become highly popular and practical tools for modelling multivariate probability distributions due to their flexibility in modelling different kinds of dependences between the random variables involved. However, their flexibility comes with the drawback of a high-dimensional parameter space. To tackle this problem, truncated vine copulas were introduced by Kurowicka (2010) (Gaussian case) and Brechmann and Czado (2013) (general case). Truncated vine copulas contain conditionally independent pair copulas after the truncation level. So far, in the general case, truncated vine constructing algorithms started from the lowest tree in order to encode the largest dependences in the lower trees. The novelty of this paper starts from the observation that a truncated vine is determined by the first tree after the truncation level (see Kovács and Szántai (2017)). This paper introduces a new score for fitting truncated vines to given data, called the Weight of the truncated vine. Then we propose a completely new methodology for constructing truncated vines. We prove theorems which motivate this new approach. While earlier algorithms did not use conditional independences, we give algorithms for constructing and encoding truncated vines which do exploit them. Finally, we illustrate the algorithms on real datasets and compare the results with well-known methods included in R packages. Our method generally compare favorably to previously known methods.


Towards a pretrained deep learning estimator of the Linfoot informational correlation

Berg, Stéphanie M. van den, Halekoh, Ulrich, Möller, Sören, Jensen, Andreas Kryger, Hjelmborg, Jacob von Bornemann

arXiv.org Machine Learning

We develop a supervised deep-learning approach to estimate mutual information between two continuous random variables. As labels, we use the Linfoot informational correlation, a transformation of mutual information that has many important properties. Our method is based on ground truth labels for Gaussian and Clayton copulas. We compare our method with estimators based on kernel density, k-nearest neighbours and neural estimators. We show generally lower bias and lower variance. As a proof of principle, future research could look into training the model with a more diverse set of examples from other copulas for which ground truth labels are available.


Copula Based Fusion of Clinical and Genomic Machine Learning Risk Scores for Breast Cancer Risk Stratification

Aich, Agnideep, Hewage, Sameera, Murshed, Md Monzur

arXiv.org Machine Learning

Clinical and genomic models are both used to predict breast cancer outcomes, but they are often combined using simple linear rules that do not account for how their risk scores relate, especially at the extremes. Using the METABRIC breast cancer cohort, we studied whether directly modeling the joint relationship between clinical and genomic machine learning risk scores could improve risk stratification for 5-year cancer-specific mortality. We created a binary 5-year cancer-death outcome and defined two sets of predictors: a clinical set (demographic, tumor, and treatment variables) and a genomic set (gene-expression $z$-scores). We trained several supervised classifiers, such as Random Forest and XGBoost, and used 5-fold cross-validated predicted probabilities as unbiased risk scores. These scores were converted to pseudo-observations on $(0,1)^2$ to fit Gaussian, Clayton, and Gumbel copulas. Clinical models showed good discrimination (AUC 0.783), while genomic models had moderate performance (AUC 0.681). The joint distribution was best captured by a Gaussian copula (bootstrap $p=0.997$), which suggests a symmetric, moderately strong positive relationship. When we grouped patients based on this relationship, Kaplan-Meier curves showed clear differences: patients who were high-risk in both clinical and genomic scores had much poorer survival than those high-risk in only one set. These results show that copula-based fusion works in real-world cohorts and that considering dependencies between scores can better identify patient subgroups with the worst prognosis.


Mixed vine copulas as joint models of spike counts and local field potentials

Arno Onken, Stefano Panzeri

Neural Information Processing Systems

Concurrent measurements of neural activity at multiple scales, sometimes performed with multimodal techniques, become increasingly important for studying brain function. However, statistical methods for their concurrent analysis are currently lacking.



Copula Multi-label Learning

Weiwei Liu

Neural Information Processing Systems

Unfortunately, the statistical properties of existing multi-label dependency modelings are still not well understood.