 suborder


Gini Score under Ties and Case Weights

Brauer, Alexej, Wüthrich, Mario V.

arXiv.org Machine Learning

The Gini score is a popular statistical tool in model validation. It was originally introduced and used for binary responses Y ∈ {0, 1}, and there are many equivalent formulations of the (binary) Gini score, such as those based on the receiver operating characteristic (ROC) curve and the area under the curve (AUC); see, e.g., [Bamber (1975)], [Hanley-McNeil (1982)] and [Fawcett (2006)]. These formulations are also equivalent to the Wilcoxon-Mann-Whitney U statistic, see [Hanley-McNeil (1982)], [DeLong et al. (1988)], [Byrne (2016)], and to [Somers (1962)]'s D, see [Newson (2002)]. Thus, there are at least five equivalent formulations of the Gini score in the binary context, and there is a broad literature on its behavior, which is well understood. For general real-valued responses, things become more difficult, and definitions and results on the Gini score are mainly found in the credit risk and actuarial literature. In this stream of literature, the Gini score was introduced by [Gourieroux-Jasiak (2007)] and [Frees et al. (2011), Frees et al. (2013)]. Furthermore, in the real-valued setting the Gini score is studied in detail in [Denuit et al. (2019)] and [Denuit-Trufin (2021)]. The Gini score is a statistic that assesses whether a given risk ranking is correct.
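
The equivalence between the binary Gini score, the AUC and the Mann-Whitney U statistic can be illustrated with a minimal sketch. It assumes the common convention Gini = 2 * AUC - 1 and counts a tie between a positive and a negative case as half a win, which is the usual Mann-Whitney convention and not necessarily the treatment of ties proposed in the paper. The function names `binary_auc` and `binary_gini` are hypothetical, and case weights are not handled.

```python
from itertools import product


def binary_auc(y, scores):
    """AUC computed as the normalized Mann-Whitney U statistic.

    Assumption: a tied (positive, negative) pair contributes 0.5.
    """
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative response")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive case ranked above negative case
        elif p == n:
            wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


def binary_gini(y, scores):
    """Binary Gini score under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * binary_auc(y, scores) - 1.0
```

A perfect ranking gives a Gini score of 1, while a model that assigns the same score to every case gives 0 under this tie convention.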


Exact discovery is polynomial for sparse causal Bayesian networks

Rios, Felix L., Moffa, Giusi, Kuipers, Jack

arXiv.org Machine Learning

Causal Bayesian networks are widely used tools for summarising the dependencies between variables and elucidating their putative causal relationships. Learning networks from data is computationally hard in general. The current state-of-the-art approaches for exact causal discovery are integer linear programming over the underlying space of directed acyclic graphs, dynamic programming and shortest-path searches over the space of topological orders, and constraint programming combining both. For dynamic programming over orders, the computational complexity is known to be exponential with base 2 in the number of variables in the network. We demonstrate how to use properties of Bayesian networks to prune the search space and lower the computational cost, while still guaranteeing exact discovery. When including new path-search and divide-and-conquer criteria, we prove optimality in quadratic time for matchings, and in polynomial time for any network class with logarithmically bounded largest connected components. In simulation studies we observe the polynomial dependence for sparse networks and that, beyond some critical value, the logarithm of the base grows with the network density. Our approach then out-competes the state-of-the-art at lower densities. These results therefore pave the way for faster exact causal discovery in larger and sparser networks.
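
To make the base-2 exponential cost of dynamic programming over orders concrete, the following is a minimal sketch of the classical subset-based recursion, not the pruned method of the paper. It assumes a decomposable score supplied through a hypothetical `local_score(node, parents)` callable, and it fills one table entry per subset of the n variables, hence 2**n entries. The brute-force parent-set search inside `best_parents` is an illustrative simplification that adds further cost.

```python
from functools import lru_cache
from itertools import combinations


def best_network_score(n, local_score):
    """Maximum total score over all DAGs on nodes 0..n-1.

    local_score(node, parents) is a hypothetical user-supplied decomposable
    score, where parents is a frozenset of node indices.
    """

    @lru_cache(maxsize=None)
    def best_parents(node, candidates):
        # Best local score for `node` with parents drawn from the
        # bitmask `candidates` (brute force over all subsets).
        members = [v for v in range(n) if (candidates >> v) & 1]
        return max(
            local_score(node, frozenset(ps))
            for k in range(len(members) + 1)
            for ps in combinations(members, k)
        )

    # best[S] = best score of a DAG on the node set encoded by bitmask S.
    best = [0.0] * (1 << n)
    for subset in range(1, 1 << n):
        # Choose the last node v in a topological order of `subset`;
        # its parents must come from the remaining nodes.
        best[subset] = max(
            best[subset & ~(1 << v)] + best_parents(v, subset & ~(1 << v))
            for v in range(n)
            if (subset >> v) & 1
        )
    return best[(1 << n) - 1]
```

Because every smaller subset has a smaller bitmask value, iterating subsets in increasing numeric order guarantees that each entry is available before it is needed.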


Primates have evolved larger voice boxes than other mammals to help with social interactions

Daily Mail - Science & tech

Humans and other primates have evolved 'significantly larger' voice boxes than other mammals to help with social interactions, a new study shows. Compared with other mammals such as cats, the voice box, or larynx, of primates such as gorillas and chimpanzees is more than a third larger in relation to their body size. The researchers also found that primates' voice boxes undergo faster rates of evolution, are diverse in function and are more variable in size. They made CT scans of specimens from 55 different species, including primates and other mammals, and produced 3D computer models of their larynges. The research claims to be the first large-scale study into the evolution of the larynx, where tissue vibrations produce sounds for vocal communication.