
grangersearch: An R Package for Exhaustive Granger Causality Testing with Tidyverse Integration

Korfiatis, Nikolaos

arXiv.org Machine Learning

Understanding causal relationships between time series variables is a fundamental problem in economics, finance, neuroscience, and many other fields. While true causality is philosophically complex and difficult to establish from observational data alone, Granger (1969) proposed a practical, testable notion of causality based on predictability: a variable X is said to "Granger-cause" another variable Y if past values of X contain information that helps predict Y beyond what is contained in past values of Y alone. Granger causality testing has found applications across diverse domains. In macroeconomics, Sims (1972) famously applied the technique to study money-income relationships, while Kraft and Kraft (1978) pioneered its use in energy economics. Financial market researchers including Hiemstra and Jones (1994) have extended the methodology to study price-volume dynamics, and neuroscientists have adapted Granger causality for brain connectivity analysis (Seth, Barrett, and Barnett 2015). The statistical foundations rest on vector autoregressive (VAR) models (Sims 1980), with comprehensive treatments available in Lütkepohl (2005) and discussions of causal interpretation in Peters, Janzing, and Schölkopf (2017). Despite its popularity, implementing Granger causality tests in R (R Core Team 2024) remains cumbersome for applied researchers.
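The predictability notion above reduces to an F-test on two nested autoregressions: regress Y on its own past (restricted), then also on the past of X (unrestricted), and test whether the added lags of X reduce the residual sum of squares. The following is a minimal self-contained sketch of that lag-1 bivariate test; the function names (`granger_f_stat`, `ols_rss`) are illustrative and not the grangersearch API.

```python
# Sketch of a lag-1 bivariate Granger causality F-test via two nested OLS fits.
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def granger_f_stat(x, y):
    """F statistic for H0: past x does not help predict y (lag 1)."""
    n = len(y) - 1
    target = y[1:]
    Xr = [[1.0, y[t]] for t in range(n)]        # restricted: y on its own past
    Xu = [[1.0, y[t], x[t]] for t in range(n)]  # unrestricted: adds past x
    rss_r, rss_u = ols_rss(Xr, target), ols_rss(Xu, target)
    q, k = 1, 3                                 # q restrictions, k parameters
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

# Synthetic example where x clearly Granger-causes y.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.5 * y[t - 1] + 0.9 * x[t - 1] + random.gauss(0, 0.1))
F = granger_f_stat(x, y)   # large F: past x is highly informative about y
```

Under H0 the statistic follows an F(1, n-3) distribution, so F can be compared against the usual critical values; the reverse direction, `granger_f_stat(y, x)`, should yield a small statistic here since x is white noise.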


Efficient and scalable clustering of survival curves

Villanueva, Nora M., Sestelo, Marta, Meira-Machado, Luis

arXiv.org Machine Learning

Survival analysis encompasses a broad range of methods for analyzing time-to-event data, with one key objective being the comparison of survival curves across groups. Traditional approaches for identifying clusters of survival curves often rely on computationally intensive bootstrap techniques to approximate the null hypothesis distribution. While effective, these methods impose significant computational burdens. In this work, we propose a novel approach that leverages the k-means and log-rank test to efficiently identify and cluster survival curves. Our method eliminates the need for computationally expensive resampling, significantly reducing processing time while maintaining statistical reliability. By systematically evaluating survival curves and determining optimal clusters, the proposed method ensures a practical and scalable alternative for large-scale survival data analysis. Through simulation studies, we demonstrate that our approach achieves results comparable to existing bootstrap-based clustering methods while dramatically improving computational efficiency. These findings suggest that the log-rank-based clustering procedure offers a viable and time-efficient solution for researchers working with multiple survival curves in medical and epidemiological studies.
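The building block of the proposed procedure is the log-rank comparison of survival curves. As a hedged illustration (not the authors' implementation), the two-sample log-rank chi-square statistic can be computed directly from observed/expected event counts at each distinct event time:

```python
# Minimal two-group log-rank statistic; ties handled with the standard
# hypergeometric mean and variance. Illustrative only.

def logrank_stat(times1, events1, times2, events2):
    """Chi-square log-rank statistic comparing two survival samples.

    times*: observed times; events*: 1 = event occurred, 0 = censored.
    """
    data = ([(t, e, 0) for t, e in zip(times1, events1)] +
            [(t, e, 1) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in data if e == 1})
    O1 = E1 = V = 0.0
    for et in event_times:
        n1 = sum(1 for t, _, g in data if t >= et and g == 0)  # at risk, group 1
        n2 = sum(1 for t, _, g in data if t >= et and g == 1)  # at risk, group 2
        d1 = sum(1 for t, e, g in data if t == et and e == 1 and g == 0)
        d2 = sum(1 for t, e, g in data if t == et and e == 1 and g == 1)
        n, d = n1 + n2, d1 + d2
        O1 += d1                     # observed events in group 1
        E1 += d * n1 / n             # expected events under H0
        if n > 1:
            V += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return (O1 - E1) ** 2 / V if V > 0 else 0.0

# Clearly separated groups give a large statistic; identical groups give zero.
stat_diff = logrank_stat([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
stat_same = logrank_stat([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

In the clustering procedure described above, such pairwise (or pooled) log-rank statistics replace the bootstrap resampling step, since the statistic's reference chi-square distribution is known.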


Multivariate tests of association based on univariate tests

Ruth Heller, Yair Heller

Neural Information Processing Systems

For testing two vector random variables for independence, we propose testing whether the distance of one vector from an arbitrary center point is independent from the distance of the other vector from another arbitrary center point by a univariate test. We prove that under minimal assumptions, it is enough to have a consistent univariate independence test on the distances, to guarantee that the power to detect dependence between the random vectors increases to one with sample size. If the univariate test is distribution-free, the multivariate test will also be distribution-free.
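The reduction is simple to operationalize: replace each vector observation by its distance from a fixed center point, then run any univariate independence test on the two resulting distance samples. The sketch below uses a distribution-free permutation test on Spearman's rank correlation as the univariate test; note the consistency result requires a univariate test consistent against all alternatives (e.g., Hoeffding's D), so this choice is purely illustrative, and the center points, sample, and function names are assumptions.

```python
# Reduce vector pairs to distances from center points, then apply a
# univariate permutation test of independence to the distances.
import math, random

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    m = (n - 1) / 2
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    var = sum((a - m) ** 2 for a in rx)
    return cov / var

def distance_perm_test(X, Y, cx, cy, n_perm=499, rng=None):
    """Permutation p-value for independence of random vectors X and Y."""
    rng = rng or random.Random(0)
    dx = [dist(x, cx) for x in X]          # distances from center cx
    dy = [dist(y, cy) for y in Y]          # distances from center cy
    obs = abs(spearman(dx, dy))
    hits = sum(1 for _ in range(n_perm)
               if abs(spearman(dx, rng.sample(dy, len(dy)))) >= obs)
    return (1 + hits) / (1 + n_perm)

# Strongly dependent 2-D vectors: Y is X plus small noise.
rng = random.Random(7)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
Y = [(a + rng.gauss(0, 0.05), b + rng.gauss(0, 0.05)) for a, b in X]
p = distance_perm_test(X, Y, (0.0, 0.0), (0.0, 0.0))   # small p-value
```

Because the permutation test conditions on the observed distances, the resulting multivariate test is distribution-free, matching the paper's closing observation.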


Reliable Classification with Conformal Learning and Interval-Type 2 Fuzzy Sets

Fumanal-Idocin, Javier, Andreu-Perez, Javier

arXiv.org Artificial Intelligence

Classical machine learning classifiers tend to be overconfident and can be unreliable outside of laboratory benchmarks. Properly assessing the per-sample reliability of a model's output is instrumental in real-life scenarios where these systems are deployed. Because of this, different techniques have been employed to quantify the quality of a model's predictions, most commonly Bayesian statistics and, more recently, conformal learning. Given a calibration set, conformal learning can produce outputs that are guaranteed to cover the target class at a desired significance level, and are more reliable than the standard confidence intervals used by Bayesian methods. In this work, we propose to use conformal learning with fuzzy rule-based systems in classification and report metrics of their performance. We then discuss how the use of interval type-2 fuzzy sets can improve the quality of the system's output compared to both fuzzy and crisp rules. Finally, we discuss how fine-tuning of the system can be adapted to improve the quality of the conformal prediction.
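The coverage guarantee mentioned above comes from split-conformal calibration: score each calibration example by the nonconformity of its true class, take a finite-sample-corrected quantile, and output every label whose nonconformity falls under that threshold. A minimal sketch, where the class probabilities stand in for any fitted classifier (a fuzzy rule-based system in the paper), and the hard-coded scores are hypothetical:

```python
# Split-conformal set prediction for classification, with nonconformity
# score s = 1 - p_model(true class | x).
import math

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-valid quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))      # rank of the conformal quantile
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(probs, qhat):
    """All labels whose nonconformity 1 - p(label) does not exceed qhat."""
    return {label for label, p in probs.items() if 1 - p <= qhat}

# Nine held-out calibration scores 1 - p(true class).
cal_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.90]
qhat = conformal_threshold(cal_scores, alpha=0.1)   # k = ceil(10 * 0.9) = 9
labels = prediction_set({"a": 0.70, "b": 0.25, "c": 0.05}, qhat)
```

By exchangeability, the set produced this way contains the true class with probability at least 1 - alpha over calibration and test draws, regardless of how well the underlying classifier is calibrated.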



A Conformal Prediction Framework for Uncertainty Quantification in Physics-Informed Neural Networks

Yu, Yifan, Ho, Cheuk Hin, Wang, Yangshuai

arXiv.org Artificial Intelligence

Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving PDEs, yet existing uncertainty quantification (UQ) approaches for PINNs generally lack rigorous statistical guarantees. We address this with a conformal prediction framework that calibrates prediction intervals by constructing nonconformity scores on a calibration set, thereby yielding distribution-free uncertainty estimates with rigorous finite-sample coverage guarantees for PINNs. To handle spatial heteroskedasticity, we further introduce local conformal quantile estimation, enabling spatially adaptive uncertainty bands while preserving the theoretical guarantees. Through systematic evaluations on typical PDEs (damped harmonic oscillator, Poisson, Allen-Cahn, and Helmholtz equations) and comprehensive testing across multiple uncertainty metrics, our results demonstrate that the proposed framework achieves reliable calibration and locally adaptive uncertainty intervals, consistently outperforming heuristic UQ approaches. By bridging PINNs with distribution-free UQ, this work introduces a general framework that not only enhances calibration and reliability, but also opens new avenues for uncertainty-aware modeling of complex PDE systems.

1. Introduction

Physics-Informed Neural Networks (PINNs) have emerged as a versatile framework for solving partial differential equations (PDEs) by embedding physical laws into neural network training [1, 2]. Numerous variants have been developed to enhance accuracy, efficiency, and applicability [3, 4, 5, 6, 7, 8], enabling PINNs to address complex geometries [9, 10], high-dimensional and multiscale problems [11, 12, 13], and inverse formulations [14, 15] within a unified mesh-free paradigm. Applications span fluid mechanics [16, 17], heat transfer [18, 19], and materials science [20, 21]; see [16, 22, 23, 24, 25] for comprehensive reviews.
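The calibration step described in the abstract can be sketched in a few lines: score calibration points by the absolute residual between the reference solution and the surrogate, then take a finite-sample-corrected quantile as the half-width of a prediction band. In the sketch below, `surrogate` is a crude stand-in for a trained PINN (a Taylor approximation of u(x) = sin(x)); the names and setup are assumptions, not the paper's code.

```python
# Split-conformal prediction band for a surrogate PDE solution, with
# absolute-residual nonconformity |u(x) - u_hat(x)| on a calibration set.
import math

def surrogate(x):
    # Stand-in for a trained PINN; crude approximation of u(x) = sin(x).
    return x - x ** 3 / 6

def conformal_band(cal_x, cal_y, alpha):
    """Half-width of a (1 - alpha) split-conformal prediction band."""
    scores = sorted(abs(y - surrogate(x)) for x, y in zip(cal_x, cal_y))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))   # conformal quantile rank
    return scores[min(k, n) - 1]

cal_x = [i / 10 for i in range(1, 20)]     # 19 calibration points in (0, 2)
cal_y = [math.sin(x) for x in cal_x]       # exact PDE solution values
q = conformal_band(cal_x, cal_y, alpha=0.1)
lo, hi = surrogate(1.0) - q, surrogate(1.0) + q   # band at a test point
```

Because a single global quantile `q` is used here, the band has the same width everywhere; the paper's local conformal quantile estimation instead computes such quantiles within spatial neighborhoods to adapt the width to spatially varying error.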


Optimal Conformal Prediction, E-values, Fuzzy Prediction Sets and Subsequent Decisions

Koning, Nick W., van Meer, Sam

arXiv.org Machine Learning

We make three contributions to conformal prediction. First, we propose fuzzy conformal confidence sets that offer a degree of exclusion, generalizing beyond the binary inclusion/exclusion offered by classical confidence sets. We connect fuzzy confidence sets to e-values to show this degree of exclusion is equivalent to an exclusion at different confidence levels, capturing precisely what e-values bring to conformal prediction. We show that a fuzzy confidence set is a predictive distribution with a more appropriate error guarantee. Second, we derive optimal conformal confidence sets by interpreting the minimization of the expected measure of the confidence set as an optimal testing problem against a particular alternative. We use this to characterize exactly in what sense traditional conformal prediction is optimal. Third, we generalize the inheritance of guarantees by subsequent minimax decisions from confidence sets to fuzzy confidence sets. All our results generalize beyond the exchangeable conformal setting to prediction sets for arbitrary models. In particular, we find that any valid test (e-value) for a hypothesis automatically defines a (fuzzy) prediction confidence set.
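The idea of a degree of exclusion can be illustrated with a standard textbook construction, not the paper's e-value machinery: take the conformal p-value of each candidate y as its membership degree; the alpha-cut of that fuzzy set is then exactly the classical conformal set at confidence level 1 - alpha, so each candidate is excluded at some levels and included at others.

```python
# Fuzzy prediction set whose membership function is the conformal p-value;
# its alpha-cuts recover classical conformal sets. Illustrative sketch only.

def conformal_p(cal_scores, score_y):
    """Conformal p-value: rank of score_y among calibration scores."""
    n = len(cal_scores)
    return (1 + sum(1 for s in cal_scores if s >= score_y)) / (n + 1)

def fuzzy_set(cal_scores, candidates, score):
    """Each candidate with its membership degree (degree of non-exclusion)."""
    return {y: conformal_p(cal_scores, score(y)) for y in candidates}

def alpha_cut(fuzzy, alpha):
    """Classical conformal set at level 1 - alpha."""
    return {y for y, m in fuzzy.items() if m > alpha}

cal_scores = [0.1, 0.2, 0.4, 0.8]          # calibration nonconformity scores
candidates = [0.0, 0.5, 1.0]
score = lambda y: abs(y - 0.0)             # hypothetical nonconformity of y
fz = fuzzy_set(cal_scores, candidates, score)
```

Here the candidate 0.5 has membership 0.4: it is excluded at confidence level 0.5 but not at 0.75, which is the "exclusion at different confidence levels" that the paper formalizes (via e-values rather than p-values).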