From Global to Local Correlation: Geometric Decomposition of Statistical Inference

Nov-20-2025–arXiv.org Machine Learning

Understanding feature-outcome associations in high-dimensional data remains challenging when relationships vary across subpopulations: standard methods that assume global associations miss context-dependent patterns, which reduces statistical power and interpretability. We develop a geometric decomposition framework that offers two strategies for partitioning an inference problem into regional analyses on data-derived Riemannian graphs. Gradient flow decomposition uses discrete Morse theory, validated by path monotonicity, to partition samples into gradient flow cells within which the outcome behaves monotonically. Co-monotonicity decomposition uses vertex-level coefficients that act as context-dependent versions of the classical Pearson correlation: they measure edge-based directional concordance between the outcome and a feature, or between pairs of features, and thereby define embeddings of samples into an association space. These embeddings induce Riemannian k-NN graphs on which biclustering identifies co-monotonicity cells (coherent regions) and feature modules. This construction extends naturally to multi-modal integration across multiple feature sets. The two strategies can be applied independently or jointly, with Bayesian posterior sampling providing credible intervals.
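The two local notions in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it rests on several assumptions. A plain Euclidean, unweighted, symmetrized k-NN graph stands in for the paper's Riemannian graph. The vertex-level co-monotonicity of outcome `y` and feature `z` is taken to be the normalized sum of products of edge differences over a vertex's neighbors (a cosine-style, Pearson-like local measure in [-1, 1]). Gradient flow cells are approximated by labeling each vertex with the pair (local minimum reached by steepest descent, local maximum reached by steepest ascent). Path-monotonicity validation and Bayesian posterior sampling are omitted. The names `knn_graph`, `comono`, `flow_target` and `gradient_flow_cells` are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetrized k-NN adjacency lists (Euclidean, brute force).

    Assumption: an edge is kept if either endpoint selects the other.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-loops
    nbrs = [set() for _ in range(len(X))]
    for i, row in enumerate(d):
        for j in np.argsort(row)[:k]:
            nbrs[i].add(int(j))
            nbrs[int(j)].add(i)
    return [sorted(s) for s in nbrs]


def comono(nbrs, y, z):
    """Hypothetical vertex-level co-monotonicity of y and z.

    cm(v) = sum_u dy*dz / sqrt(sum_u dy^2 * sum_u dz^2), with
    dy = y[u] - y[v], dz = z[u] - z[v] over neighbors u of v.
    Returns 0 where either function is locally constant.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    out = np.zeros(len(y))
    for v, nb in enumerate(nbrs):
        dy = y[nb] - y[v]
        dz = z[nb] - z[v]
        denom = np.sqrt(np.sum(dy**2) * np.sum(dz**2))
        out[v] = np.sum(dy * dz) / denom if denom > 0 else 0.0
    return out


def flow_target(nbrs, y, v, sign):
    """Follow steepest ascent (sign=+1) or descent (sign=-1) to a local extremum."""
    while True:
        best = max(nbrs[v], key=lambda u: sign * y[u], default=v)
        if sign * y[best] <= sign * y[v]:  # no strictly better neighbor
            return v
        v = best


def gradient_flow_cells(nbrs, y):
    """Label each vertex by its (descending minimum, ascending maximum) pair."""
    y = np.asarray(y, dtype=float)
    return [
        (flow_target(nbrs, y, v, -1), flow_target(nbrs, y, v, +1))
        for v in range(len(y))
    ]
```

For example, on ten evenly spaced points with `y = x`, a feature `z = 2 * x` gives a co-monotonicity of 1 at every vertex, `z = -x` gives -1, and every vertex falls in the single cell `(0, 9)`. A feature that rises in one region and falls in another would instead show coefficients of opposite sign in the two regions, which is the kind of context dependence a single global Pearson correlation averages away.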

data mining, machine learning, vertex

Country:
- Europe > Hungary
  - Budapest > Budapest (0.04)
- North America > United States
  - Maryland (0.04)
  - New Jersey > Mercer County
    - Princeton (0.04)
  - New York (0.04)
  - Rhode Island > Providence County
    - Providence (0.04)
  - Virginia (0.04)

Genre:
- Research Report
  - Experimental Study (0.46)
  - New Finding (0.67)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning
      - Performance Analysis > Accuracy (0.88)
      - Statistical Learning (1.00)
    - Representation & Reasoning > Uncertainty (0.93)
  - Data Science > Data Mining (1.00)