High-dimensional prediction for count response via sparse exponential weights

Oct-20-2024–arXiv.org Machine Learning

Count data is prevalent in various fields like ecology, medical research, and genomics. In high-dimensional settings, where the number of features exceeds the sample size, feature selection becomes essential. While frequentist methods like Lasso have advanced in handling high-dimensional count data, Bayesian approaches remain under-explored with no theoretical results on prediction performance. This paper introduces a novel probabilistic machine learning framework for high-dimensional count data prediction. We propose a pseudo-Bayesian method that integrates a scaled Student prior to promote sparsity and uses an exponential weight aggregation procedure. A key contribution is a novel risk measure tailored to count data prediction, with theoretical guarantees for prediction risk using PAC-Bayesian bounds. Our results include non-asymptotic oracle inequalities, demonstrating rate-optimal prediction error without prior knowledge of sparsity. We implement this approach efficiently using Langevin Monte Carlo method. Simulations and a real data application highlight the strong performance of our method compared to the Lasso in various settings.

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Machine Learning

Oct-20-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York (0.04)
- Europe
  - France (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
    - Oxfordshire > Oxford (0.04)
  - Norway > Central Norway
    - Trøndelag > Trondheim (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (1.00)