A method for classification of data with uncertainty using hypothesis testing

Feb-12-2025–arXiv.org Artificial Intelligence

Binary classification is a task that involves the classification of data into one of two distinct classes. It is widely utilized in various fields. However, conventional classifiers tend to make overconfident predictions for data that belong to overlapping regions of the two class distributions or for data outside the distributions (out-of-distribution data). Therefore, conventional classifiers should not be applied in high-risk fields where classification results can have significant consequences. In order to address this issue, it is necessary to quantify uncertainty and adopt decision-making approaches that take it into account. Many methods have been proposed for this purpose; however, implementing these methods often requires performing resampling, improving the structure or performance of models, and optimizing the thresholds of classifiers. We propose a new decision-making approach using two types of hypothesis testing. This method is capable of detecting ambiguous data that belong to the overlapping regions of two class distributions, as well as out-of-distribution data that are not included in the training data distribution. In addition, we quantify uncertainty using the empirical distribution of feature values derived from the training data obtained through the trained model. The classification threshold is determined by the $\alpha$-quantile and ($1-\alpha$)-quantile, where the significance level $\alpha$ is set according to each specific situation.

artificial intelligence, classification, machine learning, (17 more...)

arXiv.org Artificial Intelligence

Feb-12-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York > New York County
    - New York City (0.14)
  - California > Los Angeles County
    - Long Beach (0.04)
- Europe > France
  - Hauts-de-France > Nord > Lille (0.04)
- Asia > Japan
  - Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.05)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine
  - Diagnostic Medicine > Imaging (0.47)
  - Therapeutic Area > Infections and Infectious Diseases (0.31)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (0.49)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found