AITopics | prevalence value

Collaborating Authors

prevalence value

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Analysis of Diagnostics (Part II): Prevalence, Linear Independence, and Unsupervised Learning

Patrone, Paul N., Binder, Raquel A., Forconi, Catherine S., Moormann, Ann M., Kearsley, Anthony J.

arXiv.org Artificial IntelligenceAug-28-2024

This is the second manuscript in a two-part series that uses diagnostic testing to understand the connection between prevalence (i.e. number of elements in a class), uncertainty quantification (UQ), and classification theory. Part I considered the context of supervised machine learning (ML) and established a duality between prevalence and the concept of relative conditional probability. The key idea of that analysis was to train a family of discriminative classifiers by minimizing a sum of prevalence-weighted empirical risk functions. The resulting outputs can be interpreted as relative probability level-sets, which thereby yield uncertainty estimates in the class labels. This procedure also demonstrated that certain discriminative and generative ML models are equivalent. Part II considers the extent to which these results can be extended to tasks in unsupervised learning through recourse to ideas in linear algebra. We first observe that the distribution of an impure population, for which the class of a corresponding sample is unknown, can be parameterized in terms of a prevalence. This motivates us to introduce the concept of linearly independent populations, which have different but unknown prevalence values. Using this, we identify an isomorphism between classifiers defined in terms of impure and pure populations. In certain cases, this also leads to a nonlinear system of equations whose solution yields the prevalence values of the linearly independent populations, fully realizing unsupervised learning as a generalization of supervised learning. We illustrate our methods in the context of synthetic data and a research-use-only SARS-CoV-2 enzyme-linked immunosorbent assay (ELISA).

classifier, linearly independent population, training data, (15 more...)

arXiv.org Artificial Intelligence

2408.16035

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Regularization-Based Methods for Ordinal Quantification

Bunse, Mirko, Moreo, Alejandro, Sebastiani, Fabrizio, Senz, Martin

arXiv.org Artificial IntelligenceOct-13-2023

Quantification, i.e., the task of training predictors of the class prevalence values in sets of unlabeled data items, has received increased attention in recent years. However, most quantification research has concentrated on developing algorithms for binary and multiclass problems in which the classes are not ordered. Here, we study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes. We give three main contributions to this field. First, we create and make available two datasets for ordinal quantification (OQ) research that overcome the inadequacies of the previously available ones. Second, we experimentally compare the most important OQ algorithms proposed in the literature so far. To this end, we bring together algorithms proposed by authors from very different research fields, such as data mining and astrophysics, who were unaware of each others' developments. Third, we propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments. The key to this gain in performance is that our regularization prevents ordinally implausible estimates, assuming that ordinal distributions tend to be smooth in practice. We informally verify this assumption for several real-world applications.

algorithm, dataset, quantification method, (15 more...)

arXiv.org Artificial Intelligence

2310.0921

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Industry: Government (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications (0.93)

Add feedback

Multi-Label Quantification

Moreo, Alejandro, Francisco, Manuel, Sebastiani, Fabrizio

arXiv.org Artificial IntelligenceNov-15-2022

Quantification, variously called "supervised prevalence estimation" or "learning to quantify", is the supervised learning task of generating predictors of the relative frequencies (a.k.a. "prevalence values") of the classes of interest in unlabelled data samples. While many quantification methods have been proposed in the past for binary problems and, to a lesser extent, single-label multiclass problems, the multi-label setting (i.e., the scenario in which the classes of interest are not mutually exclusive) remains by and large unexplored. A straightforward solution to the multi-label quantification problem could simply consist of recasting the problem as a set of independent binary quantification problems. Such a solution is simple but na\"ive, since the independence assumption upon which it rests is, in most cases, not satisfied. In these cases, knowing the relative frequency of one class could be of help in determining the prevalence of other related classes. We propose the first truly multi-label quantification methods, i.e., methods for inferring estimators of class prevalence values that strive to leverage the stochastic dependencies among the classes of interest in order to predict their relative frequencies more accurately. We show empirical evidence that natively multi-label solutions outperform the na\"ive approaches by a large margin. The code to reproduce all our experiments is available online.

classifier, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2211.08063

Country:

Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
(11 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine (0.46)
Government (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

QuaPy: A Python-Based Framework for Quantification

Moreo, Alejandro, Esuli, Andrea, Sebastiani, Fabrizio

arXiv.org Artificial IntelligenceJun-18-2021

QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation), written in Python. Quantification is the task of training quantifiers via supervised learning, where a quantifier is a predictor that estimates the relative frequencies (a.k.a. prevalence values) of the classes of interest in a sample of unlabelled data. While quantification can be trivially performed by applying a standard classifier to each unlabelled data item and counting how many data items have been assigned to each class, it has been shown that this "classify and count" method is outperformed by methods specifically designed for quantification. QuaPy provides implementations of a number of baseline methods and advanced quantification methods, of routines for quantification-oriented model selection, of several broadly accepted evaluation measures, and of robust evaluation protocols routinely used in the field. QuaPy also makes available datasets commonly used for testing quantifiers, and offers visualization tools for facilitating the analysis and interpretation of the results. The software is open-source and publicly available under a BSD-3 licence via https://github.com/HLT-ISTI/QuaPy, and can be installed via pip (https://pypi.org/project/QuaPy/)

dataset, prevalence value, quantifier, (14 more...)

arXiv.org Artificial Intelligence

2106.11057

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

Re-Assessing the "Classify and Count" Quantification Method

Moreo, Alejandro, Sebastiani, Fabrizio

arXiv.org Artificial IntelligenceNov-4-2020

Learning to quantify (a.k.a.\ quantification) is a task concerned with training unbiased estimators of class prevalence via supervised learning. This task originated with the observation that "Classify and Count" (CC), the trivial method of obtaining class prevalence estimates, is often a biased estimator, and thus delivers suboptimal quantification accuracy; following this observation, several methods for learning to quantify have been proposed that have been shown to outperform CC. In this work we contend that previous works have failed to use properly optimised versions of CC. We thus reassess the real merits of CC (and its variants), and argue that, while still inferior to some cutting-edge methods, they deliver near-state-of-the-art accuracy once (a) hyperparameter optimisation is performed, and (b) this optimisation is performed by using a true quantification loss instead of a standard classification-based loss. Experiments on three publicly available binary sentiment classification datasets support these conclusions.

classifier, optimisation, quantification method, (15 more...)

arXiv.org Artificial Intelligence

2011.02552

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry: Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

Data-Driven Malaria Prevalence Prediction in Large Densely-Populated Urban Holoendemic sub-Saharan West Africa: Harnessing Machine Learning Approaches and 22-years of Prospectively Collected Data

Brown, Biobele J., Przybylski, Alexander A., Manescu, Petru, Caccioli, Fabio, Oyinloye, Gbeminiyi, Elmi, Muna, Shaw, Michael J., Pawar, Vijay, Claveau, Remy, Shawe-Taylor, John, Srinivasan, Mandayam A., Afolabi, Nathaniel K., Orimadegun, Adebola E., Ajetunmobi, Wasiu A., Akinkunmi, Francis, Kowobari, Olayinka, Osinusi, Kikelomo, Akinbami, Felix O., Omokhodion, Samuel, Shokunbi, Wuraola A., Lagunju, Ikeoluwa, Sodeinde, Olugbemiro, Fernandez-Reyes, Delmiro

arXiv.org Machine LearningJun-18-2019

Plasmodium falciparum malaria still poses one of the greatest threats to human life with over 200 million cases globally leading to half-million deaths annually. Of these, 90% of cases and of the mortality occurs in sub-Saharan Africa, mostly among children. Although malaria prediction systems are central to the 2016-2030 malaria Global Technical Strategy, currently these are inadequate at capturing and estimating the burden of disease in highly endemic countries. We developed and validated a computational system that exploits the predictive power of current Machine Learning approaches on 22-years of prospective data from the high-transmission holoendemic malaria urban-densely-populated sub-Saharan West-Africa metropolis of Ibadan. Our dataset of >9x104 screened study participants attending our clinical and community services from 1996 to 2017 contains monthly prevalence, temporal, environmental and host features. Our Locality-specific Elastic-Net based Malaria Prediction System (LEMPS) achieves good generalization performance, both in magnitude and direction of the prediction, when tasked to predict monthly prevalence on previously unseen validation data (MAE<=6x10-2, MSE<=7x10-3) within a range of (+0.1 to -0.05) error-tolerance which is relevant and usable for aiding decision-support in a holoendemic setting. LEMPS is well-suited for malaria prediction, where there are multiple features which are correlated with one another, and trading-off between regularization-strength L1-norm and L2-norm allows the system to retain stability. Data-driven systems are critical for regionally-adaptable surveillance, management of control strategies and resource allocation across stretched healthcare systems.

artificial intelligence, machine learning, prevalence, (17 more...)

arXiv.org Machine Learning

1906.07502

Country:

Africa > West Africa (0.61)
Africa > Nigeria > Oyo State > Ibadan (0.30)
Africa > Sub-Saharan Africa (0.24)
(2 more...)

Genre: Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback