AITopics | class proportion

Collaborating Authors

class proportion

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learn2Mix: Training Neural Networks Using Adaptive Data Integration

Neural Information Processing SystemsJun-16-2026, 13:59:30 GMT

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (0.46)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Learn2Mix: Training Neural Networks Using Adaptive Data Integration

Neural Information Processing SystemsJun-11-2026, 18:47:06 GMT

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)

Add feedback

Weakly-Supervised Domain Adaptation with Proportion-Constrained Pseudo-Labeling

Okuo, Takumi, Matsuo, Shinnosuke, Harada, Shota, Tanaka, Kiyohito, Bise, Ryoma

arXiv.org Artificial IntelligenceJun-30-2025

--Domain shift is a significant challenge in machine learning, particularly in medical applications where data distributions differ across institutions due to variations in data collection practices, equipment, and procedures. This can degrade performance when models trained on source domain data are applied to the target domain. Domain adaptation methods have been widely studied to address this issue, but most struggle when class proportions between the source and target domains differ . In this paper, we propose a weakly-supervised domain adaptation method that leverages class proportion information from the target domain, which is often accessible in medical datasets through prior knowledge or statistical reports. Our method assigns pseudo-labels to the unlabeled target data based on class proportion (called proportion-constrained pseudo-labeling), improving performance without the need for additional annotations. Experiments on two endoscopic datasets demonstrate that our method outperforms semi-supervised domain adaptation techniques, even when 5% of the target domain is labeled. Additionally, the experimental results with noisy proportion labels highlight the robustness of our method, further demonstrating its effectiveness in real-world application scenarios. Domain shift is an important issue in machine learning in which the distribution of data in the target domain (test data in practice) differs from that in the source domain (training data).

artificial intelligence, machine learning, proportion, (14 more...)

arXiv.org Artificial Intelligence

2506.22301

Country: Asia > Japan (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Learning from Similarity Proportion Loss for Classifying Skeletal Muscle Recovery Stages

Yamaoka, Yu, Chan, Weng Ian, Seno, Shigeto, Fukada, Soichiro, Matsuda, Hideo

arXiv.org Artificial IntelligenceMay-9-2025

Evaluating the regeneration process of damaged muscle tissue is a fundamental analysis in muscle research to measure experimental effect sizes and uncover mechanisms behind muscle weakness due to aging and disease. The conventional approach to assessing muscle tissue regeneration involves whole-slide imaging and expert visual inspection of the recovery stages based on the morphological information of cells and fibers. There is a need to replace these tasks with automated methods incorporating machine learning techniques to ensure a quantitative and objective analysis. Given the limited availability of fully labeled data, a possible approach is Learning from Label Proportions (LLP), a weakly supervised learning method using class label proportions. However, current LLP methods have two limitations: (1) they cannot adapt the feature extractor for muscle tissues, and (2) they treat the classes representing recovery stages and cell morphological changes as nominal, resulting in the loss of ordinal information. To address these issues, we propose Ordinal Scale Learning from Similarity Proportion (OSLSP), which uses a similarity proportion loss derived from two bag combinations. OSLSP can update the feature extractor by using class proportion attention to the ordinal scale of the class. Our model with OSLSP outperforms large-scale pre-trained and fine-tuning models in classification tasks of skeletal muscle recovery stages.

artificial intelligence, machine learning, proportion, (13 more...)

arXiv.org Artificial Intelligence

2505.0415

Country: Asia > Japan (0.15)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

FILM: Framework for Imbalanced Learning Machines based on a new unbiased performance measure and a new ensemble-based technique

Guillén-Teruel, Antonio, Caracena, Marcos, Pardo, Jose A., de-la-Gándara, Fernando, Palma, José, Botía, Juan A.

arXiv.org Artificial IntelligenceMar-6-2025

This research addresses the challenges of handling unbalanced datasets for binary classification tasks. In such scenarios, standard evaluation metrics are often biased by the disproportionate representation of the minority class. Conducting experiments across seven datasets, we uncovered inconsistencies in evaluation metrics when determining the model that outperforms others for each binary classification problem. This justifies the need for a metric that provides a more consistent and unbiased evaluation across unbalanced datasets, thereby supporting robust model selection. To mitigate this problem, we propose a novel metric, the Unbiased Integration Coefficients (UIC), which exhibits significantly reduced bias ($p < 10^{-4}$) towards the minority class compared to conventional metrics. The UIC is constructed by aggregating existing metrics while penalising those more prone to imbalance. In addition, we introduce the Identical Partitions for Imbalance Problems (IPIP) algorithm for imbalanced ML problems, an ensemble-based approach. Our experimental results show that IPIP outperforms other baseline imbalance-aware approaches using Random Forest and Logistic Regression models in three out of seven datasets as assessed by the UIC metric, demonstrating its effectiveness in addressing imbalanced data challenges in binary classification tasks. This new framework for dealing with imbalanced datasets is materialized in the FILM (Framework for Imbalanced Learning Machines) R Package, accessible at https://github.com/antoniogt/FILM.

algorithm, dataset, minority class, (16 more...)

arXiv.org Artificial Intelligence

2503.0437

Country:

Europe > Spain > Region of Murcia > Murcia (0.04)
Europe > United Kingdom (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.94)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Add feedback

Addressing Domain Shift via Imbalance-Aware Domain Adaptation in Embryo Development Assessment

Li, Lei, Zhang, Xinglin, Liang, Jun, Chen, Tao

arXiv.org Artificial IntelligenceJan-8-2025

Deep learning models in medical imaging face dual challenges: domain shift, where models perform poorly when deployed in settings different from their training environment, and class imbalance, where certain disease conditions are naturally underrepresented. We present Imbalance-Aware Domain Adaptation (IADA), a novel framework that simultaneously tackles both challenges through three key components: (1) adaptive feature learning with class-specific attention mechanisms, (2) balanced domain alignment with dynamic weighting, and (3) adaptive threshold optimization. Our theoretical analysis establishes convergence guarantees and complexity bounds. Through extensive experiments on embryo development assessment across four imaging modalities, IADA demonstrates significant improvements over existing methods, achieving up to 25.19\% higher accuracy while maintaining balanced performance across classes. In challenging scenarios with low-quality imaging systems, IADA shows robust generalization with AUC improvements of up to 12.56\%. These results demonstrate IADA's potential for developing reliable and equitable medical imaging systems for diverse clinical settings. The code is made public available at \url{https://github.com/yinghemedical/imbalance-aware_domain_adaptation}

artificial intelligence, class imbalance, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2501.04958

Country:

Europe (0.46)
North America > United States (0.28)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.71)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Learn2Mix: Training Neural Networks Using Adaptive Data Integration

Venkatasubramanian, Shyam, Tarokh, Vahid

arXiv.org Machine LearningDec-20-2024

Accelerating model convergence in resource-constrained environments is essential for fast and efficient neural network training. This work presents learn2mix, a new training strategy that adaptively adjusts class proportions within batches, focusing on classes with higher error rates. Empirical evaluations on benchmark datasets show that neural networks trained with learn2mix converge faster than those trained with classical approaches, achieving improved results for classification, regression, and reconstruction tasks under limited training resources and with imbalanced classes. Our empirical findings are supported by theoretical analysis. Despite their ability to learn and model complex, nonlinear relationships, deep neural networks often require substantial computational resources during training. In resource-constrained environments, this demand poses a significant challenge (Goyal et al., 2017), making the development of efficient and scalable training methodologies increasingly crucial to fully leverage the capabilities of deep neural networks. Training deep neural networks relies on the notion of empirical risk minimization (Vapnik & Bottou, 1993), and typically involves optimizing a loss function using gradient-based algorithms (Rumelhart et al., 1986; Bottou, 2010; Kingma & Ba, 2014). Techniques such as regularization (Srivastava et al., 2014; Ioffe & Szegedy, 2015) and data augmentation (Shorten & Khoshgoftaar, 2019), learning rate scheduling, (Smith, 2017) and early stopping (Prechelt, 1998), are commonly employed to enhance generalization and prevent overfitting. However, the efficiency of the training process itself remains a critical concern, particularly in terms of convergence speed and computational resources. Within this context, adaptive training strategies, which target enhanced generalization by modifying aspects of the training process, have emerged as promising approaches.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

2412.16482

Country:

North America > United States > California (0.05)
North America > United States > Oregon > Multnomah County > Portland (0.04)

Genre: Research Report (1.00)

Industry:

Government > Military (0.67)
Government > Regional Government > North America Government > United States Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sample Size in Natural Language Processing within Healthcare Research

Chaturvedi, Jaya, Shamsutdinova, Diana, Zimmer, Felix, Velupillai, Sumithra, Stahl, Daniel, Stewart, Robert, Roberts, Angus

arXiv.org Artificial IntelligenceSep-5-2023

Sample size calculation is an essential step in most data-based disciplines. Large enough samples ensure representativeness of the population and determine the precision of estimates. This is true for most quantitative studies, including those that employ machine learning methods, such as natural language processing, where free-text is used to generate predictions and classify instances of text. Within the healthcare domain, the lack of sufficient corpora of previously collected data can be a limiting factor when determining sample sizes for new studies. This paper tries to address the issue by making recommendations on sample sizes for text classification tasks in the healthcare domain. Models trained on the MIMIC-III database of critical care records from Beth Israel Deaconess Medical Center were used to classify documents as having or not having Unspecified Essential Hypertension, the most common diagnosis code in the database. Simulations were performed using various classifiers on different sample sizes and class proportions. This was repeated for a comparatively less common diagnosis code within the database of diabetes mellitus without mention of complication. Smaller sample sizes resulted in better results when using a K-nearest neighbours classifier, whereas larger sample sizes provided better results with support vector machines and BERT models. Overall, a sample size larger than 1000 was sufficient to provide decent performance metrics. The simulations conducted within this study provide guidelines that can be used as recommendations for selecting appropriate sample sizes and class proportions, and for predicting expected performance, when building classifiers for textual healthcare data. The methodology used here can be modified for sample size estimates calculations with other datasets.

class proportion, diagnosis code, sample size, (13 more...)

arXiv.org Artificial Intelligence

2309.02237

Country:

Asia > Middle East > Israel (0.24)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Nephrology (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

Enhancing Clinical Predictive Modeling through Model Complexity-Driven Class Proportion Tuning for Class Imbalanced Data: An Empirical Study on Opioid Overdose Prediction

Liu, Yinan, Dong, Xinyu, Lyu, Weimin, Rosenthal, Richard N., Wong, Rachel, Ma, Tengfei, Wang, Fusheng

arXiv.org Artificial IntelligenceMay-9-2023

Class imbalance problems widely exist in the medical field and heavily deteriorates performance of clinical predictive models. Most techniques to alleviate the problem rebalance class proportions and they predominantly assume the rebalanced proportions should be a function of the original data and oblivious to the model one uses. This work challenges this prevailing assumption and proposes that links the optimal class proportions to the model complexity, thereby tuning the class proportions per model. Our experiments on the opioid overdose prediction problem highlights the performance gain of tuning class proportions. Rigorous regression analysis also confirms the advantages of the theoretical framework proposed and the statistically significant correlation between the hyperparameters controlling the model complexity and the optimal class proportions. Introduction The class imbalance problem often occurs in medical machine learning because "positive" patients generally comprise a small fraction of the total population.

artificial intelligence, machine learning, model complexity, (15 more...)

arXiv.org Artificial Intelligence

2305.05722

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
Asia > Indonesia > Java > Yogyakarta > Yogyakarta (0.04)

Genre: Research Report > New Finding (0.90)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Add feedback

Generative and Discriminative Learning with Unknown Labeling Bias

Neural Information Processing SystemsApr-6-2023, 14:24:00 GMT

We apply robust Bayesian decision theory to improve both generative and discriminative learners under bias in class proportions in labeled training data, when the true class proportions are unknown. For the generative case, we derive an entropy-based weighting that maximizes expected log likelihood under the worst-case true class proportions. For the discriminative case, we derive a multinomial logistic model that minimizes worst-case conditional log loss. We apply our theory to the modeling of species geographic distributions from presence data, an extreme case of label bias since there is no absence data. On a benchmark dataset, we find that entropy-based weighting offers an improvement over constant estimates of class proportions, consistently reducing log loss on unbiased test data.

class proportion, generative and discriminative learning, true class proportion, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback