Uncertain Data


Learning from Uncertain Data: From Possible Worlds to Possible Models

Zhu, Jiongli, Feng, Su, Glavic, Boris, Salimi, Babak

Neural Information Processing Systems

We introduce an efficient method for learning linear models from uncertain data, where uncertainty is represented as a set of possible variations in the data, leading to predictive multiplicity. Our approach leverages abstract interpretation and zonotopes, a type of convex polytope, to compactly represent these dataset variations, enabling the symbolic execution of gradient descent on all possible worlds simultaneously. We develop techniques to ensure that this process converges to a fixed point and derive closed-form solutions for this fixed point. Our method provides sound over-approximations of all possible optimal models and viable prediction ranges. We demonstrate the effectiveness of our approach through theoretical and empirical analysis, highlighting its potential to reason about model and prediction uncertainty due to data quality issues in training data.
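The core idea — running gradient descent symbolically over a zonotope of dataset variations — can be illustrated in a tiny setting. The sketch below is an assumption-laden toy, not the paper's algorithm: 1-D linear regression where each label carries an interval of uncertainty, with the parameter kept as an affine form (center plus one generator per noise symbol) that gradient descent propagates exactly because the squared-loss gradient is affine in the labels.

```python
import numpy as np

# Toy zonotope-style symbolic gradient descent (illustrative sketch only).
# Each label y_i = y_center[i] + label_radius * eps_i with eps_i in [-1, 1].
# The slope w is kept as an affine form w = w_c + sum_j w_g[j] * eps_j.
np.random.seed(0)
n = 20
X = np.random.randn(n)
y_center = 2.0 * X + 0.1 * np.random.randn(n)
label_radius = 0.05  # each label varies in [y_i - r, y_i + r]

w_c, w_g = 0.0, np.zeros(n)
lr = 0.05
for _ in range(200):
    # residual r_i = w * x_i - y_i is affine in the noise symbols eps
    res_c = w_c * X - y_center
    res_g = w_g[None, :] * X[:, None]                   # (n, n) generators
    res_g[np.arange(n), np.arange(n)] -= label_radius   # label's own noise term
    # gradient of mean squared error, also affine in eps
    grad_c = (2 / n) * np.sum(res_c * X)
    grad_g = (2 / n) * (X[:, None] * res_g).sum(axis=0)
    w_c -= lr * grad_c
    w_g -= lr * grad_g

# interval of possible slopes over all worlds (sound over-approximation)
w_lo = w_c - np.abs(w_g).sum()
w_hi = w_c + np.abs(w_g).sum()
```

The center converges to the ordinary least-squares fit, while the generators converge to a fixed point that bounds every world's optimum — mirroring the paper's claim of a closed-form fixed point for the parameter zonotope.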


Entropy-Based Adaptive Weighting for Self-Training

Wang, Xiaoxuan, Deng, Yihe, Ma, Mingyu Derek, Wang, Wei

arXiv.org Artificial Intelligence

The mathematical problem-solving capabilities of large language models have become a focal point of research, with growing interest in leveraging self-generated reasoning paths as a promising way to refine and enhance these models. These paths capture step-by-step logical processes while requiring only the correct answer for supervision. The self-training method has been shown to be effective in reasoning tasks while eliminating the need for external models and manual annotations. However, optimizing the use of self-generated data for model training remains an open challenge. In this work, we propose Entropy-Based Adaptive Weighting for Self-Training (EAST), an adaptive weighting strategy designed to prioritize uncertain data during self-training. Specifically, EAST employs a mapping function with a tunable parameter that controls the sharpness of the weighting, assigning higher weights to data where the model exhibits greater uncertainty. This approach guides the model to focus on more informative and challenging examples, thereby enhancing its reasoning ability. We evaluate our approach on the GSM8K and MATH benchmarks. Empirical results show that, while the vanilla method yields virtually no improvement (0%) on MATH, EAST achieves around a 1% gain over the backbone model. On GSM8K, EAST attains a further 1-2% performance boost compared to the vanilla method.
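The weighting idea can be sketched concretely. The snippet below is a minimal illustration under assumptions (the function name, power-law mapping, and `sharpness` parameter are hypothetical stand-ins for the paper's mapping function): each training example's answer distribution over sampled reasoning paths is scored by entropy, pushed through a tunable sharpening, and normalized into per-example weights.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of an answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def east_weights(answer_dists, sharpness=2.0):
    """Illustrative EAST-style weighting (hypothetical form): raise each
    example's entropy to a tunable power and normalize so the weights
    sum to the number of examples."""
    raw = [entropy(d) ** sharpness for d in answer_dists]
    total = sum(raw)
    n = len(raw)
    if total == 0:  # every example fully certain: fall back to uniform
        return [1.0] * n
    return [n * r / total for r in raw]

# three examples: confident, mixed, maximally uncertain over 2 answers
dists = [[0.99, 0.01], [0.7, 0.3], [0.5, 0.5]]
weights = east_weights(dists)
```

Examples where the model is most uncertain receive the largest weights, which is the behavior the abstract describes: the sharper the mapping, the more training signal concentrates on hard examples.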


Explaining an image classifier with a generative model conditioned by uncertainty

LeCoz, Adrien, Herbin, Stéphane, Adjed, Faouzi

arXiv.org Artificial Intelligence

Identifying sources of uncertainty in an image classifier is a crucial challenge. Indeed, the decision process of these models is opaque and does not necessarily correspond to what we might expect. To help characterize classifiers, generative models can be used, as they allow the control of visual attributes. Here we use a generative adversarial network to generate images corresponding to how a classifier sees the image. More specifically, we consider the classifier's maximum softmax probability as an uncertainty estimate and use it as an additional input to condition the generative model. This allows us to generate images that result in uncertain predictions, giving us a global view of which images are harder to classify. We can also increase the uncertainty of a given image and observe the impact of an attribute, providing a more local understanding of the decision process. We perform experiments on the MNIST dataset, augmented with corruptions. We believe that generative models are a helpful tool to explain the behavior and uncertainties of image classifiers.
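The conditioning signal the abstract names — the classifier's maximum softmax probability (MSP) — is simple to compute. A minimal sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def msp_uncertainty(logits):
    """Uncertainty score in [0, 1): 1 minus the maximum softmax
    probability. High values mean the classifier is unsure."""
    return 1.0 - softmax(np.asarray(logits, dtype=float)).max()

confident = msp_uncertainty([8.0, 0.0, 0.0])  # one dominant logit
ambiguous = msp_uncertainty([1.0, 1.0, 1.0])  # uniform logits
```

In the paper's setup, a scalar like this would be fed to the generator alongside the latent code, so that sampling at high uncertainty values produces images the classifier finds hard.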




Training of Neural Networks with Uncertain Data, A Mixture of Experts Approach

Luttner, Lucas

arXiv.org Machine Learning

This paper presents the "Uncertainty-aware Mixture of Experts" (uMoE), a novel approach designed to address aleatoric uncertainty in the training of predictive models based on Neural Networks (NNs). While existing methods primarily focus on managing uncertainty during inference, uMoE integrates uncertainty directly into the training process. The uMoE approach adopts a "Divide and Conquer" paradigm to partition the uncertain input space into more manageable subspaces. It consists of Expert components, each trained solely on the portion of input uncertainty corresponding to their subspace. On top of the Experts, a Gating Unit, guided by additional information about the distribution of uncertain inputs across these subspaces, learns to weight the Experts to minimize deviations from the ground truth. Our results highlight that uMoE significantly outperforms baseline methods in handling data uncertainty. Furthermore, we conducted a robustness analysis, illustrating its capability to adapt to varying levels of uncertainty and suggesting optimal threshold parameters. This innovative approach holds wide applicability across diverse data-driven domains, including biomedical signal processing, autonomous driving, and production quality control.
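The gating idea — weight each expert by the probability mass the uncertain input places in its subspace — can be sketched in one dimension. Everything below (the fixed boundary, the linear experts, the Gaussian input model) is an assumption for illustration, not the paper's architecture:

```python
from math import erf, sqrt

# Illustrative uMoE-style sketch: the real line is split at x = 0 into
# two subspaces, each served by its own (here: fixed linear) expert.
boundary = 0.0
experts = [lambda x: 2 * x + 1,   # expert for the x < 0 subspace
           lambda x: -x + 1]      # expert for the x >= 0 subspace

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def gate(mu, sigma):
    """Gating weights: probability mass of N(mu, sigma^2) per subspace."""
    p_left = gauss_cdf(boundary, mu, sigma)
    return [p_left, 1 - p_left]

def umoe_predict(mu, sigma):
    """Weighted combination of expert outputs for an uncertain input."""
    w = gate(mu, sigma)
    return sum(wi * e(mu) for wi, e in zip(w, experts))

certain_right = umoe_predict(3.0, 1e-6)  # mass almost entirely in x >= 0
on_boundary = umoe_predict(0.0, 1.0)     # mass split 50/50 across experts
```

With a near-certain input the gate collapses onto one expert; an input straddling the boundary blends both, which is how the Gating Unit uses the uncertainty distribution rather than a point estimate.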


Target-agnostic Source-free Domain Adaptation for Regression Tasks

He, Tianlang, Xia, Zhiqiu, Chen, Jierun, Li, Haoliang, Chan, S. -H. Gary

arXiv.org Machine Learning

Unsupervised domain adaptation (UDA) seeks to bridge the domain gap between the target and source using unlabeled target data. Source-free UDA removes the requirement for labeled source data at the target to preserve data privacy and storage. However, work on source-free UDA assumes knowledge of the domain gap distribution, and is hence limited to either target-aware settings or classification tasks. To overcome this limitation, we propose TASFAR, a novel target-agnostic source-free domain adaptation approach for regression tasks. Using prediction confidence, TASFAR estimates a label density map as the target label distribution, which is then used to calibrate the source model on the target domain. We have conducted extensive experiments on four regression tasks with various domain gaps, namely, pedestrian dead reckoning for different users, image-based people counting in different scenes, housing-price prediction in different districts, and taxi-trip duration prediction from different departure points. TASFAR is shown to substantially outperform state-of-the-art source-free UDA approaches, reducing errors by an average of 22% across the four tasks and achieving accuracy comparable to source-based UDA without using source data.
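The label density map at the heart of TASFAR can be approximated in a very rough form. The sketch below is a hypothetical reading of the idea (the histogram construction and confidence weights are assumptions, not the paper's estimator): accumulate source-model predictions on unlabeled target data into a confidence-weighted histogram, yielding a normalized density over label values that could then serve as a calibration prior.

```python
import numpy as np

# Hypothetical label-density-map sketch for a regression target domain.
np.random.seed(1)
preds = np.random.uniform(0, 10, 500)    # source-model predictions on target data
conf = np.random.uniform(0.2, 1.0, 500)  # per-prediction confidence scores

bins = np.linspace(0, 10, 21)            # 20 label-value bins
density, _ = np.histogram(preds, bins=bins, weights=conf)
density = density / density.sum()        # normalized label density map
```

Confident predictions contribute more mass, so regions of label space the source model is sure about dominate the estimated target label distribution.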


Chance constrained conic-segmentation support vector machine with uncertain data

Peng, Shen, Canessa, Gianpiero, Allen-Zhao, Zhihua

arXiv.org Artificial Intelligence

In classification problems, a classifier is a function that mimics the relationship between the data vectors and their class labels. The support vector machine (SVM) is a popular classifier, proposed by Cortes and Vapnik [1] as a maximum margin classifier. The success of the SVM has encouraged further research into extensions to more general multiclass cases, which has been an active topic of research interest [2-4]. Shilton et al. [5] proposed the conic-segmentation support vector machine (CS-SVM) by introducing the concept of target space into the problem formulation and showed that several other multiclass classification models are special cases of this framework. The standard CS-SVM deals with the situation where the exact values of the data points are known.


A Greedy and Optimistic Approach to Clustering with a Specified Uncertainty of Covariates

Okuno, Akifumi, Hattori, Kohei

arXiv.org Machine Learning

In this study, we examine a clustering problem in which the covariates of each individual element in a dataset are associated with an uncertainty specific to that element. More specifically, we consider a clustering approach in which a pre-processing step that applies a non-linear transformation to the covariates is used to capture the hidden data structure. To this end, we empirically approximate the sets representing the propagated uncertainty for the pre-processed features. To exploit the empirical uncertainty sets, we propose a greedy and optimistic clustering (GOC) algorithm that finds better feature candidates over such sets, yielding more condensed clusters. As an important application, we apply the GOC algorithm to synthetic datasets of the orbital properties of stars generated through our numerical simulation mimicking the formation process of the Milky Way. The GOC algorithm demonstrates improved performance in finding sibling stars originating from the same dwarf galaxy. These realistic datasets have also been made publicly available.
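The "greedy and optimistic" assignment step can be illustrated with a toy K-means-style loop. This is a sketch of the idea only, not the paper's GOC algorithm: each element carries a finite set of feature candidates (its empirical uncertainty set), and the optimistic step assigns the element to the centroid that some candidate in its set can get closest to, using that candidate as the element's working feature.

```python
import numpy as np

# Toy optimistic clustering: 40 elements, each with 3 candidate features
# jittered around one of two true centers.
np.random.seed(2)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0]])
elements = [true_centers[i % 2] + 0.5 * np.random.randn(3, 2) for i in range(40)]

centroids = np.array([[1.0, 1.0], [4.0, 4.0]])  # initial guesses
for _ in range(10):
    labels, chosen = [], []
    for cands in elements:
        # optimistic choice: best (candidate, centroid) pair over the set
        d = np.linalg.norm(cands[:, None, :] - centroids[None, :, :], axis=2)
        ci, ki = np.unravel_index(d.argmin(), d.shape)
        labels.append(ki)
        chosen.append(cands[ci])
    labels, chosen = np.array(labels), np.array(chosen)
    for k in range(2):
        if (labels == k).any():
            centroids[k] = chosen[labels == k].mean(axis=0)
```

Because each element contributes its most favorable candidate, clusters tighten around the centroids — the "more condensed clusters" the abstract describes.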


Noise-robust Clustering

Adesunkanmi, Rahmat, Kumar, Ratnesh

arXiv.org Machine Learning

This paper presents noise-robust clustering techniques in unsupervised machine learning. The uncertainty about the noise, consistency, and other ambiguities can become severe obstacles in data analytics. As a result, data quality, cleansing, management, and governance remain critical disciplines when working with Big Data. With this complexity, it is no longer sufficient to treat data deterministically as in a classical setting, and it becomes meaningful to account for noise distribution and its impact on data sample values. Classical clustering methods group data into "similarity classes" depending on their relative distances or similarities in the underlying space. This paper addresses this problem via the extension of classical $K$-means and $K$-medoids clustering over data distributions (rather than the raw data). This involves measuring distances among distributions using two types of measures: the optimal mass transport (also called Wasserstein distance, denoted $W_2$) and a novel distance measure proposed in this paper, the expected value of random variable distance (denoted ED). The presented distribution-based $K$-means and $K$-medoids algorithms cluster the data distributions first and then assign each raw data point to the cluster of its distribution.
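A distribution-based $K$-means can be demonstrated compactly in one dimension, where $W_2$ has a closed form for Gaussians: $W_2(\mathcal{N}(\mu_1,\sigma_1^2), \mathcal{N}(\mu_2,\sigma_2^2))^2 = (\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$. The sketch below is an illustration of the general idea, not the paper's algorithm; it represents each datum as a $(\mu, \sigma)$ pair and uses the fact that, for 1-D Gaussians, the component-wise mean of $(\mu, \sigma)$ is the $W_2$ barycenter.

```python
import math
import random

def w2(g1, g2):
    """Closed-form 2-Wasserstein distance between 1-D Gaussians (mu, sigma)."""
    return math.hypot(g1[0] - g2[0], g1[1] - g2[1])

# Two groups of Gaussian data: means near 0 with sigma 1, means near 8 with sigma 2.
random.seed(3)
data = ([(random.gauss(0, 0.3), 1.0) for _ in range(30)] +
        [(random.gauss(8, 0.3), 2.0) for _ in range(30)])

centroids = [(1.0, 1.0), (7.0, 1.0)]
for _ in range(10):
    clusters = [[], []]
    for g in data:
        k = min((0, 1), key=lambda i: w2(g, centroids[i]))
        clusters[k].append(g)
    new = []
    for k in range(2):
        c = clusters[k]
        if c:  # W2 barycenter of 1-D Gaussians: average mu and average sigma
            new.append((sum(m for m, _ in c) / len(c),
                        sum(s for _, s in c) / len(c)))
        else:
            new.append(centroids[k])
    centroids = new
```

Raw data points would then inherit the cluster of their distribution, matching the two-stage procedure the abstract describes.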


Uncertain Process Data with Probabilistic Knowledge: Problem Characterization and Challenges

Cohen, Izack, Gal, Avigdor

arXiv.org Artificial Intelligence

Motivated by the abundance of uncertain event data from multiple sources, including physical devices and sensors, this paper presents the task of relating a stochastic process observation to a process model that can be rendered from a dataset. In contrast to previous research that suggested transforming a stochastically known event log into a less informative uncertain log with upper and lower bounds on activity frequencies, we consider the challenge of accommodating the probabilistic knowledge in conformance checking techniques. Based on a taxonomy that captures the spectrum of conformance checking cases under stochastic process observations, we present three types of challenging cases. The first includes conformance checking of a stochastically known log with respect to a given process model. The second case extends the first to classify a stochastically known log into one of several process models. The third case extends the two previous ones to settings in which process models are only stochastically known. The suggested problem captures the growing number of applications in which sensors provide probabilistic process information.
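The first case — conformance of a stochastically known log against a given model — can be made concrete with a toy computation. The sketch below adds the simplifying assumption (not made in the paper) that events are independent: each event is a distribution over activities, the model is a set of accepted traces, and the conformance probability sums the probability of every realization that the model accepts.

```python
from itertools import product

# A toy process model: the set of traces it accepts.
model_traces = {("a", "b", "c"), ("a", "c", "b")}

# A stochastically known trace: per-event distributions over activities.
stochastic_trace = [{"a": 1.0}, {"b": 0.7, "c": 0.3}, {"c": 0.6, "b": 0.4}]

def conformance_probability(trace, model):
    """Probability that the realized trace conforms, assuming
    independent events (a simplification for illustration)."""
    total = 0.0
    for combo in product(*(d.items() for d in trace)):
        acts = tuple(a for a, _ in combo)
        p = 1.0
        for _, pi in combo:
            p *= pi
        if acts in model:
            total += p
    return total

p_conform = conformance_probability(stochastic_trace, model_traces)
```

Here the two conforming realizations contribute 0.7 × 0.6 and 0.3 × 0.4, for a conformance probability of 0.54; the second and third cases in the taxonomy generalize this by also placing distributions over the model side.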