AITopics | Provo

Abstract: Recent work has demonstrated the utility of Random Forest (RF) proximities for various supervised machine learning tasks, including outlier detection, missing data imputation, and visualization. However, the utility of the RF proximities depends upon the success of the RF model, which itself is not the ideal model in all contexts. RF proximities have recently been extended to time series by means of the distance-based Proximity Forest (PF) model, among others, affording time series analysis with the benefits of RF proximities. In this work, we introduce the generalized PF model, thereby extending RF proximities to all contexts in which supervised distance-based machine learning can occur. Additionally, we introduce a variant of the PF model for regression tasks. We also introduce the notion of using the generalized PF model as a meta-learning framework, extending supervised imputation capability to any pre-trained classifier. We experimentally demonstrate the unique advantages of the generalized PF model compared with both the RF model and the k-nearest neighbors model.

imputation, pf model, proximity, (16 more...)

arXiv.org Machine Learning

2511.19487

Country:

North America > United States > Utah > Cache County > Logan (0.14)
North America > United States > Utah > Utah County > Provo (0.05)
Asia > Philippines (0.04)
Antarctica (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)

ColdGANs: Taming Language GANs with Cautious Sampling Strategies Supplementary Material

Neural Information Processing Systems, Oct-9-2025, 15:33:41 GMT

We used a single RTX 2080 Ti GPU. While T5-small underperforms its larger version, T5-11B, the latter has 11 billion parameters. Gabon and South Africa are ranked 119th and 121st, respectively. ANSWER: 119th HUMAN: What is Gabon's ranking? ColdGAN: What is Gabon's rank on the HDI?

coldgan, iniesta, vineyard, (11 more...)

Neural Information Processing Systems

Country:

Africa > Gabon (0.67)
Africa > South Africa (0.25)
North America > United States > New York (0.09)
(4 more...)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.72)
Information Technology > Communications > Social Media (0.56)

Localized Uncertainty Quantification in Random Forests via Proximities

Rhodes, Jake S., Brown, Scott D., Wilkinson, J. Riley

arXiv.org Machine Learning, Sep-30-2025

Abstract: In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on incorporating uncertainty measures. While current methods often rely on quantile regression or Monte Carlo techniques, we propose a new approach using naturally occurring test sets and similarity measures (proximities) typically viewed as byproducts of random forests. Specifically, we form localized distributions of OOB errors around nearby points, defined using the proximities, to create prediction intervals for regression and trust scores for classification. By varying the number of nearby points, our intervals can be adjusted to achieve the desired coverage while retaining the flexibility that reflects the certainty of individual predictions. For classification, excluding points identified as unclassifiable by our method generally enhances the accuracy of the model and provides higher accuracy-rejection AUC scores than competing methods. Although traditional machine learning models usually provide point estimates, there is growing recognition of the need to incorporate uncertainty to support more informed decisions [1]. By quantifying uncertainty, users can assess the reliability of model outputs and better interpret results, especially for out-of-distribution samples through calibrated confidence estimates.
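
The localized-interval idea described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a precomputed list of proximities between one test point and every training point, plus out-of-bag (OOB) residuals (observed value minus OOB prediction) for those training points. The function names `empirical_quantile` and `local_interval` and the parameters `k` and `alpha` are hypothetical, and the quantile rule (linear interpolation) is a choice made only for this sketch.

```python
def empirical_quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def local_interval(prediction, proximities, oob_residuals, k=5, alpha=0.1):
    """Prediction interval built from the OOB residuals of the k most proximal points.

    prediction    -- point prediction for the test point
    proximities   -- proximity of the test point to each training point (assumed given)
    oob_residuals -- y_i minus the OOB prediction for each training point (assumed given)
    k             -- number of nearby training points that form the local error distribution
    alpha         -- miscoverage level; the interval targets roughly 1 - alpha coverage
    """
    # Rank training points by proximity to the test point, highest first.
    nearest = sorted(range(len(proximities)),
                     key=lambda i: proximities[i],
                     reverse=True)[:k]
    local_errors = [oob_residuals[i] for i in nearest]
    # Shift the point prediction by the lower and upper local residual quantiles.
    lower = prediction + empirical_quantile(local_errors, alpha / 2)
    upper = prediction + empirical_quantile(local_errors, 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    prox = [0.9, 0.1, 0.8, 0.0, 0.7]
    resid = [-1.0, 10.0, 0.0, -10.0, 1.0]
    print(local_interval(5.0, prox, resid, k=3, alpha=0.5))
```

In this sketch, a larger `k` draws on more neighbors and so pulls the interval toward a global residual distribution, while a smaller `k` keeps it specific to the neighborhood of the test point. This corresponds to the abstract's remark that varying the number of nearby points adjusts coverage.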

prediction, prediction interval, proximity, (17 more...)

arXiv.org Machine Learning

2509.22928

Country:

North America > United States > Utah > Utah County > Provo (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Label-Guided Imputation via Forest-Based Proximities for Improved Time Series Classification

Rhodes, Jake S., Rustad, Adam G., Maia, Sofia Pelagalli, Thacker, Evan, Choi, Hyunmi, Gutierrez, Jose, Rundek, Tatjana, Shaw, Ben

arXiv.org Machine Learning, Sep-30-2025

Missing data is a common problem in time series data. Most methods for imputation ignore label information pertaining to the time series even if that information exists. In this paper, we provide a framework for missing data imputation in the context of time series classification, where each time series is associated with a categorical label. We define a means of imputing missing values conditional upon labels, the method being guided by powerful, existing supervised models designed for high accuracy in this task. From each model, we extract a tree-based proximity measure from which imputation can be applied. We show that imputation using this method generally provides richer information leading to higher classification accuracies, despite the imputed values differing from the true values.

classification, imputation, time series, (15 more...)

arXiv.org Machine Learning

2509.22919

Country:

North America > United States > Utah > Utah County > Provo (0.05)
North America > United States > Utah > Cache County > Logan (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Guided Manifold Alignment with Geometry-Regularized Twin Autoencoders

Rhodes, Jake S., Rustad, Adam G., Nielsen, Marshall S., McClellan, Morgan Chase, Gardner, Dallan, Hedges, Dawson

arXiv.org Machine Learning, Sep-30-2025

Abstract: Manifold alignment (MA) involves a set of techniques for learning shared representations across domains, yet many traditional MA methods are incapable of performing out-of-sample extension, limiting their real-world applicability. We propose a guided representation learning framework leveraging a geometry-regularized twin autoencoder (AE) architecture to enhance MA while enabling generalization to unseen data. Our method enforces structured cross-modal mappings to maintain geometric fidelity in learned embeddings. By incorporating a pre-trained alignment model and a multitask learning formulation, we improve cross-domain generalization and representation robustness while maintaining alignment fidelity. We evaluate our approach using several MA methods, showing improvements in embedding consistency, information preservation, and cross-domain transfer. Additionally, we apply our framework to Alzheimer's disease diagnosis, demonstrating its ability to integrate multi-modal patient data and enhance predictive accuracy in cases limited to a single domain by leveraging insights from the multi-modal problem. Manifold learning encompasses a set of methods used to create a lower-dimensional representation, or an embedding, of higher-dimensional data. Such representations can play a key role in data visualization [1]-[5], serve as dimensionality reduction in a preprocessing step for subsequent machine-learning or analytical tasks [6], or act as a denoising mechanism [4]. In the context of multi-domain problems, where multiple types of data are considered, manifold learning becomes more challenging as data distributions across different domains or modalities may exhibit domain-specific variations while still sharing a common geometric structure. Manifold alignment (MA) seeks to address this problem. In some contexts, a common, shared representation of multi-modal data can be viewed as a natural extension of manifold learning. For example, cell samples of the same type but collected at a different time or using different methodologies should still share features in common, but differences in the measured features may occur due to batch effects [7], obscuring the similarities.

alignment, correspondence, representation, (14 more...)

arXiv.org Machine Learning

2509.22913

Country:

North America > United States > California (0.14)
North America > United States > Utah > Utah County > Provo (0.05)
Europe > Switzerland (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

What will be Tyler Robinson's defense strategy? Experts weigh in on accused Charlie Kirk assassin

FOX News, Sep-29-2025, 04:00:41 GMT

Legal experts analyze the challenging defense strategy for Tyler Robinson, who allegedly shot Charlie Kirk at Utah Valley University, as prosecutors prepare evidence for trial.

charlie kirk, robinson, tyler robinson, (11 more...)

FOX News

Country: