AITopics | rf-phate

Collaborating Authors

rf-phate

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Random Forest Autoencoders for Guided Representation Learning

Neural Information Processing SystemsJun-19-2026, 12:06:16 GMT

Extensive research has produced robust methods for unsupervised data visualization. Yet supervised visualization--where expert labels guide representations--remains underexplored, as most supervised approaches prioritize classification over visualization. Recently, RF-PHATE, a diffusion-based manifold learning method leveraging random forests and information geometry, marked significant progress in supervised visualization.

artificial intelligence, dataset, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government (0.67)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Random Forest Autoencoders for Guided Representation Learning

Aumon, Adrien, Ni, Shuang, Lizotte, Myriam, Wolf, Guy, Moon, Kevin R., Rhodes, Jake S.

arXiv.org Artificial IntelligenceMar-16-2025

Decades of research have produced robust methods for unsupervised data visualization, yet supervised visualization$\unicode{x2013}$where expert labels guide representations$\unicode{x2013}$remains underexplored, as most supervised approaches prioritize classification over visualization. Recently, RF-PHATE, a diffusion-based manifold learning method leveraging random forests and information geometry, marked significant progress in supervised visualization. However, its lack of an explicit mapping function limits scalability and prevents application to unseen data, posing challenges for large datasets and label-scarce scenarios. To overcome these limitations, we introduce Random Forest Autoencoders (RF-AE), a neural network-based framework for out-of-sample kernel extension that combines the flexibility of autoencoders with the supervised learning strengths of random forests and the geometry captured by RF-PHATE. RF-AE enables efficient out-of-sample supervised visualization and outperforms existing methods, including RF-PHATE's standard kernel extension, in both accuracy and interpretability. Additionally, RF-AE is robust to the choice of hyper-parameters and generalizes to any kernel-based dimensionality reduction method.

artificial intelligence, machine learning, random forest autoencoder, (15 more...)

arXiv.org Artificial Intelligence

2502.13257

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada > Quebec > Montreal (0.04)
South America > Brazil (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry:

Education (0.88)
Health & Medicine > Therapeutic Area > Immunology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Can an unsupervised clustering algorithm reproduce a categorization system?

Castellanos, Nathalia, Desai, Dhruv, Frank, Sebastian, Pasquali, Stefano, Mehta, Dhagash

arXiv.org Machine LearningAug-19-2024

Peer analysis is a critical component of investment management, often relying on expert-provided categorization systems. These systems' consistency is questioned when they do not align with cohorts from unsupervised clustering algorithms optimized for various metrics. We investigate whether unsupervised clustering can reproduce ground truth classes in a labeled dataset, showing that success depends on feature selection and the chosen distance metric. Using toy datasets and fund categorization as real-world examples we demonstrate that accurately reproducing ground truth classes is challenging. We also highlight the limitations of standard clustering evaluation metrics in identifying the optimal number of clusters relative to the ground truth classes. We then show that if appropriate features are available in the dataset, and a proper distance metric is known (e.g., using a supervised Random Forest-based distance metric learning method), then an unsupervised clustering can indeed reproduce the ground truth classes as distinct clusters.

distance metric, evaluation metric, k-means, (12 more...)

arXiv.org Machine Learning

2408.1034

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Enhancing Supervised Visualization through Autoencoder and Random Forest Proximities for Out-of-Sample Extension

Ni, Shuang, Aumon, Adrien, Wolf, Guy, Moon, Kevin R., Rhodes, Jake S.

arXiv.org Machine LearningJun-6-2024

The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method, RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.

architecture, proximity, rf-phate, (15 more...)

arXiv.org Machine Learning

2406.04421

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Utah (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Add feedback

Supervised Visualization for Data Exploration

Rhodes, Jake S., Cutler, Adele, Wolf, Guy, Moon, Kevin R.

arXiv.org Machine LearningJun-15-2020

Dimensionality reduction is often used as an initial step in data exploration, either as preprocessing for classification or regression or for visualization. Most dimensionality reduction techniques to date are unsupervised; they do not take class labels into account (e.g., PCA, MDS, t-SNE, Isomap). Such methods require large amounts of data and are often sensitive to noise that may obfuscate important patterns in the data. Various attempts at supervised dimensionality reduction methods that take into account auxiliary annotations (e.g., class labels) have been successfully implemented with goals of increased classification accuracy or improved data visualization. Many of these supervised techniques incorporate labels in the loss function in the form of similarity or dissimilarity matrices, thereby creating over-emphasized separation between class clusters, which does not realistically represent the local and global relationships in the data. In addition, these approaches are often sensitive to parameter tuning, which may be difficult to configure without an explicit quantitative notion of visual superiority. In this paper, we describe a novel supervised visualization technique based on random forest proximities and diffusion-based dimensionality reduction. We show, both qualitatively and quantitatively, the advantages of our approach in retaining local and global structures in data, while emphasizing important variables in the low-dimensional embedding. Importantly, our approach is robust to noise and parameter tuning, thus making it simple to use while producing reliable visualizations for data exploration.

data mining, machine learning, rf-phate, (12 more...)

arXiv.org Machine Learning

2006.08701

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Utah (0.04)
North America > United States > Iowa > Story County > Ames (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback