random partition
An RKHS Perspective on Tree Ensembles
Dagdoug, Mehdi, Dombry, Clement, Duchamps, Jean-Jil
Random Forests and Gradient Boosting are among the most effective algorithms for supervised learning on tabular data. Both belong to the class of tree-based ensemble methods, where predictions are obtained by aggregating many randomized regression trees. In this paper, we develop a theoretical framework for analyzing such methods through Reproducing Kernel Hilbert Spaces (RKHSs) constructed on tree ensembles – more precisely, on the random partitions generated by randomized regression trees. We establish fundamental analytical properties of the resulting Random Forest kernel, including boundedness, continuity, and universality, and show that a Random Forest predictor can be characterized as the unique minimizer of a penalized empirical risk functional in this RKHS, providing a variational interpretation of ensemble learning. We further extend this perspective to the continuous-time formulation of Gradient Boosting introduced by Dombry and Duchamps (2024a,b), and demonstrate that it corresponds to a gradient flow on a Hilbert manifold induced by the Random Forest RKHS. A key feature of this framework is that both the kernel and the RKHS geometry are data-dependent, offering a theoretical explanation for the strong empirical performance of tree-based ensembles. Finally, we illustrate the practical potential of this approach by introducing a kernel principal component analysis built on the Random Forest kernel, which enhances the interpretability of ensemble models, as well as GVI, a new geometric variable importance criterion.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > France (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
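The Random Forest kernel described in this abstract is commonly defined as the fraction of trees whose induced partition places two points in the same leaf. A minimal sketch, assuming scikit-learn's `RandomForestRegressor` (the paper's exact construction may differ, e.g. in how trees are randomized or weighted); the `random_forest_kernel` helper is hypothetical:

```python
# Sketch: K(x, x') = fraction of trees in the ensemble whose partition
# puts x and x' in the same leaf. Uses RandomForestRegressor.apply,
# which returns each sample's leaf index in every tree.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def random_forest_kernel(forest, X, Z):
    """Gram matrix K[i, j] = fraction of trees where X[i] and Z[j] share a leaf."""
    leaves_X = forest.apply(X)  # shape (n_X, n_trees)
    leaves_Z = forest.apply(Z)  # shape (n_Z, n_trees)
    # Compare leaf indices tree by tree, then average over trees.
    return (leaves_X[:, None, :] == leaves_Z[None, :, :]).mean(axis=2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + 0.1 * rng.normal(size=200)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

K = random_forest_kernel(forest, X[:5], X[:5])
# By construction the kernel is bounded in [0, 1] and K(x, x) = 1,
# consistent with the boundedness property the abstract establishes.
```

Note how data-dependence enters: the partitions, and hence the kernel, are learned from the training sample, which is the feature the abstract highlights.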
Large-scale entity resolution via microclustering Ewens–Pitman random partitions
Beraha, Mario, Favaro, Stefano
We introduce the microclustering Ewens–Pitman model for random partitions, obtained by scaling the strength parameter of the Ewens–Pitman model linearly with the sample size. The resulting random partition is shown to have the microclustering property, namely: the size of the largest cluster grows sub-linearly with the sample size, while the number of clusters grows linearly. By leveraging the interplay between the Ewens–Pitman random partition and the Pitman–Yor process, we develop efficient variational inference schemes for posterior computation in entity resolution. Our approach achieves a speed-up of three orders of magnitude over existing Bayesian methods for entity resolution, while maintaining competitive empirical performance.
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.81)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
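The scaling described in this abstract can be illustrated with the standard sequential (Chinese-restaurant-style) sampler for the two-parameter Ewens–Pitman(σ, θ) partition. A minimal sketch: the linear scaling θ = 2.0·n is an arbitrary illustrative choice, and the paper's variational inference machinery is not reproduced here:

```python
# Sequential sampler for the Ewens–Pitman(sigma, theta) random partition:
# observation i+1 joins an existing cluster of size n_j with probability
# (n_j - sigma) / (i + theta), or starts a new cluster with probability
# (theta + sigma * k) / (i + theta), where k is the current cluster count.
import numpy as np

def ewens_pitman_partition(n, sigma, theta, rng):
    sizes = [1]  # the first observation starts the first cluster
    for i in range(1, n):
        probs = np.array(sizes + [theta + sigma * len(sizes)], dtype=float)
        probs[:-1] -= sigma     # existing cluster j has weight n_j - sigma
        probs /= i + theta      # weights sum to i + theta
        j = rng.choice(len(probs), p=probs)
        if j == len(sizes):
            sizes.append(1)
        else:
            sizes[j] += 1
    return np.array(sizes)

rng = np.random.default_rng(42)
n = 5000
# Microclustering regime: the strength parameter grows linearly with n.
sizes = ewens_pitman_partition(n, sigma=0.5, theta=2.0 * n, rng=rng)
print(len(sizes), sizes.max())  # many clusters, all of them small
```

With θ proportional to n, the probability of opening a new cluster stays bounded away from zero at every step, so the number of clusters grows linearly while each individual cluster stays small, matching the microclustering property the abstract states.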
Predictive Coresets
We propose a construction of coresets based on a predictive view of Bayesian posterior inference (Fong et al., 2024; Fortini and Petrone, 2012). The main attraction of the approach is its model-agnostic nature: the method is valid under any inference model and is independent of the specific inference goals, making it highly adaptable to a wide range of applications. Such adaptability is particularly valuable for the large-scale datasets now commonplace in fields like genomics and astronomy. While this explosion of data offers remarkable opportunities for discovery, it also brings significant computational challenges. Tasks that were once straightforward, such as repeated likelihood evaluations, have become increasingly difficult, making traditional data-processing methods impractical. These obstacles have frequently pushed practitioners toward simpler statistical models that may not capture the full complexity of the data, forgoing the expressiveness and flexibility that rich hierarchical and nonparametric models can offer.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
ca46c1b9512a7a8315fa3c5a946e8265-Reviews.html
Alternatively, if the algorithm performs well against any of the existing AC ones, then it would be good to see its performance in various settings, along with a detailed explanation of its properties. The latter would be useful in its own right, whether or not it is related to summarizing a posterior distribution's characteristics. A few specific comments: Line 19: It is correct to say that the MAP might not be good in situations where the posterior is diffuse; I guess you mean uniform-like? It might also be multimodal, skewed (mode and mean differ), or high-variance, for example, so the MAP is not necessarily a good choice, although this depends on the (underlying) loss function. Line 52: Since a Dirichlet process is a discrete random probability measure, sampling from it induces a random partition (the ties will belong to the same cluster).
Flexible Models for Microclustering with Application to Entity Resolution
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.
- Asia > Middle East > Syria (0.14)
- North America > United States (0.14)
- Europe > Italy (0.05)
- (2 more...)
- Government (0.68)
- Health & Medicine (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.84)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Representation Learning via Consistent Assignment of Views over Random Partitions
Silva, Thalles, Rivera, Adín Ramírez
CARP learns prototypes in an end-to-end online fashion using gradient descent, without additional non-differentiable modules to solve the cluster assignment problem. CARP optimizes a new pretext task based on random partitions of prototypes that regularizes the model and enforces consistency between views' assignments. Additionally, our method improves training stability and prevents collapsed solutions in joint-embedding training. Through an extensive evaluation, we demonstrate that CARP's representations are suitable for learning downstream tasks. We evaluate the capabilities of CARP's representations on 17 datasets across many standard protocols, including linear evaluation, few-shot classification, k-NN, k-means, image retrieval, and copy detection. We compare CARP's performance to that of 11 existing self-supervised methods. We extensively ablate our method and demonstrate that the proposed random partition pretext task improves the quality of the learned representations by devising multiple random classification tasks. In transfer learning tasks, CARP achieves the best performance on average against many SSL methods trained for longer.
- North America > Canada > Ontario > Toronto (0.14)
- South America > Brazil (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
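The random-partition pretext task described in this abstract can be sketched in a few lines: prototype indices are randomly split into blocks, each view's assignment is computed independently within every block, and a cross-entropy term enforces consistency between the two views. A minimal NumPy sketch assuming this reading; the paper's actual loss, block sizes, target sharpening, and stop-gradient details may differ, and `random_partition_consistency_loss` is a hypothetical helper name:

```python
# Sketch of a random-partition consistency loss: restrict the prototype
# assignment problem to random blocks of prototypes, then penalize
# disagreement between two augmented views inside each block.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def random_partition_consistency_loss(logits_a, logits_b, n_blocks, rng):
    """Symmetric cross-entropy between the two views' assignments,
    computed independently inside each random block of prototypes."""
    n_protos = logits_a.shape[1]
    blocks = np.array_split(rng.permutation(n_protos), n_blocks)
    loss = 0.0
    for block in blocks:
        p = softmax(logits_a[:, block])  # view 1 assignment over the block
        q = softmax(logits_b[:, block])  # view 2 assignment over the block
        # Treat each view in turn as the target for the other.
        loss += -0.5 * np.mean(np.sum(p * np.log(q + 1e-12), axis=1))
        loss += -0.5 * np.mean(np.sum(q * np.log(p + 1e-12), axis=1))
    return loss / n_blocks

rng = np.random.default_rng(0)
logits_a = rng.normal(size=(8, 64))                    # 8 samples, 64 prototypes
logits_b = logits_a + 0.01 * rng.normal(size=(8, 64))  # slightly perturbed view
loss = random_partition_consistency_loss(logits_a, logits_b, n_blocks=4, rng=rng)
```

Each draw of the partition yields a different small classification problem over a subset of prototypes, which is one way to read the abstract's claim that the task "devis[es] multiple random classification tasks" while remaining fully differentiable.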