

Interval Fisher's Discriminant Analysis and Visualisation

Pinheiro, Diogo, Oliveira, M. Rosário, Kravchenko, Igor, Oliveira, Lina

arXiv.org Machine Learning

In Data Science, entities are typically represented by single-valued measurements. Symbolic Data Analysis extends this framework to more complex structures, such as intervals and histograms, that express internal variability. We propose an extension of multiclass Fisher's Discriminant Analysis to interval-valued data, using Moore's interval arithmetic and the Mallows distance. Fisher's objective function is generalised to consider simultaneously the contributions of the centres and the ranges of the intervals and is numerically maximised. The resulting discriminant directions are then used to classify interval-valued observations. To support visual assessment, we adapt the class map, originally introduced for conventional data, to classifiers that assign labels through minimum-distance rules. We also extend the silhouette plot to this setting and use stacked mosaic plots to complement the visual display of class assignments. Together, these graphical tools provide insight into classifier performance and the strength of class membership. Applications to real datasets illustrate the proposed methodology and demonstrate its value in interpreting classification results for interval-valued data.
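For intuition, the Mallows (L2 Wasserstein) distance between two intervals has a simple closed form when each interval is viewed as a uniform distribution over its range. The sketch below is a standard result stated in terms of interval centres and half-ranges, not the authors' code:

```python
def mallows_interval(a, b):
    """Squared Mallows (L2 Wasserstein) distance between two intervals,
    each given as (lower, upper) and viewed as a uniform distribution.

    Closed form: d^2 = (c_A - c_B)^2 + (1/3) * (r_A - r_B)^2,
    where c is the midpoint and r the half-range of each interval.
    """
    c_a, r_a = (a[0] + a[1]) / 2, (a[1] - a[0]) / 2
    c_b, r_b = (b[0] + b[1]) / 2, (b[1] - b[0]) / 2
    return (c_a - c_b) ** 2 + (r_a - r_b) ** 2 / 3
```

Note how the centre and range contributions enter as separate terms, which is the decomposition the generalised Fisher objective exploits.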


Scalable branch-and-bound model selection with non-monotonic criteria including AIC, BIC and Mallows's $\mathit{C_p}$

Vanhoefer, Jakob, Körner, Antonia, Doresic, Domagoj, Hasenauer, Jan, Pathirana, Dilan

arXiv.org Machine Learning

Model selection is a pivotal process in the quantitative sciences, where researchers must navigate between numerous candidate models of varying complexity. Traditional information criteria, such as the corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC), and Mallows's $\mathit{C_p}$, are valuable tools for identifying optimal models. However, the exponential increase in candidate models with each additional model parameter renders the evaluation of these criteria for all models -- a strategy known as an exhaustive, or brute-force, search -- computationally prohibitive. Consequently, heuristic approaches like stepwise regression are commonly employed, albeit without guarantees of finding the globally-optimal model. In this study, we challenge the prevailing notion that non-monotonicity in information criteria precludes bounds on the search space. We introduce a simple but novel bound that enables the development of branch-and-bound algorithms tailored for these non-monotonic functions. We demonstrate that our approach guarantees identification of the optimal model(s) across diverse model classes, sizes, and applications, often with orders-of-magnitude computational speedups. For instance, in one previously-published model selection task involving $2^{32}$ (approximately 4 billion) candidate models, our method achieves a computational speedup exceeding 6,000. These findings have broad implications for the scalability and effectiveness of model selection in complex scientific domains.
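For scale, the brute-force baseline that the branch-and-bound approach improves on can be sketched in a few lines for linear regression. The helper names are hypothetical and the AIC is computed up to an additive constant; this is the $2^m$-cost search the paper avoids, not the paper's method:

```python
import itertools
import numpy as np

def aic(y, X, subset):
    """AIC (up to an additive constant) of an OLS fit on a column subset."""
    n = len(y)
    if subset:
        beta, *_ = np.linalg.lstsq(X[:, subset], y, rcond=None)
        rss = float(np.sum((y - X[:, subset] @ beta) ** 2))
    else:
        rss = float(np.sum(y ** 2))  # null model with no regressors
    return n * np.log(rss / n) + 2 * len(subset)

def exhaustive_search(y, X):
    """Evaluate AIC for all 2^m column subsets and return the best one."""
    m = X.shape[1]
    subsets = (list(s) for r in range(m + 1)
               for s in itertools.combinations(range(m), r))
    return min(subsets, key=lambda s: aic(y, X, s))
```

Already at m = 32 this loop would visit roughly 4 billion subsets, which is why bounding the search space matters.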


Foundation's new season has dramatic potential – but sadly falls flat

New Scientist

Mel Brooks and Carl Reiner used to spend every evening watching movies. Their favourites were cheesy – the type of film where someone says, "Secure the perimeter!" Why do I mention this in the context of Foundation? Because this adaptation of Isaac Asimov's novels started out as a thought-provoking series, but is now a "Secure the perimeter!" kind of show. It has been two years since Foundation last aired, so if you have forgotten where we left off, that is understandable.


Learning Distributions over Permutations and Rankings with Factorized Representations

Severo, Daniel, Karrer, Brian, Nolte, Niklas

arXiv.org Artificial Intelligence

Learning distributions over permutations is a fundamental problem in machine learning, with applications in ranking, combinatorial optimization, structured prediction, and data association. Existing methods rely on mixtures of parametric families or neural networks with expensive variational inference procedures. In this work, we propose a novel approach that leverages alternative representations for permutations, including Lehmer codes, Fisher-Yates draws, and Insertion-Vectors. These representations form a bijection with the symmetric group, allowing for unconstrained learning using conventional deep learning techniques, and can represent any probability distribution over permutations. Our approach enables a trade-off between expressivity of the model family and computational requirements. In the least expressive and most computationally efficient case, our method subsumes previous families of well-established probabilistic models over permutations, including the Mallows model and the Repeated Insertion Model. Experiments indicate our method significantly outperforms current approaches on the jigsaw puzzle benchmark, a common task for permutation learning. However, we argue this benchmark is limited in its ability to assess learning probability distributions, as the target is a delta distribution (i.e., a single correct solution exists). We therefore propose two additional benchmarks: learning cyclic permutations and re-ranking movies based on user preference. We show that our method learns non-trivial distributions even in the least expressive mode, while traditional models fail to even generate valid permutations in this setting.
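A minimal sketch of the Lehmer-code bijection mentioned above. Each coordinate of the code ranges independently over 0..(n-1-i), which is what makes unconstrained, factorized learning possible; this is an illustration of the representation, not the authors' model:

```python
def perm_to_lehmer(perm):
    """Lehmer code: entry i counts later elements smaller than perm[i]."""
    return [sum(1 for later in perm[i + 1:] if later < v)
            for i, v in enumerate(perm)]

def lehmer_to_perm(code):
    """Invert: pop the code[i]-th smallest remaining symbol at each step."""
    remaining = sorted(range(len(code)))
    return [remaining.pop(c) for c in code]
```

Because the map is a bijection with the symmetric group, any distribution over valid codes is automatically a distribution over valid permutations.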


Distribution Matching for Self-Supervised Transfer Learning

Jiao, Yuling, Ma, Wensen, Sun, Defeng, Wang, Hansheng, Wang, Yang

arXiv.org Machine Learning

In this paper, we propose a novel self-supervised transfer learning method called Distribution Matching (DM), which drives the representation distribution toward a predefined reference distribution while preserving augmentation invariance. The design of DM results in a learned representation space that is intuitively structured and offers easily interpretable hyperparameters. Experimental results across multiple real-world datasets and evaluation metrics demonstrate that DM performs competitively on target classification tasks compared to existing self-supervised transfer learning methods. Additionally, we provide robust theoretical guarantees for DM, including a population theorem and an end-to-end sample theorem. The population theorem bridges the gap between the self-supervised learning task and target classification accuracy, while the sample theorem shows that, even with a limited number of samples from the target domain, DM can deliver exceptional classification performance, provided the unlabeled sample size is sufficiently large.


Review for NeurIPS paper: The Smoothed Possibility of Social Choice

Neural Information Processing Systems

Additional Feedback: Major comments: 1) Smoothed analysis: Calling your analysis "smoothed analysis" is a bit confusing. Smoothed analysis would take worst-case over profiles, and then expectation over noise added. But as you say in your related work, you're taking worst-case over distributions coming from a family. Such analysis was already done in the past. For example, consider work that analyzed the probability of Condorcet's paradox under any distribution from the Mallows family.


Fairness in Ranking: Robustness through Randomization without the Protected Attribute

Kliachkin, Andrii, Psaroudaki, Eleni, Marecek, Jakub, Fotakis, Dimitris

arXiv.org Artificial Intelligence

There has been great interest in fairness in machine learning, especially in relation to classification problems. In ranking-related problems, such as in online advertising, recommender systems, and HR automation, much work on fairness remains to be done. Two complications arise: first, the protected attribute may not be available in many applications. Second, there are multiple measures of fairness of rankings, and optimization-based methods utilizing a single measure of fairness of rankings may produce rankings that are unfair with respect to other measures. In this work, we propose a randomized method for post-processing rankings, which does not require the availability of the protected attribute. In an extensive numerical study, we show the robustness of our methods with respect to P-Fairness and effectiveness with respect to Normalized Discounted Cumulative Gain (NDCG) from the baseline ranking, improving on previously proposed methods.
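For reference, NDCG, the effectiveness measure used here, can be computed as follows. This sketch uses the common exponential-gain variant and is not the authors' implementation:

```python
import math

def dcg(rels):
    """Discounted cumulative gain of relevance grades in ranked order."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    """NDCG: DCG of the given order divided by DCG of the ideal order."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

A randomized post-processing step perturbs the ranked order, so its NDCG against the baseline quantifies how much utility the fairness intervention gives up.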


Information criteria for structured parameter selection in high dimensional tree and graph models

Jansen, Maarten

arXiv.org Artificial Intelligence

Parameter selection in high-dimensional models is typically fine-tuned in a way that keeps the (relative) number of false positives under control. This is because otherwise the few true positives may be dominated by the many possible false positives. This happens, for instance, when the selection follows from a naive optimisation of an information criterion, such as AIC or Mallows's Cp. It can be argued that the overestimation of the selection comes from the optimisation process itself changing the statistics of the selected variables, in a way that the information criterion no longer reflects the true divergence between the selection and the data generating process. In lasso, the overestimation can also be linked to the shrinkage estimator, which makes the selection too tolerant of false positive selections. For these reasons, this paper works on refined information criteria, carefully balancing false positives and false negatives, for use with estimators without shrinkage. In particular, the paper develops corrected Mallows's Cp criteria for structured selection in trees and graphical models.
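A quick sketch of the naive Mallows's Cp computation whose direct optimisation the paper argues leads to overestimation (the error variance is estimated from the full model; this is a hypothetical helper, not the paper's corrected criterion, and p here counts only the selected columns):

```python
import numpy as np

def mallows_cp(y, X, subset):
    """Naive Mallows's Cp for an OLS submodel:
    Cp = RSS_p / sigma2_full + 2p - n,
    with sigma2_full estimated from the full model's residuals.
    """
    n, m = X.shape
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = float(np.sum((y - X @ beta_full) ** 2)) / (n - m)
    beta, *_ = np.linalg.lstsq(X[:, subset], y, rcond=None)
    rss = float(np.sum((y - X[:, subset] @ beta) ** 2))
    p = len(subset)
    return rss / sigma2 + 2 * p - n
```

By construction the full model scores Cp = m exactly, and a well-specified submodel should score close to its own p; minimising Cp over all submodels is precisely the step that distorts the statistics of the selected variables.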


R Programming: Selection of variables

#artificialintelligence

The all-possible-regressions procedure considers all possible subsets of the pool of potential explanatory variables Xi (with i = 1, 2, …, m). It then identifies a small group of regression models that are "good" according to a specified criterion. A detailed examination of these models can lead to the selection of the final model. If there are m candidate explanatory variables, there are 2^m regressions over all possible subsets (e.g. if m = 10, then there are 1024 possible regression models). The function leaps() (from the package leaps) performs an exhaustive search for the best subsets of the explanatory variables for predicting the response variable in linear regression. This gives some guidance, but it still does not tell us how many parameters to use.
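The same all-possible-regressions idea can be sketched outside R. The following is a hypothetical Python analogue of what leaps() reports: the lowest-RSS subset at each model size, leaving the choice of size to the analyst:

```python
import itertools
import numpy as np

def best_subset_per_size(y, X):
    """For each subset size k, return (best column subset, its RSS),
    mirroring the per-size best subsets that leaps() reports."""
    m = X.shape[1]
    best = {}
    for k in range(1, m + 1):
        for s in itertools.combinations(range(m), k):
            beta, *_ = np.linalg.lstsq(X[:, s], y, rcond=None)
            rss = float(np.sum((y - X[:, s] @ beta) ** 2))
            if k not in best or rss < best[k][1]:
                best[k] = (s, rss)
    return best
```

RSS can only decrease as variables are added, which is exactly why a separate criterion (Cp, AIC, adjusted R²) is still needed to pick the final size.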


A Unified Statistical Learning Model for Rankings and Scores with Application to Grant Panel Review

Pearce, Michael, Erosheva, Elena A.

arXiv.org Machine Learning

Rankings and scores are two common data types used by judges to express preferences and/or perceptions of quality in a collection of objects. Numerous models exist to study data of each type separately, but no unified statistical model captures both data types simultaneously without first performing data conversion. We propose the Mallows-Binomial model to close this gap: it combines a Mallows' φ ranking model with Binomial score models through shared parameters that quantify object quality, a consensus ranking, and the level of consensus between judges. We propose an efficient tree-search algorithm to calculate the exact MLE of model parameters, study statistical properties of the model both analytically and through simulation, and apply our model to real data from an instance of grant panel review that collected both scores and partial rankings. Furthermore, we demonstrate how model outputs can be used to rank objects with confidence. The proposed model is shown to sensibly combine information from both scores and rankings to quantify object quality and measure consensus with appropriate levels of statistical uncertainty. Keywords: preference learning, score and ranking aggregation, Mallows' model, A* algorithm, peer review
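For concreteness, the Mallows' φ ranking component can be sketched as follows. This uses brute-force normalisation over all permutations, feasible only for small n (the normalising constant also has a closed-form product expression); it is a hypothetical helper, not the paper's A*-based implementation:

```python
import itertools

def kendall_tau(pi, pi0):
    """Number of discordant pairs between two rankings of the same items."""
    pos = {v: i for i, v in enumerate(pi0)}
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if pos[pi[i]] > pos[pi[j]])

def mallows_pmf(pi, pi0, phi):
    """Mallows phi model: P(pi) proportional to phi^d(pi, pi0), where d is
    the Kendall tau distance to the consensus ranking pi0 and 0 < phi <= 1.
    Normalised here by summing over all permutations of pi0's items."""
    z = sum(phi ** kendall_tau(list(p), pi0)
            for p in itertools.permutations(pi0))
    return phi ** kendall_tau(pi, pi0) / z
```

Smaller φ concentrates mass on the consensus ranking, which is how the shared parameters let the ranking side of the model express the level of consensus between judges.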