
 ranking vector


Ranking Vectors Clustering: Theory and Applications

Fattahi, Ali; Eshragh, Ali; Aslani, Babak; Rabiee, Meysam

arXiv.org Artificial Intelligence

We study the problem of clustering ranking vectors, where each vector represents preferences as an ordered list of distinct integers. Specifically, we focus on the k-centroids ranking vectors clustering problem (KRC), which aims to partition a set of ranking vectors into k clusters and identify the centroid of each cluster. Unlike classical k-means clustering (KMC), KRC constrains both the observations and the centroids to be ranking vectors. We establish the NP-hardness of KRC and characterize its feasible set. For the single-cluster case, we derive a closed-form analytical solution for the optimal centroid, which can be computed in linear time. To address the computational challenges of KRC, we develop an efficient approximation algorithm, KRCA, which iteratively refines initial solutions obtained from KMC (the baseline solutions). Additionally, we introduce a branch-and-bound (BnB) algorithm for efficient cluster reconstruction within KRCA; it uses a decision-tree framework to reduce computational time and includes a control parameter that balances solution quality against efficiency. We establish theoretical error bounds for KRCA and BnB. Through extensive numerical experiments on synthetic and real-world datasets, we demonstrate that KRCA consistently outperforms the baseline solutions, delivering significant improvements in solution quality with fast computation times. This work highlights the practical significance of KRC for personalization and large-scale decision making, offering methodological advancements and insights that future studies can build upon.
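
The single-cluster centroid result can be illustrated under one assumption the abstract does not state: that cluster quality is measured by summed squared Euclidean distance, as in KMC. Every permutation of 1..m has the same norm, so minimizing the summed distance reduces to maximizing the dot product of the centroid with the cluster's column sums, and by the rearrangement inequality the best centroid simply ranks those column sums. The sketch below sorts, costing O(m log m) rather than the linear time the authors report, and pairs the centroid with a plain Lloyd-style assignment loop. The names `optimal_centroid`, `sq_dist`, and `refine` are hypothetical, and `refine` is not KRCA or its BnB reconstruction step.

```python
from typing import List, Optional, Sequence, Tuple

Ranking = List[int]


def optimal_centroid(cluster: Sequence[Sequence[int]]) -> Ranking:
    """Ranking vector minimizing summed squared Euclidean distance to `cluster`.

    Assumes every member is a permutation of 1..m. Since ||r||^2 is identical
    for all permutations r, minimizing sum_i ||x_i - r||^2 means maximizing
    r . (sum_i x_i): rank m goes to the largest column sum, and so on down.
    """
    m = len(cluster[0])
    col_sums = [sum(x[j] for x in cluster) for j in range(m)]
    # Positions ordered from smallest to largest column sum (ties keep index order).
    order = sorted(range(m), key=lambda j: col_sums[j])
    centroid = [0] * m
    for rank, j in enumerate(order, start=1):
        centroid[j] = rank
    return centroid


def sq_dist(x: Sequence[int], y: Sequence[int]) -> int:
    """Squared Euclidean distance between two ranking vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def refine(
    data: Sequence[Sequence[int]],
    centroids: Sequence[Sequence[int]],
    max_iter: int = 100,
) -> Tuple[Optional[List[int]], List[Ranking]]:
    """Hypothetical Lloyd-style loop whose centroids stay ranking vectors (not KRCA)."""
    centroids = [list(c) for c in centroids]  # work on a copy
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every observation.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids are stable too
        labels = new_labels
        # Update step: closed-form ranking centroid per non-empty cluster.
        for c in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = optimal_centroid(members)
    return labels, centroids
```

For example, `optimal_centroid([[1, 2, 3], [2, 1, 3], [1, 3, 2]])` returns `[1, 2, 3]`, because the column sums are 4, 6 and 8.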


Establishing a leader in a pairwise comparisons method

Szybowski, Jacek; Kułakowski, Konrad; Mazurek, Jiri; Ernst, Sebastian

arXiv.org Artificial Intelligence

Like electoral systems, decision-making methods are also vulnerable to manipulation by decision-makers. An effective defense against such threats can come only from a thorough understanding of the manipulation mechanisms. In this article, we present two algorithms that can be used to launch a manipulation attack. They make it possible to equalize the weights of two selected alternatives in the pairwise comparison (PC) method and, consequently, to choose a leader. The theoretical considerations are accompanied by a Monte Carlo simulation showing the relationship between the size of the PC matrix, the degree of inconsistency, and the ease of manipulation. This work continues our previous research published in (Szybowski et al., 2023).
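
The abstract does not say which prioritization method the two algorithms assume. As a hedged illustration only, the sketch below uses the geometric mean method (GMM), under which one reciprocal pair of entries can equate two weights in closed form: scaling a_kl by t (and a_lk by 1/t) multiplies the unnormalized weight w_k by t^(1/n) and w_l by t^(-1/n), leaving the other unnormalized weights unchanged, so t = (w_l / w_k)^(n/2). The functions `gm_weights` and `equate_by_one_entry` are hypothetical names, and this single-entry edit is not the authors' algorithm.

```python
import math
from typing import List

Matrix = List[List[float]]


def gm_weights(A: Matrix) -> List[float]:
    """Priority weights by the geometric mean method, normalized to sum to 1."""
    n = len(A)
    raw = [math.prod(row) ** (1.0 / n) for row in A]  # row geometric means
    total = sum(raw)
    return [w / total for w in raw]


def equate_by_one_entry(A: Matrix, k: int, l: int) -> Matrix:
    """Copy of reciprocal matrix A with a_kl (and a_lk) rescaled so that
    alternatives k and l receive equal GMM weights.

    Scaling a_kl by t multiplies unnormalized w_k by t**(1/n) and w_l by
    t**(-1/n); solving w_k * t**(1/n) == w_l * t**(-1/n) gives t below.
    """
    n = len(A)
    w = gm_weights(A)  # normalization does not change the ratio w_l / w_k
    t = (w[l] / w[k]) ** (n / 2)
    B = [row[:] for row in A]
    B[k][l] *= t
    B[l][k] = 1.0 / B[k][l]  # keep the matrix reciprocal
    return B
```

With the consistent matrix `A = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]`, calling `equate_by_one_entry(A, 1, 0)` scales a_10 by 2^(3/2), after which alternatives 0 and 1 share the same weight. Both end at sqrt(w_k * w_l) before normalization, so equating two weights this way does not by itself guarantee that either becomes the overall leader.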


A Case for Dataset Specific Profiling

Ockerman, Seth; Wu, John; Stewart, Christopher

arXiv.org Artificial Intelligence

Data-driven science is an emerging paradigm in which scientific discoveries depend on executing computational AI models against rich, discipline-specific datasets. With modern machine learning frameworks, anyone can develop and execute computational models that reveal concepts hidden in the data and could enable scientific applications. For important and widely used datasets, measuring the performance of every computational model that can run against them is prohibitively expensive in cloud resources. In practice, benchmarking approaches rely on representative datasets to infer performance without actually executing models. While practicable, these approaches limit extensive dataset profiling to a few datasets and introduce a bias toward models suited to the representative datasets. As a result, each dataset's unique characteristics are left unexplored, and subpar models are selected based on inferences drawn from generalized datasets. This calls for a new paradigm that introduces dataset profiling into the model selection process. To demonstrate the need for dataset-specific profiling, we answer two questions: (1) Can scientific datasets significantly permute the rank order of computational models compared to widely used representative datasets? (2) If so, could lightweight model execution improve benchmarking accuracy? Taken together, the answers to these questions lay the foundation for a new dataset-aware benchmarking paradigm.
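
Question (1) amounts to comparing two rankings of the same models. One common way to quantify how much a dataset permutes the order is Kendall's rank correlation; the abstract does not state which measure the authors use, so the sketch below is an assumption. The hypothetical helper `kendall_tau` computes tau-a from per-model scores on two datasets: 1 means identical order and -1 a full reversal. Pairs tied in either list count as neither concordant nor discordant (no tie correction).

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Kendall's tau-a between two score lists for the same models, in the same order."""
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two equal-length score lists with at least two models")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both datasets order models i and j the same way.
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)
```

For instance, swapping the last two of four models gives `kendall_tau([1, 2, 3, 4], [1, 2, 4, 3])`, which is 4/6, about 0.667.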


Random Surfing Revisited: Generalizing PageRank's Teleportation Model

Nikolakopoulos, Athanasios N.

arXiv.org Machine Learning

We revisit the Random Surfer model, focusing on its often-overlooked Teleportation component, and we introduce NCDawareRank, a novel ranking framework designed to exploit network meta-information, as well as aspects of the network's higher-order structural organization, in a way that preserves the mathematical structure and the attractive computational characteristics of PageRank. A rigorous theoretical exploration of the proposed model reveals a wealth of mathematical properties that entail tangible benefits in terms of robustness, computability, modeling flexibility, and expressiveness. A set of experiments on real-world networks verifies the theoretically predicted properties of NCDawareRank and showcases its effectiveness as a network centrality measure.
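
To make the role of the teleportation term concrete, the sketch below runs power iteration on a Google-style matrix with two teleportation components: a block-aware matrix M and the usual uniform matrix E. It is loosely modelled on the idea of exploiting a node partition and is not the paper's definition of NCDawareRank; in particular, M here simply spreads each node's mass uniformly over its own block, a hypothetical simplification. The function `teleport_rank` and the parameters `alpha`, `beta`, and `blocks` are illustrative names, and NumPy is assumed to be available.

```python
import numpy as np


def teleport_rank(adj, blocks, alpha=0.85, beta=0.10, tol=1e-10, max_iter=2000):
    """Stationary vector of G = alpha*H + beta*M + (1 - alpha - beta)*E.

    adj:    n x n 0/1 adjacency matrix (row u lists the out-links of u).
    blocks: list of node lists that must partition 0..n-1 (hypothetical blocks).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    if sorted(v for b in blocks for v in b) != list(range(n)):
        raise ValueError("blocks must partition the nodes 0..n-1")
    out = adj.sum(axis=1)
    # H: row-normalized links; dangling rows are replaced by uniform rows.
    H = np.where(out[:, None] > 0, adj / np.maximum(out, 1.0)[:, None], 1.0 / n)
    # M: illustrative block-level teleportation, uniform within a node's own block.
    M = np.zeros((n, n))
    for block in blocks:
        for u in block:
            M[u, block] = 1.0 / len(block)
    E = np.full((n, n), 1.0 / n)  # classic uniform teleportation
    G = alpha * H + beta * M + (1.0 - alpha - beta) * E  # row-stochastic
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = pi @ G  # one step of pi^T <- pi^T G
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi
```

Setting `beta=0` recovers standard PageRank with uniform teleportation. Keeping a positive uniform share (`1 - alpha - beta > 0`) makes G strictly positive, hence primitive, so the iteration converges regardless of the link structure.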