seriation
Supplementary Materials: Semi-Supervised Contrastive Learning for Deep Regression with Ordinal Rankings from Spectral Seriation
The main result is presented in Theorem 2, and we outline the proof below for interested readers. We first present Stewart's theorem in Lemma 1 to assist with the proof. According to the definition of the Fiedler vector, we have $(L + \Delta L)(f + \Delta f) = (\lambda + \Delta\lambda)(f + \Delta f)$.
Actual running times may differ depending on hardware and environment. We also show the number of model parameters required for each method in Table S3. Hyper-parameters were selected by a coarse search on the validation set.
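The following sketch makes the perturbation identity concrete. It is our own illustration and not the supplement's code. It computes the Fiedler pair $(\lambda, f)$ of a path-graph Laplacian $L$ and the Fiedler pair of a slightly perturbed Laplacian $L + \Delta L$. It then checks that $(L + \Delta L)(f + \Delta f) = (\lambda + \Delta\lambda)(f + \Delta f)$ holds and that $\Delta f$ stays small when $\lambda_2$ is well separated from the other eigenvalues. That well-separated regime is the one Stewart-type perturbation bounds address.

Two assumptions are not stated in the excerpt: the Laplacians are unnormalized ($L = D - W$), and the weight perturbation is small and symmetric. The noise level 0.01 is hypothetical.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W of a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def fiedler_pair(L):
    """Second-smallest eigenvalue of L and a unit eigenvector for it."""
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vals[1], vecs[:, 1]

# Path graph on 6 nodes: lambda_2 = 2 - 2cos(pi/6) is simple and well separated.
n = 6
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0

# Small symmetric perturbation of the edge weights (hypothetical noise level 0.01).
rng = np.random.default_rng(0)
E = np.zeros((n, n))
E[idx, idx + 1] = rng.uniform(-0.01, 0.01, size=n - 1)
E = E + E.T

L, dL = laplacian(W), laplacian(E)  # the Laplacian is linear in W: L(W + E) = L + dL
lam, f = fiedler_pair(L)
lam_p, f_p = fiedler_pair(L + dL)
f_p = f_p * np.sign(f_p @ f)        # eigenvectors are defined up to sign; align with f

dlam, df = lam_p - lam, f_p - f
# The perturbed pair satisfies (L + dL)(f + df) = (lam + dlam)(f + df).
residual = np.linalg.norm((L + dL) @ (f + df) - (lam + dlam) * (f + df))
print(f"dlam = {dlam:.2e}, ||df|| = {np.linalg.norm(df):.2e}, residual = {residual:.1e}")
```

The sign alignment step matters. Without it, $\Delta f$ could be close to $-2f$ even for a tiny $\Delta L$, because an eigensolver may return either sign.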
Exact Matrix Seriation through Mathematical Optimization: Stress and Effectiveness-Based Models
Víctor Blanco, Alfredo Marín, Justo Puerto
Matrix seriation, the problem of permuting the rows and columns of a matrix to uncover latent structure, is a fundamental technique in data science, particularly in the visualization and analysis of relational data. Applications span clustering, anomaly detection, and beyond. In this work, we present a unified framework grounded in mathematical optimization to address matrix seriation from a rigorous, model-based perspective. Our approach leverages combinatorial and mixed-integer optimization to represent seriation objectives and constraints with high fidelity, bridging the gap between traditional heuristic methods and exact solution techniques. We introduce new mathematical programming models for neighborhood-based stress criteria, including nonlinear formulations and their linearized counterparts. For structured settings such as Moore and von Neumann neighborhoods, we develop a novel Hamiltonian path-based reformulation that enables effective control over spatial arrangement and interpretability in the reordered matrix. To assess the practical impact of our models, we carry out an extensive set of experiments on synthetic and real-world datasets, as well as on a newly curated benchmark based on a coauthorship network from the matrix seriation literature. Our results show that these optimization-based formulations not only enhance solution quality and interpretability but also provide a versatile foundation for extending matrix seriation to new domains in data science.
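The abstract refers to neighborhood-based stress criteria for Moore and von Neumann neighborhoods. Below is a minimal sketch that evaluates such a criterion for a given row and column permutation. It assumes the common definition: sum the squared differences between each cell of the reordered matrix and its neighbors.

- **von Neumann neighborhood:** left, right, up and down.
- **Moore neighborhood:** the von Neumann neighbors plus the four diagonals.

Each unordered pair of neighbors is counted once. Some formulations count each pair twice, which only rescales the value. The paper's exact models (and their linearizations) may differ, and the function name `neighborhood_stress` is hypothetical.

```python
import numpy as np

def neighborhood_stress(X, row_order, col_order, neighborhood="neumann"):
    """Stress of X after permuting rows and columns (hypothetical helper).

    Sums squared differences over neighboring cells, each unordered pair once.
    neighborhood: "neumann" (4 neighbors) or "moore" (8 neighbors).
    """
    Y = np.asarray(X, dtype=float)[np.ix_(row_order, col_order)]
    # Horizontal and vertical neighbors (von Neumann neighborhood).
    s = np.sum((Y[:, 1:] - Y[:, :-1]) ** 2) + np.sum((Y[1:, :] - Y[:-1, :]) ** 2)
    if neighborhood == "moore":
        s += np.sum((Y[1:, 1:] - Y[:-1, :-1]) ** 2)  # main-diagonal neighbors
        s += np.sum((Y[1:, :-1] - Y[:-1, 1:]) ** 2)  # anti-diagonal neighbors
    elif neighborhood != "neumann":
        raise ValueError("neighborhood must be 'neumann' or 'moore'")
    return float(s)

# Usage: a smooth matrix X[i, j] = i + j is least "stressed" in its natural order.
X = np.add.outer(np.arange(5), np.arange(5)).astype(float)
ident = np.arange(5)
print(neighborhood_stress(X, ident, ident))                    # 40.0
print(neighborhood_stress(X, ident, ident, "moore"))           # 104.0
print(neighborhood_stress(X, [0, 2, 4, 1, 3], ident))          # 125.0
```

An exact seriation model minimizes this quantity over all row and column permutations. Heuristics only search for a low value.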
- Europe > Ireland (0.14)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
Matrix Reordering for Noisy Disordered Matrices: Optimality and Computationally Efficient Algorithms
Motivated by applications in single-cell biology and metagenomics, we consider matrix reordering based on the noisy disordered matrix model. We first establish the fundamental statistical limit for the matrix reordering problem in a decision-theoretic framework and show that a constrained least-squares estimator is rate-optimal. Given the computational hardness of the optimal procedure, we analyze a popular polynomial-time algorithm, spectral seriation, and show that it is suboptimal. We then propose a novel polynomial-time adaptive sorting algorithm with a guaranteed performance improvement. The superiority of the adaptive sorting algorithm over existing methods is demonstrated in simulation studies and in the analysis of two real single-cell RNA sequencing datasets.
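For reference, here is a minimal sketch of the standard spectral seriation algorithm that the abstract analyzes. It sorts items by the Fiedler vector (the eigenvector of the second-smallest eigenvalue) of the graph Laplacian built from a similarity matrix. This is the textbook version, assuming a symmetric, nonnegative similarity matrix. It is not the paper's adaptive sorting algorithm, which the abstract reports performs better under noise.

```python
import numpy as np

def spectral_seriation(S):
    """Order items by the Fiedler vector of the Laplacian of similarity matrix S.

    S is assumed symmetric and nonnegative; the result is defined up to reversal.
    """
    S = np.asarray(S, dtype=float)
    S = (S + S.T) / 2.0                        # guard against small asymmetries
    L = np.diag(S.sum(axis=1)) - S             # unnormalized Laplacian; diag(S) cancels out
    _, vecs = np.linalg.eigh(L)                # eigenvalues returned in ascending order
    fiedler = vecs[:, 1]                       # eigenvector of the second-smallest eigenvalue
    return np.argsort(fiedler, kind="stable")

# Demo: a path-graph similarity (item i is similar only to i-1 and i+1), then shuffled.
n = 8
S = np.zeros((n, n))
i = np.arange(n - 1)
S[i, i + 1] = S[i + 1, i] = 1.0
perm = np.array([3, 7, 0, 5, 1, 6, 2, 4])     # hidden shuffle
S_shuffled = S[np.ix_(perm, perm)]

order = spectral_seriation(S_shuffled)
print(perm[order])   # original indices in recovered order: 0..7 or 7..0
```

On this noiseless example, the Fiedler vector is strictly monotone along the path, so the hidden order is recovered up to reversal. With noisy data, the ordering can degrade, which is the suboptimality the abstract refers to.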
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Make Patterns Pop Out of Heatmaps with Seriation
One of the easiest ways to start visualizing data is to turn a table into a heatmap: every cell gets a colour, and the higher the number, the brighter the colour. Unfortunately, this is often a fairly unrewarding exercise, yielding graphics that look like plaid or tartan fabric. Part of the problem is that the rows and columns of a dataset often have no natural ordering, such as time. Instead, they are shown in alphabetical order, or the dataset is sorted by one of its rows or columns, rather than in an order that makes patterns pop out visually. My goal in this article is to demonstrate this problem clearly and to show that a set of techniques collectively called seriation offers neat solutions to it. I'll do this by automatically reordering the rows and columns in the following noisy-looking heatmap to make the underlying pattern very clear.
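The article does not say here which seriation method it uses. As an assumption, the sketch below uses one simple heuristic: sort the rows by the leading left singular vector of the table and the columns by the leading right singular vector (the first principal axis).

The demo data is hypothetical: a smooth rank-one "gradient" pattern, shuffled to look like plaid. The function name `svd_seriation` is also ours.

```python
import numpy as np

def svd_seriation(X):
    """Order rows and columns of X by its leading left/right singular vectors."""
    X = np.asarray(X, dtype=float)
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    row_order = np.argsort(U[:, 0], kind="stable")   # sort rows by first left singular vector
    col_order = np.argsort(Vt[0, :], kind="stable")  # sort columns by first right singular vector
    return row_order, col_order

# Hypothetical demo data: a smooth rank-one pattern, then shuffled.
a = np.linspace(1.0, 2.0, 6)          # row effects
b = np.linspace(1.0, 3.0, 5)          # column effects
pattern = np.outer(a, b)
rng = np.random.default_rng(42)
row_perm, col_perm = rng.permutation(6), rng.permutation(5)
shuffled = pattern[np.ix_(row_perm, col_perm)]     # what the plaid heatmap shows

r, c = svd_seriation(shuffled)
reordered = shuffled[np.ix_(r, c)]
recovered_rows = row_perm[r]          # original row index of each displayed row
recovered_cols = col_perm[c]          # original column index of each displayed column
print(recovered_rows, recovered_cols)
```

For a gradient like this one, the singular vectors are proportional to the row and column effects. The reordered heatmap therefore shows the original smooth pattern, possibly flipped on both axes.

Block or banded patterns generally call for other seriation methods, such as spectral or hierarchical-clustering-based orderings, and noisy data will not be recovered exactly.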
Massive Data Clustering in Moderate Dimensions from the Dual Spaces of Observation and Attribute Data Clouds
Cluster analysis of very high-dimensional data can benefit from the properties of such high dimensionality. Informally expressed, this work focuses on the analogous situation in which the dimensionality is moderate to small relative to a massive number of observations. Mathematically expressed, these are the dual spaces of observations and attributes: the point cloud of observations lies in attribute space, and the point cloud of attributes lies in observation space. In this paper, we begin by summarizing various perspectives related to methodologies used in multivariate analytics. We draw on these to establish an efficient clustering pipeline covering both partitioning and hierarchical clustering.
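The abstract does not detail the pipeline. The sketch below only illustrates the duality between the two spaces and is not the authors' method. For a data matrix $X$ with $n$ observations and $p$ attributes, the nonzero eigenvalues of the small $p \times p$ matrix $X^\top X$ and of the large $n \times n$ matrix $X X^\top$ coincide. If $X^\top X v = \lambda v$, then $u = X v / \sqrt{\lambda}$ is a unit eigenvector of $X X^\top$. With $n \gg p$, one can therefore work in attribute space and map results into observation space without forming the $n \times n$ matrix. The sizes and random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 5                        # many observations, few attributes (hypothetical sizes)
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)                # centre each attribute

# Attribute-space side: a small p x p cross-product matrix.
G = X.T @ X
lam, V = np.linalg.eigh(G)            # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]        # reorder to descending

# Observation-space side: map each eigenvector through X instead of forming X @ X.T (n x n).
U = (X @ V) / np.sqrt(lam)            # columns are unit eigenvectors of X @ X.T
print(lam)
```

The check `X @ (X.T @ U)` equals `U * lam` confirms the eigenvector relation using only $n \times p$ and $p \times p$ products.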
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > New York (0.04)