Kernel methods have been applied successfully in many applications. The kernel matrix plays an important role in kernel-based learning methods, but the ideal kernel matrix is usually unknown in practice and needs to be estimated. In this paper, we propose to directly learn the ideal kernel matrix (called the optimal neighborhood kernel matrix) from a pre-specified kernel matrix for improved classification performance. We assume that the pre-specified kernel matrix generated from the specific application is a noisy observation of the ideal one. The resulting optimal neighborhood kernel matrix is shown to be the summation of the pre-specified kernel matrix and a rank-one matrix. We formulate the problem of learning the optimal neighborhood kernel as a constrained quartic problem, and propose to solve it using two methods: level method and constrained gradient descent. Empirical results on several benchmark data sets demonstrate the efficiency and effectiveness of the proposed algorithms.
Multiresolution Matrix Factorization (MMF) was recently introduced as an alternative to the dominant low-rank paradigm in order to capture structure in matrices at multiple different scales. Using ideas from multiresolution analysis (MRA), MMF teased out hierarchical structure in symmetric matrices by constructing a sequence of wavelet bases. While effective for such matrices, there is plenty of data that is more naturally represented as nonsymmetric matrices (e.g. directed graphs), but nevertheless has similar hierarchical structure. In this paper, we explore techniques for extending MMF to any square matrix. We validate our approach on numerous matrix compression tasks, demonstrating its efficacy compared to low-rank methods. Moreover, we also show that a combined low-rank and MMF approach, which amounts to removing a small global-scale component of the matrix and then extracting hierarchical structure from the residual, is even more effective than each of the two complementary methods for matrix compression.
This is the second in a series of blog posts sharing my experiences working with algorithms and data structures for machine learning. These experiences were gained whilst building out the nlp project for LSA (Latent Semantic Analysis) of text documents. In Part 1 of this series, I explored alternative approaches for representing and applying TF-IDF transforms for weighting term frequencies across document corpora. We tested the approaches using Go's inbuilt benchmark functionality and found that our optimisations materially improved not just memory consumption but also performance (reducing memory consumption and processing time from 7 GB and 41 seconds to 250 KB and 0.8 seconds respectively). In this blog post I shall explore other areas for optimisation, seeking to further reduce memory consumption and processing time.