 Learning in High Dimensional Spaces


Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm

Neural Information Processing Systems

We study the convergence and the rate of convergence of a local manifold learning algorithm: LTSA [13]. The main technical tool is the perturbation analysis on the linear invariant subspace that corresponds to the solution of LTSA. We derive a worst-case upper bound of errors for LTSA which naturally leads to a convergence result. We then derive the rate of convergence for LTSA in a special case.


Nonlinear Estimators and Tail Bounds for Dimension Reduction in $l_1$ Using Cauchy Random Projections

arXiv.org Artificial Intelligence

For dimension reduction in $l_1$, the method of Cauchy random projections multiplies the original data matrix $\mathbf{A} \in\mathbb{R}^{n\times D}$ with a random matrix $\mathbf{R} \in \mathbb{R}^{D\times k}$ ($k\ll\min(n,D)$) whose entries are i.i.d. samples of the standard Cauchy $C(0,1)$. Because of known impossibility results, one cannot hope to recover the pairwise $l_1$ distances in $\mathbf{A}$ from $\mathbf{B} = \mathbf{AR} \in \mathbb{R}^{n\times k}$ using linear estimators without incurring large errors. However, nonlinear estimators are still useful for certain applications in data stream computation, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. The sample median estimator and the geometric mean estimator are asymptotically (as $k\to \infty$) equivalent, but the latter is more accurate at small $k$. We derive explicit tail bounds for the geometric mean estimator and establish an analog of the Johnson-Lindenstrauss (JL) lemma for dimension reduction in $l_1$, which is weaker than the classical JL lemma for dimension reduction in $l_2$. Asymptotically, both the sample median estimator and the geometric mean estimator are about 80% efficient compared to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating the distribution of the MLE by an inverse Gaussian.
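To make the estimators above concrete, here is a minimal sketch in plain Python. It rests on a standard fact: if $u$ and $v$ are two rows of $\mathbf{A}$ with $l_1$ distance $d$, then each coordinate of their projected difference is Cauchy with scale $d$. For such a variable $x$ and $|\lambda|<1$, $E|x|^{\lambda} = d^{\lambda}/\cos(\lambda\pi/2)$, which suggests the correction factor $\cos^k(\pi/(2k))$ for the geometric mean of $k>1$ coordinates. The function names are illustrative and not taken from the paper, and the median shown is the plain sample median without the paper's bias correction.

```python
import math
import random
import statistics


def standard_cauchy(rng):
    """Draw one C(0,1) sample by inverting the Cauchy CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def cauchy_project(row, R):
    """Project one D-dimensional row with a D x k matrix R (list of lists)."""
    k = len(R[0])
    return [sum(a * r[j] for a, r in zip(row, R)) for j in range(k)]


def l1_geometric_mean(b_u, b_v):
    """Geometric mean estimate of the l1 distance with the cos^k correction.

    Assumes k > 1 and that the two projected rows differ in every coordinate.
    """
    x = [abs(p - q) for p, q in zip(b_u, b_v)]
    k = len(x)
    # Average the logs instead of multiplying k-th roots, for stability.
    log_gm = sum(math.log(xj) for xj in x) / k
    return math.cos(math.pi / (2 * k)) ** k * math.exp(log_gm)


def l1_median(b_u, b_v):
    """Plain sample median of |differences| (no bias correction applied)."""
    return statistics.median(abs(p - q) for p, q in zip(b_u, b_v))


if __name__ == "__main__":
    rng = random.Random(0)
    D, k = 20, 2000  # illustrative sizes only
    u = [rng.random() for _ in range(D)]
    v = [rng.random() for _ in range(D)]
    R = [[standard_cauchy(rng) for _ in range(k)] for _ in range(D)]
    b_u, b_v = cauchy_project(u, R), cauchy_project(v, R)
    print("true l1 distance:", sum(abs(a - b) for a, b in zip(u, v)))
    print("geometric mean  :", l1_geometric_mean(b_u, b_v))
    print("sample median   :", l1_median(b_u, b_v))
```

Both estimates depend only on the $k$ projected coordinates, so the original $D$-dimensional rows are not needed once $\mathbf{B}$ has been computed.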


Knowledge Driven Dimension Reduction For Clustering

AAAI Conferences

However, most dimension reduction approaches are driven by objective functions that may not, or only partially, suit the end user's requirements. In this work, we show how to incorporate general-purpose domain expertise encoded as a graph into dimension reduction in a way that lends itself to an elegant generalized eigenvalue problem. We call our approach Graph-Driven Constrained Dimension Reduction via Linear Projection (GCDR-LP) and show that it has several desirable properties.

We will provide more detail on our solution to this problem later, but it is important to note that the problem of focus in this paper is different from spectral clustering (dimension reduction) in two key ways. Firstly, we are projecting the entire space D occupies, not just the points in G or D. Secondly, we do not formulate the problem as some form of min-cut and then solve a relaxed version of the problem. Our work aims to find a reduced dimension space based on ...


Dimension reduction in representation of the data

arXiv.org Machine Learning

Suppose the data consist of a set $S$ of points $x_j$, $1\leq j \leq J$, distributed in a bounded domain $D\subset R^N$, where $N$ is a large number. An algorithm is given for finding the sets $L_k$ of dimension $k\ll N$, $k=1,2,\dots,K$, in a neighborhood of which a maximal number of points $x_j\in S$ lie. The algorithm is different from PCA (principal component analysis).





Local Procrustes for Manifold Embedding: A Measure of Embedding Quality and Embedding Algorithms

arXiv.org Machine Learning

Abstract

We present the Procrustes measure, a novel measure based on Procrustes rotation that enables quantitative comparison of the output of manifold-based embedding algorithms (such as LLE (Roweis and Saul, 2000) and Isomap (Tenenbaum et al., 2000)). The measure also serves as a natural tool when choosing dimension-reduction parameters. We also present two novel dimension-reduction techniques that attempt to minimize the suggested measure, and compare the results of these techniques to the results of existing algorithms. Finally, we suggest a simple iterative method that can be used to improve the output of existing algorithms.

Keywords: Dimension reducing · Manifold learning · Procrustes analysis · Local PCA · Simulated annealing

1 Introduction

Technological advances constantly improve our ability to collect and store large sets of data. The main difficulty in analyzing such high-dimensional data sets is that the number of observations required to estimate functions at a set level of accuracy grows exponentially with the dimension. This problem, often referred to as the curse of dimensionality, has led to various techniques that attempt to reduce the dimension of the original data. Historically, the main approach to dimension reduction is the linear one. This is the approach used by principal component analysis (PCA) and factor analysis (see Mardia et al., 1979, for both).