Dimensionality Reduction
Direct Density-Ratio Estimation with Dimensionality Reduction via Hetero-Distributional Subspace Analysis
Yamada, Makoto (Tokyo Institute of Technology) | Sugiyama, Masashi (Tokyo Institute of Technology)
Methods for estimating the ratio of two probability density functions have been actively explored recently since they can be used for various data processing tasks such as non-stationarity adaptation, outlier detection, feature selection, and conditional probability estimation. In this paper, we propose a new density-ratio estimator which incorporates dimensionality reduction into the density-ratio estimation procedure. Through experiments, the proposed method is shown to compare favorably with existing density-ratio estimators in terms of both accuracy and computational costs.
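To make the core idea concrete, here is a minimal sketch of direct density-ratio estimation in the least-squares style (uLSIF); the function names, toy data, and hyperparameters are illustrative assumptions, and the paper's dimensionality-reduction step is omitted.

```python
# Minimal sketch of least-squares density-ratio estimation (uLSIF style);
# the hetero-distributional subspace search of the paper is not included.
import numpy as np

def ulsif(x_nu, x_de, sigma=1.0, lam=1e-3):
    """Estimate r(x) = p_nu(x) / p_de(x) with a Gaussian-kernel linear model."""
    centers = x_nu[:min(100, len(x_nu))]            # kernel centers from numerator samples

    def phi(x):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))       # (n, n_centers) design matrix

    Phi_de, Phi_nu = phi(x_de), phi(x_nu)
    H = Phi_de.T @ Phi_de / len(x_de)               # second moment under the denominator
    h = Phi_nu.mean(axis=0)                         # first moment under the numerator
    alpha = np.linalg.solve(H + lam * np.eye(len(h)), h)
    return lambda x: np.maximum(phi(x) @ alpha, 0)  # clip negative ratio values

# Toy usage: two shifted Gaussians.
rng = np.random.default_rng(0)
r = ulsif(rng.normal(0, 1, (500, 2)), rng.normal(0.5, 1, (500, 2)))
print(r(np.zeros((1, 2))))                          # estimated ratio at the origin
```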
A convex model for non-negative matrix factorization and dimensionality reduction on physical space
Esser, Ernie, Möller, Michael, Osher, Stanley, Sapiro, Guillermo, Xin, Jack
A collaborative convex framework for factoring a data matrix $X$ into a non-negative product $AS$, with a sparse coefficient matrix $S$, is proposed. We restrict the columns of the dictionary matrix $A$ to coincide with certain columns of the data matrix $X$, thereby guaranteeing a physically meaningful dictionary and dimensionality reduction. We use $l_{1,\infty}$ regularization to select the dictionary from the data and show that this leads to an exact convex relaxation of $l_0$ in the case of distinct noise-free data. We also show how to relax the restriction-to-$X$ constraint by initializing an alternating minimization approach with the solution of the convex model, obtaining a dictionary close to but not necessarily in $X$. We focus on applications of the proposed framework to hyperspectral endmember and abundance identification and also show an application to blind source separation of NMR data.
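As a rough illustration of the restricted model $X \approx XS$, the following projected-subgradient sketch penalizes the $l_{1,\infty}$ norm of $S$ and ranks data columns as dictionary candidates; the paper's actual optimization and the relaxation step are more sophisticated, and all names and parameters here are assumptions.

```python
# Sketch of the restricted model X ~= X S with an l_{1,inf} penalty on S:
# rows of S with large max-entries mark data columns chosen as dictionary atoms.
import numpy as np

def select_endmembers(X, lam=0.1, steps=500, lr=1e-3):
    n = X.shape[1]
    S = np.full((n, n), 1.0 / n)                  # nonnegative coefficients
    for _ in range(steps):
        grad = X.T @ (X @ S - X)                  # gradient of 0.5 * ||X - XS||_F^2
        sub = np.zeros_like(S)                    # subgradient of sum_i max_j S[i, j]
        sub[np.arange(n), S.argmax(axis=1)] = 1.0
        S = np.maximum(S - lr * (grad + lam * sub), 0.0)  # project onto S >= 0
    row_strength = S.max(axis=1)                  # l_inf norm of each row of S
    return np.argsort(row_strength)[::-1], S      # candidate dictionary columns first

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(30, 50)))             # toy nonnegative data
order, S = select_endmembers(X)
print(order[:5])                                  # indices of likely dictionary columns
```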
CrossBridge: Finding Analogies Using Dimensionality Reduction
Krishnamurthy, Jayant (Carnegie Mellon University) | Lieberman, Henry (MIT Media Laboratory)
We present CrossBridge, a practical algorithm for retrieving analogies in large, sparse semantic networks. Other algorithms adopt a generate-and-test approach, retrieving candidate analogies by superficial similarity of concepts, then testing them for the particular relations involved in the analogy. CrossBridge adopts a global approach. It organizes the entire knowledge space at once, as a matrix of small concept-and-relation subgraph patterns versus actual occurrences of subgraphs from the knowledge base. It uses the familiar mathematics of dimensionality reduction to reorganize this space along dimensions representing approximate semantic similarity of these subgraphs. Analogies can then be retrieved by simple nearest-neighbor comparison. CrossBridge takes into account not only knowledge directly related to the source and target domains, but also a large background Commonsense knowledge base. Commonsense knowledge influences the mapping between domains, preserving important relations while ignoring others. This property allows CrossBridge to find more intuitive and extensible analogies. We compare our approach with an implementation of structure mapping and show that our algorithm consistently finds analogies in cases where structure mapping fails. We also present some discovered analogies.
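A toy sketch of the retrieval core described above, assuming a small occurrence-by-pattern matrix: embed rows with a truncated SVD and retrieve by cosine nearest neighbour. The matrix contents and function names are hypothetical.

```python
# Sketch of the retrieval core: embed rows of a pattern/occurrence matrix with
# a truncated SVD, then retrieve analogies by cosine nearest neighbour.
import numpy as np

def embed(M, k=2):
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k]                       # low-dimensional row embeddings

def nearest(emb, query_idx):
    q = emb[query_idx]
    sims = emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q) + 1e-12)
    sims[query_idx] = -np.inf                     # exclude the query itself
    return int(np.argmax(sims))

# Rows: subgraph occurrences; columns: subgraph patterns (hypothetical toy data).
M = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]], float)
print(nearest(embed(M), 0))                       # most analogous occurrence to row 0
```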
Multi-Instance Dimensionality Reduction
Sun, Yu-Yin (Nanjing University) | Ng, Michael K. (Hong Kong Baptist University) | Zhou, Zhi-Hua (Nanjing University)
Multi-instance learning deals with problems that treat bags of instances as training examples. In single-instance learning problems, dimensionality reduction is an essential step for high-dimensional data analysis and has been studied for years. The curse of dimensionality also exists in multi-instance learning tasks, yet dimensionality reduction for this setting has not been studied before. Direct application of existing single-instance dimensionality reduction objectives to multi-instance learning tasks may not work well since it ignores the characteristic of multi-instance learning that the labels of bags are known while the labels of instances are unknown. In this paper, we propose an effective model and develop an efficient algorithm to solve the multi-instance dimensionality reduction problem. We formulate the objective as an optimization problem by considering orthonormality and sparsity constraints in the projection matrix for dimensionality reduction, and then solve it by gradient descent along the tangent space of orthonormal matrices. We also propose an approximation for improving the efficiency. Experimental results validate the effectiveness of the proposed method.
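The optimization step named in the abstract, gradient descent along the tangent space of orthonormal matrices, can be sketched as follows; the variance-maximization loss is a stand-in for the paper's bag-level objective, and all names are illustrative.

```python
# Sketch of gradient descent on the Stiefel manifold (orthonormal W); the loss
# below is a simple stand-in, not the paper's bag-level objective.
import numpy as np

def stiefel_step(W, egrad, lr=0.01):
    # Project the Euclidean gradient onto the tangent space at W ...
    rgrad = egrad - W @ (W.T @ egrad + egrad.T @ W) / 2
    Q, _ = np.linalg.qr(W - lr * rgrad)           # ... then retract via QR
    return Q                                      # columns stay orthonormal

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
C = X.T @ X / len(X)                              # sample covariance
W, _ = np.linalg.qr(rng.normal(size=(20, 5)))     # random orthonormal start
for _ in range(100):
    W = stiefel_step(W, -2 * C @ W)               # maximize tr(W^T C W), i.e. PCA
print(np.trace(W.T @ C @ W))                      # captured variance
```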
Non-I.I.D. Multi-Instance Dimensionality Reduction by Learning a Maximum Bag Margin Subspace
Ping, Wei (Tsinghua University) | Xu, Ye (Nanjing University) | Ren, Kexin (Nanjing University of Aeronautics and Astronautics) | Chi, Chi-Hung (Tsinghua University) | Shen, Furao (Nanjing University)
Multi-instance learning, like other machine learning tasks, suffers from the curse of dimensionality. Although dimensionality reduction methods have been investigated for many years, multi-instance dimensionality reduction methods remain untouched. On the other hand, most algorithms in the multi-instance framework treat instances in each bag as independently and identically distributed samples, which fails to utilize the structure information conveyed by the instances in a bag. In this paper, we propose a multi-instance dimensionality reduction method which treats instances in each bag as non-i.i.d. samples. We regard every bag as a whole entity and define a bag margin objective function. By maximizing the margin between positive and negative bags, we learn a subspace that yields a more salient representation of the original data. Experiments demonstrate the effectiveness of the proposed method.
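The following toy sketch illustrates a bag-level margin in a projected subspace; the minimum pairwise bag distance used here is an assumption for illustration, not necessarily the paper's definition, and the random projection stands in for the learned subspace.

```python
# Toy illustration of a bag-level margin in a projected subspace; the minimal
# pairwise bag distance is an assumed choice, not the paper's definition.
import numpy as np

def bag_dist(A, B, W):
    PA, PB = A @ W, B @ W                         # project both bags
    d2 = ((PA[:, None, :] - PB[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min())                      # closest instance pair

def bag_margin(pos_bags, neg_bags, W):
    # Margin = smallest distance between any positive and any negative bag.
    return min(bag_dist(p, n, W) for p in pos_bags for n in neg_bags)

rng = np.random.default_rng(0)
pos = [rng.normal(2, 1, (5, 10)) for _ in range(3)]   # bags of 5 instances, 10-d
neg = [rng.normal(-2, 1, (5, 10)) for _ in range(3)]
W = np.linalg.qr(rng.normal(size=(10, 2)))[0]         # random orthonormal projection
print(bag_margin(pos, neg, W))                        # objective to maximize over W
```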
Envisioning a Robust, Scalable Metacognitive Architecture Built on Dimensionality Reduction
Alonso, Jason Bernardino (Massachusetts Institute of Technology) | Arnold, Kenneth C. (Massachusetts Institute of Technology) | Havasi, Catherine (Massachusetts Institute of Technology)
One major challenge of implementing a metacognitive architecture lies in its scalability and flexibility. We postulate that the difference between a reasoner and a metareasoner need not extend beyond what inputs they take, and we envision a network made of many instances of a few types of simple but powerful reasoning units that serve both roles. In this paper, we present a vision and motivation for such a framework with reusable, robust, and scalable components. This framework, called Scruffy Metacognition, is built on a symbolic representation that lends itself to processing using dimensionality reduction and principal component analysis. We discuss the components of such a system and how they work together for metacognitive reasoning. Additionally, we discuss evaluative tasks for our system focusing on social agent role-playing and object classification.
Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction
Kim, Kwang I., Steinke, Florian, Hein, Matthias
Semi-supervised regression based on the graph Laplacian suffers from the fact that the solution is biased towards a constant and lacks extrapolating power. Starting from these observations, we propose to use the second-order Hessian energy for semi-supervised regression, which overcomes both of these problems. In particular, if the data lies on or close to a low-dimensional submanifold in the feature space, the Hessian energy prefers functions which vary ``linearly'' with respect to the natural parameters in the data. This property also makes it particularly suited for the task of semi-supervised dimensionality reduction, where the goal is to find the natural parameters in the data based on a few labeled points. The experimental results suggest that our method is superior to semi-supervised regression using Laplacian regularization and to standard supervised methods, and is particularly suited for semi-supervised dimensionality reduction.
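For reference, a minimal sketch of the graph-Laplacian baseline that the paper improves on; the Hessian-energy method would replace the penalty $f^\top L f$ below with a second-order energy. Function names and toy data are illustrative.

```python
# Sketch of Laplacian-regularized semi-supervised regression: minimize
# sum_labeled (f_i - y_i)^2 + lam * f^T L f, solved as a linear system.
import numpy as np

def semi_supervised_regression(X, y_labeled, labeled_idx, k=5, lam=1.0):
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):                            # symmetrized kNN affinity graph
        for j in np.argsort(d2[i])[1:k + 1]:
            W[i, j] = W[j, i] = np.exp(-d2[i, j])
    L = np.diag(W.sum(1)) - W                     # combinatorial graph Laplacian
    J = np.zeros((n, n))                          # indicator of labeled points
    J[labeled_idx, labeled_idx] = 1.0
    b = np.zeros(n)
    b[labeled_idx] = y_labeled
    return np.linalg.solve(J + lam * L, b)        # minimizer of the quadratic objective

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3, 100))
X = np.c_[np.cos(t), np.sin(t)]                   # data near a 1-d submanifold
f = semi_supervised_regression(X, t[[0, 50, 99]], [0, 50, 99])
```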
Diffeomorphic Dimensionality Reduction
Walder, Christian, Schölkopf, Bernhard
This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly, we develop an algorithm which diffeomorphically maps the data to lie near a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field, which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets.
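A speculative sketch of the overall recipe, under strong simplifying assumptions: flow the data toward a PCA subspace with a smooth kernel-parameterized velocity field (small steps keep each update close to invertible), then project. The paper solves for the Eulerian flow quite differently, and every name and parameter here is an assumption.

```python
# Sketch: move points along a smooth kernel velocity field toward a linear
# subspace, then project; a crude stand-in for the paper's Eulerian flow.
import numpy as np

def flow_to_subspace(X, k=1, steps=20, sigma=1.0, dt=0.1):
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]                         # projector onto the PCA subspace
    Z = Xc.copy()
    for _ in range(steps):
        target_vel = Z @ P - Z                    # pull each point toward the subspace
        K = np.exp(-((Z[:, None] - Z[None]) ** 2).sum(-1) / (2 * sigma ** 2))
        A = np.linalg.solve(K + 1e-6 * np.eye(len(Z)), target_vel)
        Z = Z + dt * (K @ A)                      # small smooth step: near-invertible
    return Z @ Vt[:k].T                           # coordinates after final projection

rng = np.random.default_rng(0)
X = np.c_[np.linspace(0, 1, 50), 0.1 * rng.normal(size=50)]
low = flow_to_subspace(X)                         # 1-d representation of the data
```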
Dimensionality Reduction for Data in Multiple Feature Representations
Lin, Yen-yu, Liu, Tyng-luh, Fuh, Chiou-shann
In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.
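A rough sketch of kernelized graph embedding over a fixed convex combination of kernels; MKL-DR additionally alternates optimization of the combination weights, which is omitted here, and the toy graph and names are assumptions.

```python
# Sketch of graph embedding over a fixed convex combination of kernels; the
# smallest generalized eigenvectors minimize tr(V^T K L K V) s.t. the D-constraint.
import numpy as np
from scipy.linalg import eigh

def kernel_graph_embedding(kernels, beta, L, D, dim=2, eps=1e-6):
    K = sum(b * Km for b, Km in zip(beta, kernels))   # combined kernel
    A = K @ L @ K                                     # graph-embedding numerator
    B = K @ D @ K + eps * np.eye(len(K))              # constraint matrix, regularized
    vals, vecs = eigh(A, B)                           # generalized eigenproblem
    return K @ vecs[:, :dim]                          # embedded coordinates

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
K1 = X @ X.T                                          # linear kernel
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K2 = np.exp(-d2)                                      # Gaussian kernel
W = (d2 < np.quantile(d2, 0.1)).astype(float)         # toy affinity graph
L = np.diag(W.sum(1)) - W                             # its Laplacian
emb = kernel_graph_embedding([K1, K2], [0.5, 0.5], L, np.diag(W.sum(1)))
```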
A Unified Semi-Supervised Dimensionality Reduction Framework for Manifold Learning
Chatpatanasiri, Ratthachat, Kijsirikul, Boonserm
We present a general framework of semi-supervised dimensionality reduction for manifold learning which naturally generalizes existing supervised and unsupervised learning frameworks that apply spectral decomposition. Algorithms derived under our framework are able to employ both labeled and unlabeled examples, and can handle complex problems where data form separate clusters of manifolds. Our framework offers simple views, explains relationships among existing frameworks, and provides further extensions which can improve existing algorithms. Furthermore, a new semi-supervised kernelization framework called the ``KPCA trick'' is proposed to handle non-linear problems.
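A minimal sketch of the ``KPCA trick'': map the data with kernel PCA, then run any linear dimensionality reduction method in that space (plain PCA below as a stand-in for the linear stage); all parameters are illustrative.

```python
# Sketch of the "KPCA trick": nonlinear kernel-PCA features first, then any
# linear dimensionality reduction method applied in that feature space.
import numpy as np

def kpca_features(X, n_components=10, sigma=1.0):
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))            # Gaussian kernel matrix
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    Kc = H @ K @ H                                # centred kernel
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1], vecs[:, ::-1]        # descending eigenvalue order
    keep = vals[:n_components].clip(min=1e-12)
    return vecs[:, :n_components] * np.sqrt(keep) # KPCA coordinates

def linear_pca(Z, dim=2):
    Zc = Z - Z.mean(0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:dim].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
emb = linear_pca(kpca_features(X))                # kernelized linear method
```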