Optimal Weighting of Multi-View Data with Low Dimensional Hidden States
In areas such as Natural Language Processing, data are often multi-view and high dimensional. Recently, CCA [8] has been applied in the multi-view setting as an unsupervised dimension reduction method in [7][10][3], with performance guarantees when the data are generated under a certain structure. In [7], the authors assume that the high dimensional multi-view data are generated independently conditioned on a low dimensional hidden state (the model structure will be illustrated later in detail). Under this assumption, the low dimensional features provided by CCA lose no useful information relative to the original high dimensional features when used in linear regression. In addition, [6] applied this CCA method to generate low dimensional vector representations of words, which work well in many NLP tasks. CCA works well because the low dimensional hidden state (throughout the paper we use k to denote the dimension of the hidden state) contains most of the information for the supervised tasks, and by performing CCA we can generate a k dimensional estimate of the hidden state from each view, as mentioned in [4]; more precisely, via CCA we can find all k directions in the high dimensional space of each view that have nonzero correlation with the hidden state. Two views suffice to implement the CCA algorithms above (see [7] for a detailed introduction to CCA). Despite its power in dimension reduction, CCA with two views is still not optimal, in the sense that it yields a hidden state estimator from each view, but it is impossible to tell which view is better by looking only at the two views.
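To make the two-view setting concrete, the following is a minimal sketch of CCA on synthetic data, not the procedure of [7] or of this paper. Everything in it is an assumption chosen for illustration: the Gaussian hidden state, the linear maps into each view, the unit-variance independent noise, the dimensions, the small ridge term `reg`, and the helper names `inv_sqrt` and `cca`. It computes CCA by whitening each view and taking an SVD of the whitened cross-covariance, then checks how well the k dimensional projection of the first view linearly explains the hidden state.

```python
import numpy as np

def inv_sqrt(mat, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T

def cca(x, y, k, reg=1e-6):
    """Two-view CCA via whitening and SVD.

    Returns projection matrices for each view (top k directions)
    and all sample canonical correlations in decreasing order.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / n + reg * np.eye(x.shape[1])  # small ridge for stability
    cyy = y.T @ y / n + reg * np.eye(y.shape[1])
    cxy = x.T @ y / n
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    return wx @ u[:, :k], wy @ vt[:k].T, s

# Synthetic two-view data: the views are independent given the hidden state h.
rng = np.random.default_rng(0)
n, d1, d2, k = 5000, 30, 40, 3          # illustrative sizes (assumed)
h = rng.standard_normal((n, k))
x1 = h @ rng.standard_normal((k, d1)) + rng.standard_normal((n, d1))
x2 = h @ rng.standard_normal((k, d2)) + rng.standard_normal((n, d2))

a, b, corrs = cca(x1, x2, k)
z1 = (x1 - x1.mean(axis=0)) @ a         # k dimensional features from view 1

# Fraction of the hidden state's variance explained linearly by z1.
hc = h - h.mean(axis=0)
coef, *_ = np.linalg.lstsq(z1, hc, rcond=None)
r2 = 1.0 - (hc - z1 @ coef).var() / hc.var()

print("canonical correlations (first k + 2):", np.round(corrs[:k + 2], 3))
print("R^2 of hidden state on view-1 CCA features:", round(float(r2), 3))
```

In this setup the first k canonical correlations should be clearly separated from the rest, which reflects the statement that only k directions in each view correlate with the hidden state. Note that the sketch also shows the limitation raised above: it produces one estimator per view, and nothing in the two-view output indicates which view's estimator is closer to the hidden state.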
Sep-26-2012