Plotting

 Liu, Wei


Generalized Canonical Correlation Analysis and Its Application to Blind Source Separation Based on a Dual-Linear Predictor Structure

arXiv.org Machine Learning

Blind source separation (BSS) is one of the most important and established research topics in signal processing and many algorithms have been proposed based on different statistical properties of the source signals. For second-order statistics (SOS) based methods, canonical correlation analysis (CCA) has been proved to be an effective solution to the problem. In this work, the CCA approach is generalized to accommodate the case with added white noise and it is then applied to the BSS problem for noisy mixtures. In this approach, the noise component is assumed to be spatially and temporally white, but the variance information of noise is not required. An adaptive blind source extraction algorithm is derived based on this idea and a further extension is proposed by employing a dual-linear predictor structure for blind source extraction (BSE).


Compact Hyperplane Hashing with Bilinear Functions

arXiv.org Machine Learning

Hyperplane hashing aims at rapidly searching nearest points to a hyperplane, and has shown practical impact in scaling up active learning with SVMs. Unfortunately, the existing randomized methods need long hash codes to achieve reasonable search accuracy and thus suffer from reduced search speed and large memory overhead. To this end, this paper proposes a novel hyperplane hashing technique which yields compact hash codes. The key idea is the bilinear form of the proposed hash functions, which leads to higher collision probability than the existing hyperplane hash functions when using random projections. To further increase the performance, we propose a learning based framework in which the bilinear functions are directly learned from the data. This results in short yet discriminative codes, and also boosts the search performance over the random projection based solutions. Large-scale active learning experiments carried out on two datasets with up to one million samples demonstrate the overall superiority of the proposed approach.


Constrained Metric Learning Via Distance Gap Maximization

AAAI Conferences

Vectored data frequently occur in a variety of fields, which are easy to handle since they can be mathematically abstracted as points residing in a Euclidean space. An appropriate distance metric in the data space is quite demanding for a great number of applications. In this paper, we pose robust and tractable metric learning under pairwise constraints that are expressed as similarity judgements between data pairs. The major features of our approach include: 1) it maximizes the gap between the average squared distance among dissimilar pairs and the average squared distance among similar pairs; 2) it is capable of propagating similar constraints to all data pairs; and 3) it is easy to implement in contrast to the existing approaches using expensive optimization such as semidefinite programming. Our constrained metric learning approach has widespread applicability without being limited to particular backgrounds. Quantitative experiments are performed for classification and retrieval tasks, uncovering the effectiveness of the proposed approach.


Determining the Unithood of Word Sequences using Mutual Information and Independence Measure

arXiv.org Artificial Intelligence

Most works related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, the number of independent research that study the notion of unithood and produce dedicated techniques for measuring unithood is extremely small. We propose a new approach, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from Google search engine for the measurement of unithood. Our evaluations revealed a precision and recall of 98.68% and 91.82% respectively with an accuracy at 95.42% in measuring the unithood of 1005 test cases.


Determining the Unithood of Word Sequences using a Probabilistic Approach

arXiv.org Artificial Intelligence

Most research related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, novelties are rare in this small sub-field of term extraction. In addition, existing work were mostly empirically motivated and derived. We propose a new probabilistically-derived measure, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from Google search engine for the measurement of unithood. Our comparative study using 1,825 test cases against an existing empirically-derived function revealed an improvement in terms of precision, recall and accuracy.