Inductive Learning
Multiple Cause Vector Quantization
We propose a model that can learn parts-based representations of high- dimensional data. Our key assumption is that the dimensions of the data can be separated into several disjoint subsets, or factors, which take on values independently of each other. We assume each factor has a small number of discrete states, and model it using a vector quantizer. The selected states of each factor represent the multiple causes of the input. Given a set of training examples, our model learns the association of data dimensions with factors, as well as the states of each VQ.
Invariant Pattern Recognition by Semi-Definite Programming Machines
Previous approaches are either based on regularisation or on the gen- eration of virtual (transformed) examples. We develop a new frame- work for learning linear classifiers under known transformations based on semidefinite programming. We present a new learning algorithm-- the Semidefinite Programming Machine (SDPM)--which is able to find a maximum margin hyperplane when the training examples are polynomial trajectories instead of single points. The solution is found to be sparse in dual variables and allows to identify those points on the trajectory with minimal real-valued output as virtual support vec- tors. Extensions to segments of trajectories, to more than one trans- formation parameter, and to learning with kernels are discussed.
Classification with Hybrid Generative/Discriminative Models
Although discriminatively trained classifiers are usually more accurate when labeled training data is abundant, previous work has shown that when training data is limited, generative classifiers can out-perform them. This paper describes a hybrid model in which a high-dimensional subset of the parameters are trained to maximize generative likelihood, and another, small, subset of parameters are discriminatively trained to maximize conditional likelihood. We give a sample complexity bound showing that in order to fit the discriminative parameters well, the num- ber of training examples required depends only on the logarithm of the number of feature occurrences and feature set size. Experimental results show that hybrid models can provide lower test error and can produce better accuracy/coverage curves than either their purely generative or purely discriminative counterparts. We also discuss several advantages of hybrid models, and advocate further work in this area.
Kernel Dimensionality Reduction for Supervised Learning
We propose a novel method of dimensionality reduction for supervised learning. Given a regression or classification problem in which we wish to predict a variable Y from an explanatory vector X, we treat the prob- lem of dimensionality reduction as that of finding a low-dimensional "ef- fective subspace" of X which retains the statistical relationship between X and Y . We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem, we characterize the notion of conditional independence using covariance operators on reproducing kernel Hilbert spaces; this allows us to derive a contrast function for estimation of the effective subspace. Un- like many conventional methods, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y .
Error Bounds for Transductive Learning via Compression and Clustering
In contrast to inductive learning, in the transductive setting the learner is given both the training and test sets prior to learning. The goal of the learner is to infer (or "transduce") the labels of the test points. The transduction setting was introduced by Vapnik [1, 2] who proposed basic bounds and an algorithm for this setting. Clearly, inferring the labels of points in the test set can be done using an inductive scheme. However, as pointed out in [2], it makes little sense to solve an easier problem by'reducing' it to a much more difficult one. In particular, the prior knowledge carried by the (unlabeled) test points can be incorporated into an algorithm, potentially leading to superior performance. Indeed, a number of papers have demonstrated empirically that transduction can offer substantial advantage over induction whenever the training set is small or moderate (see e.g.
Object Classification from a Single Example Utilizing Class Relevance Metrics
We describe a framework for learning an object classifier from a single example. This goal is achieved by emphasizing the relevant dimensions for classification using available examples of related classes. Learning to accurately classify objects from a single training example is often un- feasible due to overfitting effects. However, if the instance representa- tion provides that the distance between each two instances of the same class is smaller than the distance between any two instances from dif- ferent classes, then a nearest neighbor classifier could achieve perfect performance with a single training example. We therefore suggest a two stage strategy.
A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
We consider the situation in semi-supervised learning, where the "label sampling" mechanism stochastically depends on the true response (as well as potentially on the features). We suggest a method of moments for estimating this stochastic dependence using the unlabeled data. This is potentially useful for two distinct purposes: a. As an input to a super- vised learning procedure which can be used to "de-bias" its results using labeled data only and b. We present several examples to illustrate the practical usefulness of our method.
Breaking SVM Complexity with Cross-Training
We propose to selectively remove examples from the training set using probabilistic estimates related to editing algorithms (Devijver and Kittler, 1982). This heuristic procedure aims at creating a separable distribution of training examples with minimal impact on the position of the decision boundary. It breaks the linear dependency between the number of SVs and the number of training examples, and sharply reduces the complexity of SVMs during both the training and prediction stages.
Blind One-microphone Speech Separation: A Spectral Learning Approach
We present an algorithm to perform blind, one-microphone speech sep- aration. Instead, we formulate the problem of speech sep- aration as a problem in segmenting the spectrogram of the signal into two or more disjoint sets. We build feature sets for our segmenter using classical cues from speech psychophysics. We then combine these fea- tures into parameterized affinity matrices. We also take advantage of the fact that we can generate training examples for segmentation by artifi- cially superposing separately-recorded signals.
Semi-supervised Learning on Directed Graphs
Given a directed graph in which some of the nodes are labeled, we inves- tigate the question of how to exploit the link structure of the graph to infer the labels of the remaining unlabeled nodes. To that extent we propose a regularization framework for functions de(cid:2)ned over nodes of a directed graph that forces the classi(cid:2)cation function to change slowly on densely linked subgraphs. A powerful, yet computationally simple classi(cid:2)cation algorithm is derived within the proposed framework. The experimental evaluation on real-world Web classi(cid:2)cation problems demonstrates en- couraging results that validate our approach.