Cherkassky, Vladimir
Multiclass Learning from Contradictions
Dhar, Sauptik, Cherkassky, Vladimir, Shah, Mohak
We introduce the notion of learning from contradictions, a.k.a. Universum learning, for multiclass problems and propose a novel formulation for multiclass Universum SVM (MU-SVM). We show that learning from contradictions (using MU-SVM) incurs lower sample complexity compared to multiclass SVM (M-SVM) by deriving the Natarajan dimension, which governs the sample complexity for PAC-learnability of MU-SVM. We also propose an analytic span bound for MU-SVM and demonstrate its utility for model selection, resulting in $\sim 2\text{--}4\times$ faster computation times than standard resampling techniques. We empirically demonstrate the efficacy of MU-SVM on several real-world datasets, achieving $> 20\%$ improvement in test accuracies compared to M-SVM. Insights into the underlying behavior of MU-SVM using a histograms-of-projections method are also provided.
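For concreteness, here is a minimal sketch of the binary Universum-SVM objective that MU-SVM generalizes to the multiclass case: labeled samples incur a hinge loss, while Universum samples (the "contradictions") incur an $\epsilon$-insensitive loss pulling them toward the decision boundary. The synthetic data, hyperparameter values, and plain subgradient solver are illustrative only, not the paper's formulation or solver.

```python
import numpy as np

# Binary Universum-SVM objective (illustrative; MU-SVM generalizes this):
#   min_{w,b}  0.5*||w||^2
#              + C   * sum_i max(0, 1 - y_i*(w.x_i + b))   # hinge on labeled data
#              + C_u * sum_j max(0, |w.x_j + b| - eps)     # eps-insensitive on Universum
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
X_u = rng.normal(0, 1, (30, 2))            # Universum: same domain, neither class

w, b = np.zeros(2), 0.0
C, C_u, eps, lr = 1.0, 0.5, 0.1, 1e-3
for _ in range(2000):
    gw, gb = w.copy(), 0.0                 # gradient of the regularizer
    viol = y * (X @ w + b) < 1             # active hinge terms
    gw -= C * (y[viol][:, None] * X[viol]).sum(axis=0)
    gb -= C * y[viol].sum()
    f_u = X_u @ w + b
    active = np.abs(f_u) > eps             # active eps-insensitive terms
    s = np.sign(f_u[active])
    gw += C_u * (s[:, None] * X_u[active]).sum(axis=0)
    gb += C_u * s.sum()
    w -= lr * gw
    b -= lr * gb

print("train accuracy:", np.mean(np.sign(X @ w + b) == y))
```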
Single Class Universum-SVM
Dhar, Sauptik, Cherkassky, Vladimir
This paper extends the idea of Universum learning [1, 2] to single-class learning problems. We propose a Single Class Universum-SVM (U-SVM) setting that incorporates a priori knowledge (in the form of additional data samples) into the single-class estimation problem. These additional data samples, or Universum, belong to the same application domain as the (positive) data samples from the single class of interest, but they follow a different distribution. The proposed methodology for single-class U-SVM is based on the known connection between binary classification and single-class learning formulations [3]. Several empirical comparisons are presented to illustrate the utility of the proposed approach.
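The sketch below illustrates one instance of that binary-classification view of single-class learning (not the paper's exact U-SVM formulation): the target class is separated from Universum-like background samples by an ordinary binary SVM, shown next to a classical one-class SVM baseline. The data and hyperparameters are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

rng = np.random.default_rng(0)
X_pos = rng.normal(0, 1, (200, 2))       # single class of interest
X_unv = rng.uniform(-4, 4, (200, 2))     # same domain, different distribution

# Binary reduction: target-class samples vs. Universum-like samples
clf = SVC(kernel="rbf", gamma=0.5, C=1.0)
clf.fit(np.vstack([X_pos, X_unv]), np.array([1] * 200 + [-1] * 200))

# Baseline: classical single-class estimation on positives only
ocsvm = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_pos)

X_test = rng.normal(0, 1, (100, 2))      # fresh samples from the target class
print("binary reduction accepts:", np.mean(clf.predict(X_test) == 1))
print("one-class SVM accepts:  ", np.mean(ocsvm.predict(X_test) == 1))
```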
Multiclass Universum SVM
Dhar, Sauptik, Cherkassky, Vladimir, Shah, Mohak
We introduce Universum learning for multiclass problems and propose a novel formulation for multiclass Universum SVM (MU-SVM). We also propose an analytic span bound for model selection with almost 2-4x faster computation times than standard resampling techniques. We empirically demonstrate the efficacy of the proposed MU-SVM formulation on several real-world datasets, achieving > 20% improvement in test accuracies compared to multiclass SVM.
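For reference, the "standard resampling" baseline against which the span bound is compared is exhaustive grid search with k-fold cross-validation, as in the generic sketch below (the analytic span bound itself, which replaces the inner resampling loop with a leave-one-out estimate from a single fit, is not reproduced here; dataset and grid values are illustrative).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Resampling-based model selection: refit an SVM for every
# (hyperparameter, fold) pair and keep the best average score.
X, y = load_digits(return_X_y=True)
grid = {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```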
Efficient Kernel Discriminant Analysis via QR Decomposition
Xiong, Tao, Ye, Jieping, Li, Qi, Janardan, Ravi, Cherkassky, Vladimir
Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been widely used in many applications, such as face recognition. Recently, a novel LDA algorithm based on QR decomposition, namely LDA/QR, has been proposed; it is competitive in classification accuracy with other LDA algorithms while having much lower time and space costs. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends LDA/QR to nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR, called AKDA/QR, is proposed. Experiments on face image data show that the classification accuracies of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs.
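As a rough illustration of the two-stage structure, the following numpy sketch implements a simplified linear LDA/QR; the paper's exact algorithm and its kernelized KDA/QR variant differ in details (e.g., regularization and the kernelized scatter matrices). Stage 1 takes a QR decomposition of the class-centroid matrix; Stage 2 solves a small discriminant eigenproblem within that span.

```python
import numpy as np

def lda_qr(X, y):
    classes = np.unique(y)
    mu = X.mean(axis=0)
    C = np.column_stack([X[y == c].mean(axis=0) for c in classes])  # d x k centroids
    Q, _ = np.linalg.qr(C)                       # Stage 1: orthonormal d x k basis

    # Reduced k x k scatter matrices within the span of Q
    k = len(classes)
    Sb, Sw = np.zeros((k, k)), np.zeros((k, k))
    for c in classes:
        Xc = X[y == c]
        d_c = Q.T @ (Xc.mean(axis=0) - mu)
        Sb += len(Xc) * np.outer(d_c, d_c)
        Zc = (Xc - Xc.mean(axis=0)) @ Q
        Sw += Zc.T @ Zc

    # Stage 2: small eigenproblem; ridge term keeps Sw invertible
    evals, V = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(k), Sb))
    order = np.argsort(evals.real)[::-1]
    return Q @ V[:, order].real                  # d x k projection matrix

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1, (40, 10)) for m in (-3, 0, 3)])
y = np.repeat([0, 1, 2], 40)
print("projected shape:", (X @ lda_qr(X, y)).shape)   # (120, 3)
```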
Adaptive Knot Placement for Nonparametric Regression
Najafi, Hossein L., Cherkassky, Vladimir
We show how an "Elman" network architecture, constructed from recurrently connected oscillatory associative memory network modules, can employ selective "attentional" control of synchronization to direct the flow of communication and computation within the architecture to solve a grammatical inference problem. Previously we have shown how the discrete time "Elman" network algorithm can be implemented in a network completely described by continuous ordinary differential equations. The time steps (machine cycles) of the system are implemented by rhythmic variation (clocking) of a bifurcation parameter. In this architecture, oscillation amplitude codes the information content or activity of a module (unit), whereas phase and frequency are used to "softwire" the network. Only synchronized modules communicate by exchanging amplitude information; the activity of non-resonating modules contributes incoherent crosstalk noise. Attentional control is modeled as a special subset of the hidden modules with ouputs which affect the resonant frequencies of other hidden modules. They control synchrony among the other modules and direct the flow of computation (attention) to effect transitions between two subgraphs of a thirteen state automaton which the system emulates to generate a Reber grammar. The internal crosstalk noise is used to drive the required random transitions of the automaton.