Support Vector Machines


Data-dependent compression of random features for large-scale kernel approximation

arXiv.org Machine Learning

Kernel methods offer the flexibility to learn complex relationships in modern, large data sets while enjoying strong theoretical guarantees on quality. Unfortunately, these methods typically require cubic running time in the data set size, a prohibitive cost in the large-data setting. Random feature maps (RFMs) and the Nyström method both consider low-rank approximations to the kernel matrix as a potential solution. But to achieve desirable theoretical guarantees, the former may require a prohibitively large number of features J+, and the latter may be prohibitively expensive for high-dimensional problems. We propose to combine the simplicity and generality of RFMs with a data-dependent feature selection scheme to achieve the desirable theoretical approximation properties of Nyström with just O(log J+) features. Our key insight is to begin with a large set of random features, then reduce them to a small number of weighted features in a data-dependent, computationally efficient way, while preserving the statistical guarantees of using the original large set of features. We demonstrate the efficacy of our method with theory and experiments, including on a data set with over 50 million observations. In particular, we show that our method achieves small kernel matrix approximation error and better test set accuracy with provably fewer random features than state-of-the-art methods.
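To make the random feature map idea concrete, the sketch below builds plain random Fourier features for an RBF kernel and compares the feature inner product with the exact kernel value. It is only an illustration of the starting point the abstract describes (a large set of random features), not the paper's data-dependent compression step. The function names, the choice of an RBF kernel, and the parameter `gamma` are assumptions made for this example.

```python
import math
import random


def make_rff(dim, num_features, gamma, seed=0):
    """Build a random Fourier feature map z for the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2), so that dot(z(x), z(y)) approximates k(x, y).
    (Illustrative helper; names and defaults are assumptions.)"""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 2 * gamma * I), the spectral density of this kernel.
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(num_features)]
    # Random phase offsets, uniform on [0, 2*pi).
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return z


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only as a reference."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.1]
    z = make_rff(dim=3, num_features=2000, gamma=0.5)
    print("approximate:", dot(z(x), z(y)))
    print("exact:      ", rbf_kernel(x, y, gamma=0.5))
```

The approximation error shrinks roughly like one over the square root of the number of features, which is why an uncompressed feature set can become large when tight guarantees are needed.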


Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels

arXiv.org Machine Learning

Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization. Due to the inherent trade-off between the dimension of the (mapped) feature space and the approximation accuracy, the key problem is to identify promising (explicit) features leading to a satisfactory out-of-sample performance. In this work, we tackle this problem by efficiently choosing such features from multiple kernels in a greedy fashion. Our method sequentially selects these explicit features from a set of candidate features using a correlation metric. We establish an out-of-sample error bound capturing the trade-off between the error in terms of explicit features (approximation error) and the error due to spectral properties of the best model in the Hilbert space associated to the combined kernel (spectral error). The result verifies that when the (best) underlying data model is sparse enough, i.e., the spectral error is negligible, one can control the test error with a small number of explicit features, that can scale poly-logarithmically with data. Our empirical results show that given a fixed number of explicit features, the method can achieve a lower test error with a smaller time cost, compared to the state-of-the-art in data-dependent random features.


OpenCV Face Recognition - PyImageSearch

#artificialintelligence

In this tutorial, you will learn how to use OpenCV to perform face recognition. To build our face recognition system, we'll first perform face detection, extract face embeddings from each face using deep learning, train a face recognition model on the embeddings, and then finally recognize faces in both images and video streams with OpenCV. Today's tutorial is also a special gift for my fiancée, Trisha (who is now officially my wife). Our wedding was over the weekend, and by the time you're reading this blog post, we'll be at the airport preparing to board our flight for the honeymoon. To celebrate the occasion, and show her how much her support of myself, the PyImageSearch blog, and the PyImageSearch community means to me, I decided to use OpenCV to perform face recognition on a dataset of our faces. You can swap in your own dataset of faces of course! All you need to do is follow my directory structure and insert your own face images. As a bonus, I've also included how to label "unknown" faces that cannot be classified with sufficient confidence. To learn how to perform OpenCV face recognition, just keep reading!


IMMIGRATE: A Margin-based Feature Selection Method with Interaction Terms

arXiv.org Machine Learning

By balancing margin-quantity maximization and margin-quality maximization, the proposed IMMIGRATE algorithm considers both local and global information when using margin-based frameworks. We here derive a new mathematical interpretation of the margin-based cost function by using the quadratic form distance (QFD) and applying both the large-margin and max-min entropy principles. We also design a new principle for classifying new samples and propose a Bayesian framework to iteratively minimize the cost function. We demonstrate the power of our new method by comparing it with 16 widely used classifiers (e.g. Support Vector Machine, k-nearest neighbors, RELIEF), including some classifiers that are capable of identifying interaction terms (e.g. SODA, hierNet), on a synthetic dataset, five gene expression datasets, and twenty UCI machine learning datasets. Our method is able to outperform other methods in most cases.


WiPIN: Operation-free Person Identification using WiFi Signals

arXiv.org Machine Learning

Person identification is critical for sensitive applications such as system login/unlock, access control and payment. In this paper, we present an operation-free person identification system, namely WiPIN, that identifies biometric features of users using Wi-Fi signals. Our approach is based on an entirely new insight that different persons have distinct effects, including absorption and reflection, on Wi-Fi signals. We show that through effective signal processing and feature extraction/matching designs, the Channel State Information (CSI) used in recent Wi-Fi protocols can be utilized for person identification without requiring any collaborative operations, such as wiping, walking, or speaking. We theoretically analyzed the interaction between the human body and Wi-Fi signals via an interactive model. We proposed a mapping rule between variation patterns of Wi-Fi signals and human biological features, and demonstrated the feasibility of establishing CSI-based person identifiers. We conducted extensive experiments over commodity off-the-shelf Wi-Fi devices. The results show WiPIN achieves 92% accuracy in person identification over a group of 30 users, with sufficient robustness to environmental noise.


Neural Regression Trees

arXiv.org Machine Learning

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one. Current approaches for RvC use ad-hoc discretization strategies and are suboptimal. We propose a neural regression tree model for RvC. In this model, we employ a joint optimization framework where we learn optimal discretization thresholds while simultaneously optimizing the features for each node in the tree. We empirically show the validity of our model by testing it on two challenging regression tasks where we establish the state of the art.
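As a point of reference for what RvC involves, here is a minimal sketch of the ad-hoc baseline the abstract contrasts itself with: equal-width discretization of the target, a classifier over the resulting bins, and decoding a predicted bin back to its midpoint. This is not the neural regression tree model; the equal-width bins, the 1-nearest-neighbor classifier on a scalar feature, and all function names are assumptions chosen to keep the example short.

```python
def make_bins(y_values, num_bins):
    """Equal-width bin edges covering the range of the training targets."""
    lo, hi = min(y_values), max(y_values)
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(num_bins + 1)]


def to_class(y, edges):
    """Index of the bin containing y; the maximum target falls in the last bin."""
    num_bins = len(edges) - 1
    for k in range(num_bins):
        if y < edges[k + 1]:
            return k
    return num_bins - 1


def bin_midpoint(k, edges):
    """Decode a class label back to a real value (the bin's midpoint)."""
    return 0.5 * (edges[k] + edges[k + 1])


def nearest_neighbor_class(x, train_x, train_labels):
    """Stand-in classifier: 1-nearest neighbor on a scalar feature."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_labels[best]


def rvc_predict(x, train_x, train_y, num_bins):
    """Regression via classification: discretize, classify, decode."""
    edges = make_bins(train_y, num_bins)
    labels = [to_class(y, edges) for y in train_y]
    k = nearest_neighbor_class(x, train_x, labels)
    return bin_midpoint(k, edges)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print(rvc_predict(2.2, xs, ys, num_bins=4))
```

The prediction can only take as many distinct values as there are bins, so the choice of thresholds directly limits accuracy; that is the part the abstract proposes to learn jointly with the features instead of fixing by hand.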


Comprehensive Support Vector Machines Guide - Using Illusion to Solve Reality!

#artificialintelligence

Unraveling The Dream Within The Dream! Very few would need a hint to guess that the picture on the left is taken from the movie, Inception. The behavior of the spinning top helps in differentiating reality from illusion. It's a mesmerizing concept attempting to visually articulate the subconscious mind. Inception is a movie based on lucid dreaming. The science fiction shows how something that cannot be achieved in the real world can be achieved by transforming the world into a virtual reality and then, after the goal is achieved, transforming the world back to reality.


Learning Confidence Sets using Support Vector Machines

arXiv.org Machine Learning

The goal of confidence-set learning in the binary classification setting is to construct two sets, each with a specific probability guarantee to cover a class. An observation outside the overlap of the two sets is deemed to be from one of the two classes, while the overlap is an ambiguity region which could belong to either class. Instead of plug-in approaches, we propose a support vector classifier to construct confidence sets in a flexible manner. Theoretically, we show that the proposed learner can control the non-coverage rates and minimize the ambiguity with high probability. Efficient algorithms are developed and numerical studies illustrate the effectiveness of the proposed method.


A novel active learning framework for classification: using weighted rank aggregation to achieve multiple query criteria

arXiv.org Artificial Intelligence

Multiple query criteria active learning (MQCAL) methods have a higher potential performance than conventional active learning methods in which only one criterion is deployed for sample selection. A central issue related to MQCAL methods concerns the development of an integration criteria strategy (ICS) that makes full use of all criteria. The conventional ICSs adopted in relevant research all achieve the desired effects, but several limitations still must be addressed. For instance, some of the strategies are not sufficiently scalable during the design process, and the number and type of criteria involved are fixed in advance. Thus, it is challenging for the user to integrate other criteria into the original process unless modifications are made to the algorithm. Other strategies are too dependent on empirical parameters, which can only be acquired by experience or cross-validation and thus lack generality; additionally, these strategies run counter to the intention of active learning, as samples need to be labeled in the validation set before the active learning process can begin. To address these limitations, we propose a novel MQCAL method for classification tasks that employs a third strategy via weighted rank aggregation. The proposed method serves as a heuristic means to select high-value samples with high scalability and generality and is implemented through a three-step process: (1) the transformation of the sample selection to sample ranking and scoring, (2) the computation of the self-adaptive weights of each criterion, and (3) the weighted aggregation of each sample rank list. Ultimately, the sample at the top of the aggregated ranking list is the most comprehensively valuable and must be labeled. Several experiments generating 257 wins, 194 ties and 49 losses against other state-of-the-art MQCALs are conducted to verify that the proposed method can achieve superior results.
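The three-step process can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it converts each criterion's scores to ranks, combines them with a weighted sum of rank positions (a Borda-style rule chosen here as an assumption), and returns the aggregated ordering. The abstract does not say how the self-adaptive weights are computed, so the weights are simply passed in by the caller; the criterion names in the usage example are also hypothetical.

```python
def ranks_from_scores(scores):
    """Step 1: turn one criterion's scores into rank positions.
    Rank 0 is the highest score; ties are broken by sample index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for position, sample in enumerate(order):
        ranks[sample] = position
    return ranks


def aggregate(criteria_scores, weights):
    """Steps 2 and 3 (weights supplied by the caller, an assumption):
    weighted sum of rank positions, then sort samples by the total.
    Returns sample indices from most to least valuable."""
    num_samples = len(criteria_scores[0])
    totals = [0.0] * num_samples
    for scores, weight in zip(criteria_scores, weights):
        for sample, rank in enumerate(ranks_from_scores(scores)):
            totals[sample] += weight * rank
    return sorted(range(num_samples), key=lambda i: (totals[i], i))


if __name__ == "__main__":
    uncertainty = [0.9, 0.2, 0.6, 0.4]   # hypothetical criterion 1
    diversity = [0.1, 0.8, 0.7, 0.3]     # hypothetical criterion 2
    ranking = aggregate([uncertainty, diversity], weights=[0.5, 0.5])
    print("query sample:", ranking[0])
```

Because each criterion enters only through its rank list, adding another criterion means appending one more score list and one more weight, which is the kind of scalability the abstract is after.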


Generalization Properties of hyper-RKHS and its Application to Out-of-Sample Extensions

arXiv.org Machine Learning

Hyper-kernels endowed by hyper-Reproducing Kernel Hilbert Space (hyper-RKHS) formulate the kernel learning task as learning on the space of kernels itself, which provides significant model flexibility for kernel learning with outstanding performance in real-world applications. However, the convergence behavior of these learning algorithms in hyper-RKHS has not been investigated in learning theory. In this paper, we conduct approximation analysis of kernel ridge regression (KRR) and support vector regression (SVR) in this space. To the best of our knowledge, this is the first work to study the approximation performance of regression in hyper-RKHS. For applications, we propose a general kernel learning framework built on the two regression models introduced above to deal with the out-of-sample extension problem, i.e., to learn an underlying general kernel from the pre-given kernel/similarity matrix in hyper-RKHS. Experimental results on several benchmark datasets suggest that our methods are able to learn a general kernel function from an arbitrary given kernel matrix.