 Support Vector Machines


Probably Approximately Efficient Combinatorial Auctions via Machine Learning

AAAI Conferences

A well-known problem in combinatorial auctions (CAs) is that the value space grows exponentially in the number of goods, which often puts a large burden on the bidders and on the auctioneer. In this paper, we introduce a new design paradigm for CAs based on machine learning (ML). Bidders report their values (bids) to a proxy agent by answering a small number of value queries. The proxy agent then uses an ML algorithm to generalize from those bids to the whole value space, and the efficient allocation is computed based on the generalized valuations. We introduce the concept of "probably approximate efficiency (PAE)" to measure the efficiency of the new ML-based auctions, and we formally show how the generalizability of an ML algorithm relates to the efficiency loss incurred by the corresponding ML-based auction. To instantiate our paradigm, we use support vector regression (SVR) as our ML algorithm, which enables us to keep the winner determination problem of the CA tractable. Different parameters of the SVR algorithm allow us to trade off the expressiveness, economic efficiency, and computational efficiency of the CA. Finally, we demonstrate experimentally that, even with a small number of bids, our ML-based auctions are highly efficient with high probability.


A Kaggler's Guide to Model Stacking in Practice

#artificialintelligence

Stacking (also called meta ensembling) is a model ensembling technique used to combine information from multiple predictive models to generate a new model. Often the stacked model (also called the 2nd-level model) will outperform each of the individual models due to its smoothing nature and its ability to highlight each base model where it performs best and discredit each base model where it performs poorly. For this reason, stacking is most effective when the base models are significantly different. Here I provide a simple example and guide on how stacking is most often implemented in practice. Feel free to follow this article using the related code and datasets here in the Machine Learning Problem Bible.
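As a rough illustration of the idea (not the article's own code), the sketch below builds out-of-fold predictions from two deliberately different toy base models and fits a 2nd-level model on them. The base models, the toy data, the fold assignment, and the choice of a single blend weight as the 2nd-level model are all assumptions made for brevity.

```python
# Toy stacking sketch in pure Python. Every name and number here is a
# hypothetical choice for illustration, not taken from the article.

def mean_model(train_x, train_y):
    """Base model 1: always predicts the mean of the training targets."""
    avg = sum(train_y) / len(train_y)
    return lambda x: avg

def nearest_neighbor_model(train_x, train_y):
    """Base model 2: predicts the target of the closest training point."""
    def predict(x):
        i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[i]
    return predict

def out_of_fold_predictions(fit, xs, ys, n_folds):
    """Predict each point with a model that never saw that point."""
    preds = [0.0] * len(xs)
    for fold in range(n_folds):
        held_out = [i for i in range(len(xs)) if i % n_folds == fold]
        kept = [i for i in range(len(xs)) if i % n_folds != fold]
        model = fit([xs[i] for i in kept], [ys[i] for i in kept])
        for i in held_out:
            preds[i] = model(xs[i])
    return preds

def squared_error(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys))

def fit_blend_weight(pred_a, pred_b, ys, steps=100):
    """2nd-level model: pick w in [0, 1] minimizing error of w*a + (1-w)*b."""
    def blend_error(w):
        blended = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        return squared_error(blended, ys)
    return min((k / steps for k in range(steps + 1)), key=blend_error)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0, 5.9, 7.2]

oof_mean = out_of_fold_predictions(mean_model, xs, ys, n_folds=4)
oof_knn = out_of_fold_predictions(nearest_neighbor_model, xs, ys, n_folds=4)
weight_on_mean = fit_blend_weight(oof_mean, oof_knn, ys)
stacked = [weight_on_mean * a + (1 - weight_on_mean) * b
           for a, b in zip(oof_mean, oof_knn)]
```

Because the candidate weights include 0 and 1, the blended out-of-fold error can never exceed that of either base model alone on the same predictions. In practice the 2nd-level model is usually a proper learner (for example a regularized linear model) rather than a single weight.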


Intercomparison of Machine Learning Methods for Statistical Downscaling: The Case of Daily and Extreme Precipitation

arXiv.org Machine Learning

Statistical downscaling of global climate models (GCMs) allows researchers to study local climate change effects decades into the future. A wide range of statistical models have been applied to downscaling GCMs, but recent advances in machine learning have not been explored. In this paper, we compare four fundamental statistical methods, Bias Correction Spatial Disaggregation (BCSD), Ordinary Least Squares, Elastic-Net, and Support Vector Machine, with three more advanced machine learning methods, Multi-task Sparse Structure Learning (MSSL), BCSD coupled with MSSL, and Convolutional Neural Networks, to downscale daily precipitation in the Northeast United States. Metrics that evaluate each method's ability to capture daily anomalies, large-scale climate shifts, and extremes are analyzed. We find that linear methods, led by BCSD, consistently outperform non-linear approaches. The direct application of state-of-the-art machine learning methods to statistical downscaling does not provide improvements over simpler, longstanding approaches.


[Event Series] Machine Learning Course

#artificialintelligence

Our learning task: Given a training set S = {(x1, y1), (x2, y2), . . . The simplest case: y ∈ {−1, 1}, called a binary classification problem. If y is a real number, it becomes a regression problem. In the more general case, y can be a vector, with each element drawn from a finite set.


A Modified Construction for a Support Vector Classifier to Accommodate Class Imbalances

arXiv.org Machine Learning

Given a training set with binary class labels, the Support Vector Machine identifies the hyperplane maximizing the margin between the two classes of training data. This general formulation is useful in that it can be applied without regard to variance differences between the classes. Ignoring these differences is not optimal, however, as the general SVM will give the class with lower variance an unjustifiably wide berth. This increases the chance of misclassification of the other class and results in an overall loss of predictive performance. An alternate construction is proposed in which the margins of the separating hyperplane are different for each class, each proportional to the standard deviation of its class along the direction perpendicular to the hyperplane. The construction agrees with the SVM in the case of equal class variances. This paper will then examine the impact of the modified constraint equations on the dual representation.


Sparse Algorithm for Robust LSSVM in Primal Space

arXiv.org Machine Learning

Li Chen a,b, Shuisheng Zhou a
a School of Mathematics and Statistics, Xidian University, 266 Xinglong Section, Xifeng Road, Xi'an, China
b Department of Basic Science, College of Information and Business, Zhongyuan Technology University, 41 Zhongyuan Middle Road, Zhengzhou, China
Abstract: Because it admits a closed-form solution, the least squares support vector machine (LSSVM) has been widely used for classification and regression problems, with performance comparable to that of other types of SVMs. However, LSSVM has two drawbacks: it is sensitive to outliers and it lacks sparseness. Robust LSSVM (R-LSSVM) partly overcomes the first drawback via a nonconvex truncated loss function, but the current algorithms for R-LSSVM produce dense solutions, so they still suffer from the second drawback and are inefficient for training large-scale problems. In this paper, we interpret the robustness of R-LSSVM from a re-weighted viewpoint and give a primal R-LSSVM by the representer theorem. The new model may have a sparse solution if the corresponding kernel matrix has low rank. Then, by approximating the kernel matrix with a low-rank matrix and smoothing the loss function with an entropy penalty function, we propose a convergent sparse R-LSSVM (SR-LSSVM) algorithm to achieve the sparse solution of primal R-LSSVM, which overcomes both drawbacks of LSSVM simultaneously. The proposed algorithm has lower complexity than the existing algorithms and is very efficient for training large-scale problems. Many experimental results illustrate that SR-LSSVM can achieve better or comparable performance with less training time than related algorithms, especially for training large-scale problems.
Keywords: Primal LSSVM, Sparse solution, Re-weighted LSSVM, Low-rank approximation, Outliers
2010 MSC: 00-01, 99-00
1. Introduction
Least squares support vector machine (LSSVM) was introduced by Suykens [1] and has been a powerful learning technique for classification and regression. It has been successfully used in many real-world pattern recognition problems, such as disease diagnosis [2], fault detection [3], image classification [4], solving partial differential equations [5] and visual tracking [6]. LSSVM tries to minimize least squares errors on the training samples.


Machine Perception Laboratory

AITopics Original Links

The output of the face detector is scaled to 90x90 pixels and fed directly to the facial expression analysis system (see Figure 1). The system is essentially the same as the one used for Automatic FACS coding. First, the face image is passed through a bank of Gabor filters at 8 orientations and 9 scales (2-32 pixels/cycle at 0.5 octave steps). The filterbank representations are then channeled to a classifier to code the image in terms of a set of expression dimensions. We have found support vector machines to be very effective for classifying facial expressions (Littlewort et al., in press; Bartlett et al., 2003).
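The filter bank parameters quoted above can be checked with a few lines of arithmetic: nine wavelengths starting at 2 pixels/cycle and growing by half an octave each step end exactly at 32 pixels/cycle. The sketch below enumerates them and evaluates one Gabor filter sample. The even spacing of orientations over 180 degrees, the isotropic Gaussian envelope, and the `sigma_ratio` parameter are assumptions for illustration, not details given in the text.

```python
import math

N_ORIENTATIONS = 8
N_SCALES = 9

# Half-octave steps: each wavelength is sqrt(2) times the previous one.
wavelengths = [2.0 * 2 ** (0.5 * k) for k in range(N_SCALES)]

# Assumed: orientations evenly spaced over 180 degrees.
orientations = [math.pi * k / N_ORIENTATIONS for k in range(N_ORIENTATIONS)]

def gabor_value(x, y, wavelength, theta, sigma_ratio=0.5):
    """Real part of a Gabor filter at pixel offset (x, y) from its center.

    sigma_ratio (hypothetical) ties the Gaussian envelope width to the
    wavelength so that every scale covers a similar number of cycles.
    """
    sigma = sigma_ratio * wavelength
    x_rot = x * math.cos(theta) + y * math.sin(theta)
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * x_rot / wavelength)

# One (wavelength, orientation) pair per filter in the bank.
filter_bank = [(w, t) for w in wavelengths for t in orientations]
```

With these settings the bank holds 8 x 9 = 72 filters, so each pixel location contributes 72 responses to the representation passed to the classifier.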


Distributed Weighted Parameter Averaging for SVM Training on Big Data

AAAI Conferences

Two popular approaches for distributed training of SVMs on big data are parameter averaging and the alternating direction method of multipliers (ADMM). Parameter averaging is efficient but loses accuracy as the number of partitions increases, while ADMM in the feature space is accurate but suffers from slow convergence. In this paper, we report a hybrid approach called weighted parameter averaging (WPA), which optimizes the regularized hinge loss with respect to weights on parameters. The problem is shown to be the same as solving an SVM in a projected space. We also demonstrate an O(1/N) stability bound on the final hypothesis given by WPA, using novel proof techniques. Experimental results on a variety of toy and real-world datasets show that our approach is significantly more accurate than parameter averaging for a high number of partitions. The proposed method is also seen to enjoy much faster convergence than ADMM in the feature space.
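To make the contrast concrete, the sketch below shows plain parameter averaging next to a weighted combination of per-partition weight vectors, together with a regularized hinge loss that the combination weights could be tuned against. It only illustrates the general idea: the function names are hypothetical, and placing the L2 regularizer on the combined weight vector is an assumption, since the abstract does not give the exact objective.

```python
def average_parameters(local_weights):
    """Plain parameter averaging: element-wise mean of the local vectors."""
    n = len(local_weights)
    return [sum(column) / n for column in zip(*local_weights)]

def weighted_parameters(local_weights, betas):
    """Weighted combination: sum over partitions of beta_k * w_k."""
    return [sum(b * w for b, w in zip(betas, column))
            for column in zip(*local_weights)]

def regularized_hinge_loss(w, samples, labels, lam):
    """(lam / 2) * ||w||^2 plus the mean hinge loss; labels are +1 or -1."""
    hinge = sum(
        max(0.0, 1.0 - y * sum(wi * xi for wi, xi in zip(w, x)))
        for x, y in zip(samples, labels)
    )
    return lam * sum(wi * wi for wi in w) / 2.0 + hinge / len(samples)

# Two hypothetical per-partition SVM weight vectors.
local_weights = [[1.0, 0.0], [0.0, 1.0]]
uniform = weighted_parameters(local_weights, [0.5, 0.5])
skewed = weighted_parameters(local_weights, [1.0, 0.0])

samples = [[2.0, 0.0], [-2.0, 0.0]]
labels = [1, -1]
loss_skewed = regularized_hinge_loss(skewed, samples, labels, lam=0.1)
loss_uniform = regularized_hinge_loss(uniform, samples, labels, lam=0.1)
```

Uniform weights recover plain averaging as a special case, so a combination tuned against the loss can do no worse on that objective than averaging does.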


An Empirical Analysis of Constrained Support Vector Quantile Regression for Nonparametric Probabilistic Forecasting of Wind Power

AAAI Conferences

Uncertainty analysis in the form of probabilistic forecasting can provide significant improvements in decision-making processes in the smart power grid for better integrating renewable energies such as wind. Whereas point forecasting provides a single expected value, probabilistic forecasts provide more information in the form of quantiles, prediction intervals, or full predictive densities. This paper analyzes the effectiveness of an approach for nonparametric probabilistic forecasting of wind power that combines support vector machines and nonlinear quantile regression with non-crossing constraints. A numerical case study is conducted using publicly available wind data from the Global Energy Forecasting Competition 2014. Multiple quantiles are estimated to form 20%, 40%, 60% and 80% prediction intervals, which are evaluated using the pinball loss function and reliability measures. Three benchmark models are used for comparison, and the results demonstrate that the proposed approach leads to significantly better performance while preventing the problem of overlapping quantile estimates.
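For readers unfamiliar with the evaluation terms, the following minimal sketch shows the standard pinball loss for a single quantile forecast, the quantile pair behind each prediction interval, and a check for crossing quantiles. It assumes the intervals are central (symmetric around the median), which the abstract does not state; the function names are illustrative.

```python
def pinball_loss(observed, predicted_quantile, tau):
    """Pinball (quantile) loss for one observation at quantile level tau."""
    if observed >= predicted_quantile:
        return tau * (observed - predicted_quantile)
    return (1.0 - tau) * (predicted_quantile - observed)

def central_interval_quantiles(coverage):
    """Quantile levels bounding a central interval, e.g. 0.8 -> (0.1, 0.9)."""
    return ((1.0 - coverage) / 2.0, (1.0 + coverage) / 2.0)

def is_non_crossing(quantile_forecasts):
    """True if forecasts, ordered by increasing tau, never decrease."""
    return all(lo <= hi for lo, hi in
               zip(quantile_forecasts, quantile_forecasts[1:]))

# Under-forecasting a high quantile is penalized more than over-forecasting.
under = pinball_loss(observed=10.0, predicted_quantile=8.0, tau=0.9)
over = pinball_loss(observed=10.0, predicted_quantile=12.0, tau=0.9)

levels = [central_interval_quantiles(c) for c in (0.2, 0.4, 0.6, 0.8)]
```

Under the central-interval assumption, the four intervals in the study correspond to the quantile pairs (0.4, 0.6), (0.3, 0.7), (0.2, 0.8) and (0.1, 0.9), and the non-crossing constraints guarantee that each interval is nested inside the next wider one.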


Toward Finding Malicious Cyber Discussions in Social Media

AAAI Conferences

Security analysts gather essential information about cyber attacks, exploits, vulnerabilities, and victims by manually searching social media sites. This effort can be dramatically reduced using natural language machine learning techniques. Using a new English text corpus containing more than 250K discussions from Stack Exchange, Reddit, and Twitter on cyber and non-cyber topics, we demonstrate the ability to detect more than 90% of the cyber discussions with fewer than 1% false alarms. If an original searched document corpus includes only 5% cyber documents, then our processing provides an enriched corpus for analysts where 83% to 95% of the documents are on cyber topics. Good performance was obtained using term frequency–inverse document frequency (TF–IDF) features and either logistic regression or linear support vector machine (SVM) classifiers. A classifier trained using prior historical data accurately detected 86% of emergent Heartbleed discussions, and retrospective experiments demonstrate that classifier performance remains stable up to a year without retraining.
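The enrichment figure follows from the base rate, the detection rate, and the false alarm rate. As a quick check, the sketch below plugs in the boundary values from the abstract (5% cyber documents, 90% detection, 1% false alarms); the exact operating points behind the 83% to 95% range are not given, so these inputs are assumptions.

```python
def enriched_cyber_fraction(prevalence, detection_rate, false_alarm_rate):
    """Fraction of flagged documents that are truly on cyber topics."""
    true_hits = prevalence * detection_rate
    false_hits = (1.0 - prevalence) * false_alarm_rate
    return true_hits / (true_hits + false_hits)

# 0.045 / (0.045 + 0.0095), which is about 0.826
fraction = enriched_cyber_fraction(0.05, 0.90, 0.01)
```

The result is consistent with the lower end of the stated range, and it shows why the false alarm rate matters so much at a 5% base rate: non-cyber documents outnumber cyber ones 19 to 1, so each extra percentage point of false alarms adds nearly as many flagged documents as a fifth of all true detections.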