Support Vector Machines
Bayesian Sparse Factor Analysis with Kernelized Observations
Sevilla-Salcedo, Carlos, Guerrero-Lรณpez, Alejandro, Olmos, Pablo M., Gรณmez-Verdejo, Vanessa
Latent variable models for multi-view learning attempt to find low-dimensional projections that fairly capture the correlations among multiple views that characterise each datum. High-dimensional views in medium-sized datasets and non-linear problems are traditionally handled by kernel methods, inducing a (non)-linear function between the latent projection and the data itself. However, they usually come with scalability issues and exposition to overfitting. To overcome these limitations, instead of imposing a kernel function, here we propose an alternative method. In particular, we combine probabilistic factor analysis with what we refer to as kernelized observations, in which the model focuses on reconstructing not the data itself, but its correlation with other data points measured by a kernel function. This model can combine several types of views (kernelized or not), can handle heterogeneous data and work in semi-supervised settings. Additionally, by including adequate priors, it can provide compact solutions for the kernelized observations (based in a automatic selection of bayesian support vectors) and can include feature selection capabilities. Using several public databases, we demonstrate the potential of our approach (and its extensions) w.r.t. common multi-view learning models such as kernel canonical correlation analysis or manifold relevance determination gaussian processes latent variable models.
Optimally Combining Classifiers for Semi-Supervised Learning
Wang, Zhiguo, Yang, Liusha, Yin, Feng, Lin, Ke, Shi, Qingjiang, Luo, Zhi-Quan
This paper considers semi-supervised learning for tabular data. It is widely known that Xgboost based on tree model works well on the heterogeneous features while transductive support vector machine can exploit the low density separation assumption. However, little work has been done to combine them together for the end-to-end semi-supervised learning. In this paper, we find these two methods have complementary properties and larger diversity, which motivates us to propose a new semi-supervised learning method that is able to adaptively combine the strengths of Xgboost and transductive support vector machine. Instead of the majority vote rule, an optimization problem in terms of ensemble weight is established, which helps to obtain more accurate pseudo labels for unlabeled data. The experimental results on the UCI data sets and real commercial data set demonstrate the superior classification performance of our method over the five state-of-the-art algorithms improving test accuracy by about $3\%-4\%$. The partial code can be found at https://github.com/hav-cam-mit/CTO.
Explainable Artificial Intelligence: a Systematic Review
This has led to the development of a plethora of domain-dependent and context-specific methods for dealing with the interpretation of machine learning (ML) models and the formation of explanations for humans. Unfortunately, this trend is far from being over, with an abundance of knowledge in the field which is scattered and needs organisation. The goal of this article is to systematically review research works in the field of XAI and to try to define some boundaries in the field. From several hundreds of research articles focused on the concept of explainability, about 350 have been considered for review by using the following search methodology. In a first phase, Google Scholar was queried to find papers related to "explainable artificial intelligence", "explainable machine learning" and "interpretable machine learning". Subsequently, the bibliographic section of these articles was thoroughly examined to retrieve further relevant scientific studies. The first noticeable thing, as shown in figure 2 (a), is the distribution of the publication dates of selected research articles: sporadic in the 70s and 80s, receiving preliminary attention in the 90s, showing raising interest in 2000 and becoming a recognised body of knowledge after 2010. The first research concerned the development of an explanation-based system and its integration in a computer program designed to help doctors make diagnoses [3]. Some of the more recent papers focus on work devoted to the clustering of methods for explainability, motivating the need for organising the XAI literature [4, 5, 6].
Prediction of short and long-term droughts using artificial neural networks and hydro-meteorological variables
Hassanzadeh, Yousef, Ghazvinian, Mohammadvaghef, Abdi, Amin, Baharvand, Saman, Jozaghi, Ali
Drought is a natural creeping threat with numerous damaging effects in various aspects of human life. Accurate drought prediction is a promising step in helping policy makers to set drought risk management strategies. To fulfill this purpose, choosing appropriate models plays an important role in predicting approach. In this study, different models of Artificial Neural Network (ANN) are employed to predict short and long-term of droughts by using Standardized Precipitation Index (SPI) at different time scales, including 3, 6, 12, 24 and 48 months in Tabriz city, Iran. To this end, different combination of calculated SPI and time series of various hydro-meteorological variables, such as precipitation, wind velocity, relative humidity and sunshine hours for years 1992 to 2010 are used to train the ANN models. In order to compare the models performances, some well-known measures, namely RMSE, Mean Absolute Error (MAE) and Correlation Coefficient (CC) are utilized in the present study. The results illustrate that the application of all hydro-meteorological variables significantly improves the prediction of SPI at different time scales.
Near-Tight Margin-Based Generalization Bounds for Support Vector Machines
Grรธnlund, Allan, Kamma, Lior, Larsen, Kasper Green
Support Vector Machines (SVMs) are among the most fundamental tools for binary classification. In its simplest formulation, an SVM produces a hyperplane separating two classes of data using the largest possible margin to the data. The focus on maximizing the margin has been well motivated through numerous generalization bounds. In this paper, we revisit and improve the classic generalization bounds in terms of margins. Furthermore, we complement our new generalization bound by a nearly matching lower bound, thus almost settling the generalization performance of SVMs in terms of margins.
Solution Path Algorithm for Twin Multi-class Support Vector Machine
Chen, Liuyuan, Zhou, Kanglei, Jing, Junchang, Fan, Haiju, Li, Juntao
The twin support vector machine and its extensions have made great achievements in dealing with binary classification problems, however, which is faced with some difficulties such as model selection and solving multi-classification problems quickly. This paper is devoted to the fast regularization parameter tuning algorithm for the twin multi-class support vector machine. A new sample dataset division method is adopted and the Lagrangian multipliers are proved to be piecewise linear with respect to the regularization parameters by combining the linear equations and block matrix theory. Eight kinds of events are defined to seek for the starting event and then the solution path algorithm is designed, which greatly reduces the computational cost. In addition, only few points are combined to complete the initialization and Lagrangian multipliers are proved to be 1 as the regularization parameter tends to infinity. Simulation results based on UCI datasets show that the proposed method can achieve good classification performance with reducing the computational cost of grid search method from exponential level to the constant level.
Parallelizing Machine Learning as a Service for the End-User
Loreti, Daniela, Lippi, Marco, Torroni, Paolo
As ML applications are becoming ever more pervasive, fully-trained systems are made increasingly available to a wide public, allowing end-users to submit queries with their own data, and to efficiently retrieve results. With increasingly sophisticated such services, a new challenge is how to scale up to evergrowing user bases. In this paper, we present a distributed architecture that could be exploited to parallelize a typical ML system pipeline. We propose a case study consisting of a text mining service and discuss how the method can be generalized to many similar applications. We demonstrate the significance of the computational gain boosted by the distributed architecture by way of an extensive experimental evaluation.
Detecting Problem Statements in Peer Assessments
Xiao, Yunkai, Zingle, Gabriel, Jia, Qinjin, Shah, Harsh R., Zhang, Yi, Li, Tianyi, Karovaliya, Mohsin, Zhao, Weixiang, Song, Yang, Ji, Jie, Balasubramaniam, Ashwin, Patel, Harshit, Bhalasubbramanian, Priyankha, Patel, Vikram, Gehringer, Edward F.
Effective peer assessment requires students to be attentive to the deficiencies in the work they rate. Thus, their reviews should identify problems. But what ways are there to check that they do? We attempt to automate the process of deciding whether a review comment detects a problem. We use over 18,000 review comments that were labeled by the reviewees as either detecting or not detecting a problem with the work. We deploy several traditional machine-learning models, as well as neural-network models using GloVe and BERT embeddings. We find that the best performer is the Hierarchical Attention Network classifier, followed by the Bidirectional Gated Recurrent Units (GRU) Attention and Capsule model with scores of 93.1% and 90.5% respectively. The best non-neural network model was the support vector machine with a score of 89.71%. This is followed by the Stochastic Gradient Descent model and the Logistic Regression model with 89.70% and 88.98%.
Proper Learning, Helly Number, and an Optimal SVM Bound
Bousquet, Olivier, Hanneke, Steve, Moran, Shay, Zhivotovskiy, Nikita
The classical PAC sample complexity bounds are stated for any Empirical Risk Minimizer (ERM) and contain an extra logarithmic factor $\log(1/{\epsilon})$ which is known to be necessary for ERM in general. It has been recently shown by Hanneke (2016) that the optimal sample complexity of PAC learning for any VC class C is achieved by a particular improper learning algorithm, which outputs a specific majority-vote of hypotheses in C. This leaves the question of when this bound can be achieved by proper learning algorithms, which are restricted to always output a hypothesis from C. In this paper we aim to characterize the classes for which the optimal sample complexity can be achieved by a proper learning algorithm. We identify that these classes can be characterized by the dual Helly number, which is a combinatorial parameter that arises in discrete geometry and abstract convexity. In particular, under general conditions on C, we show that the dual Helly number is bounded if and only if there is a proper learner that obtains the optimal joint dependence on $\epsilon$ and $\delta$. As further implications of our techniques we resolve a long-standing open problem posed by Vapnik and Chervonenkis (1974) on the performance of the Support Vector Machine by proving that the sample complexity of SVM in the realizable case is $\Theta((n/{\epsilon})+(1/{\epsilon})\log(1/{\delta}))$, where $n$ is the dimension. This gives the first optimal PAC bound for Halfspaces achieved by a proper learning algorithm, and moreover is computationally efficient.