Support Vector Machine (SVM) is a widely used supervised machine learning algorithm. It is mostly used for classification tasks, but it is suitable for regression as well. In this post, we dive deep into two important hyperparameters of SVMs, C and gamma, and explain their effects with visualizations. I will assume you have a basic understanding of the algorithm and focus on these hyperparameters. An SVM separates data points that belong to different classes with a decision boundary.
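Before looking at visualizations, it helps to see where C and gamma actually appear when fitting an SVM. Here is a minimal sketch using scikit-learn (my choice of library; the post does not name one), with a small synthetic dataset:

```python
# Minimal sketch: fitting an SVM classifier with explicit C and gamma
# values, using scikit-learn on a synthetic two-feature dataset.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic two-class dataset with two features.
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           random_state=0)

# C trades off a wide margin against misclassified training points;
# gamma controls how far the influence of a single training point reaches
# under the RBF kernel.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(X, y)

# The fitted model exposes its support vectors and a training score.
print(clf.support_vectors_.shape, clf.score(X, y))
```

Larger C and larger gamma both let the decision boundary bend more tightly around individual points, which is exactly the effect the visualizations below explore.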
"Just as electricity transformed almost every industry 100 years ago, today I actually have a hard time thinking of an industry that I don't think AI (Artificial Intelligence) will transform in the next several years" -- Andrew Ng

I have long been fascinated by these algorithms, capable of things that we as humans can barely begin to comprehend. However, even with all these resources, one of the biggest setbacks any ML practitioner faces is tuning the model's hyperparameters. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned. The same kind of machine learning model can be trained under different constraints, learning rates, kernels and other such settings to generalize to different datasets, and hence these settings have to be tuned so that the model can optimally solve the machine learning problem.
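The hyperparameter-versus-parameter distinction can be made concrete in a few lines. In this sketch (scikit-learn is my choice here, not something the text prescribes), the regularization strength and learning rate are hyperparameters we set by hand, while the weight vector is learned during fitting:

```python
# Sketch: hyperparameters are chosen before training; parameters are learned.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=100, random_state=0)

# alpha (regularization strength), learning_rate and eta0 (step size) are
# hyperparameters: we pick their values; they are not estimated from data.
model = SGDClassifier(alpha=1e-4, learning_rate="constant", eta0=0.01,
                      random_state=0)
model.fit(X, y)

# coef_ holds the learned parameters (the weights), found during fit().
print(model.coef_.shape)
```

Hyperparameter tuning means searching over values like alpha and eta0, refitting the model each time, and keeping whichever combination generalizes best on held-out data.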
Using multiple regularization hyperparameters is an effective method for managing model complexity in problems where input features have varying amounts of noise. While algorithms for choosing multiple hyperparameters are often used in neural networks and support vector machines, they are not common in structured prediction tasks, such as sequence labeling or parsing. In this paper, we consider the problem of learning regularization hyperparameters for log-linear models, a class of probabilistic models for structured prediction tasks which includes conditional random fields (CRFs). Using an implicit differentiation trick, we derive an efficient gradient-based method for learning Gaussian regularization priors with multiple hyperparameters. In both simulations and the real-world task of computational RNA secondary structure prediction, we find that multiple hyperparameter learning provides a significant boost in accuracy compared to models learned using only a single regularization hyperparameter.
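The kind of objective the abstract describes can be sketched as follows; the notation here is illustrative and assumed by me, not taken verbatim from the paper:

```latex
% Regularized log-linear training objective with one Gaussian
% regularization hyperparameter \lambda_k per feature group k
% (notation illustrative, not from the paper itself).
\[
  \min_{\mathbf{w}} \;
  -\sum_{i} \log p\!\left(y^{(i)} \mid x^{(i)}; \mathbf{w}\right)
  \;+\; \sum_{k} \frac{\lambda_k}{2}\,\lVert \mathbf{w}_k \rVert^2
\]
```

With a single hyperparameter, every feature group is penalized equally; with one \(\lambda_k\) per group, noisier feature groups can be shrunk more aggressively. The implicit differentiation trick lets the \(\lambda_k\) themselves be optimized by gradient descent on held-out performance, rather than by grid search over all combinations.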
To summarize what we've done so far in the "Addressing the problem of overfitting" article series: Cross-validation (discussed in Part 1), Regularization (discussed in Part 2) and Dimensionality Reduction (discussed in Part 3) can all effectively mitigate overfitting. In Part 4, we discuss another useful technique: creating ensembles. Note, however, that this technique is limited to tree-based models. One can attempt to build a decision tree model (Step 1) without limiting the tree growth (without early stopping or any hyperparameter tuning).
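Step 1 and the ensemble idea can be sketched like this, assuming scikit-learn (the article does not name a library): a single unconstrained tree typically memorizes the training set, while an ensemble of such trees usually generalizes better.

```python
# Sketch: one unconstrained decision tree (Step 1) versus an ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Step 1: no max_depth, no early stopping -- the tree grows until every
# training point is classified, typically reaching 100% training accuracy.
tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

# An ensemble of such trees trained on bootstrap samples (bagging)
# averages away much of each individual tree's overfitting.
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

print("tree  train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest test:", forest.score(X_te, y_te))
```

The gap between the single tree's training and test scores is the overfitting the ensemble is meant to reduce.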