César Roberto de Souza, Ednaldo Brigante Pizzolato, Mauro dos Santos Anjo

In this paper, we explore and detail our experiments on a high-dimensional, multi-class image classification problem often found in the automatic recognition of Sign Languages. Our efforts are directed towards comparing the characteristics, advantages and drawbacks of creating and training Support Vector Machines arranged in a Decision Directed Acyclic Graph (DDAG) and Artificial Neural Networks (ANNs) to classify signs from the Brazilian Sign Language (LIBRAS). We explore how different heuristics, hyperparameters and multi-class decision schemes affect the performance, efficiency and ease of use of each classifier. We provide hyperparameter surface maps capturing accuracy and efficiency, comparisons between DDAGs and 1-vs-1 SVMs, and the effects of heuristics when training ANNs with Resilient Backpropagation. We report statistically significant results using Cohen's Kappa statistic for contingency tables.

Consequently, there is an equivalence between parameter averaging and update-based data parallelism when parameters are updated synchronously (this last part is key). The equivalence also holds for multiple averaging steps and for other updaters, not just plain SGD. Update-based data parallelism becomes more interesting (and arguably more useful) when we relax the synchronous-update requirement: by allowing the updates ΔWi,j to be applied to the parameter vector as soon as they are computed, instead of waiting N ≥ 1 iterations for all workers, we obtain the asynchronous stochastic gradient descent algorithm. These benefits are not without cost, however.
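The synchronous equivalence above can be checked numerically. The sketch below (a toy setup; the worker count, learning rate, and gradients are our own illustrative assumptions, not from any particular framework) shows that averaging the per-worker parameters after one SGD step gives the same result as applying the averaged update once:

```python
import numpy as np

# Hypothetical toy setup: N workers share initial parameters w0, and
# worker i computes gradient g[i] on its own data shard.
rng = np.random.default_rng(0)
N, dim, lr = 4, 3, 0.1
w0 = rng.normal(size=dim)
g = rng.normal(size=(N, dim))  # per-worker gradients

# Parameter averaging: each worker takes a local SGD step, then we average.
w_param_avg = np.mean([w0 - lr * g[i] for i in range(N)], axis=0)

# Update-based (synchronous): average the updates, apply once to w0.
w_update_based = w0 - lr * np.mean(g, axis=0)

print(np.allclose(w_param_avg, w_update_based))  # the two schemes coincide
```

Because the SGD step is linear in the gradient, the mean of the stepped parameters equals the parameters stepped by the mean gradient; this linearity is exactly what asynchronous application of updates gives up.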

Let's look at several machine learning techniques and the mathematics each one uses. In linear regression, we try to find the best-fit line or hyperplane for a given set of data points. The parameters are found by minimizing the residual sum of squares: we find a critical point by setting the vector of derivatives of the residual sum of squares to the zero vector. By the second derivative test, if the Hessian of the residual sum of squares at a critical point is positive definite, then the residual sum of squares has a local minimum there.
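As a concrete sketch of this procedure (the data here is synthetic and the variable names are ours): setting the gradient of RSS(β) = ‖y − Xβ‖² to zero yields the normal equations XᵀXβ = Xᵀy, and the Hessian 2XᵀX is positive definite whenever X has full column rank, confirming the critical point is a minimum:

```python
import numpy as np

# Synthetic regression problem: intercept column plus two features.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=50)

# Critical point: solve the normal equations X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Second derivative test: the Hessian of the RSS is 2 X^T X.
hessian = 2 * X.T @ X
is_pos_def = np.all(np.linalg.eigvalsh(hessian) > 0)
print(beta_hat, is_pos_def)  # fitted parameters near beta_true; True
```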

Machine learning is a wildly popular field of technology, used by data scientists around the globe. Mastering it can be achieved via many avenues of study, but one arguably necessary ingredient for success is a fundamental understanding of the mathematics behind the algorithms. Data scientists-in-training often try to take a shortcut and bypass the math, but that route is shortsighted. To get the most out of machine learning, you need perspective on what an algorithm is actually doing behind the scenes, and that perspective is only available through the math.

Would people who are strong in math be good at machine learning? Certainly, a strong background in mathematics will make it easier to understand machine learning at a conceptual level. When someone introduces you to the inference function in logistic regression, you'll say, "Hey, that's just linear algebra!" But surely deep learning must be something new? Not harder, just more of the same (thank God for automatic differentiation).
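To make the "just linear algebra" remark concrete, here is a minimal sketch of logistic regression inference (the weights and data are illustrative assumptions, not fitted values): a single matrix-vector product followed by the sigmoid.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # The inference step: one linear-algebra operation, then a squashing.
    return sigmoid(X @ w + b)

# Illustrative weights and inputs (hypothetical, not trained values).
X = np.array([[0.0, 0.0], [2.0, 2.0]])
w = np.array([1.0, 1.0])
b = -2.0
print(predict_proba(X, w, b))  # class probabilities, one per row of X
```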