So, when I need artificial intelligence to automate my business, why can't I get recommendations for which machine learning algorithm best suits my individual needs? Every business is unique, and there are hundreds of algorithms available, each with its own strengths and weaknesses. Just as I don't examine every single book when choosing which one to read, I don't have the time, resources, or knowledge to try out each and every algorithm. I want artificial intelligence to recommend a short list of algorithms for me to try on my data.
With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm, including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms, our variant comes with parallel acceleration guarantees and does not impose overly tight latency constraints, which might only be satisfiable in the multicore setting. Our analysis introduces a novel proof technique --- contractive mappings --- to quantify the speed of convergence of parameter distributions to their asymptotic limits. As a side effect, this answers the question of how quickly stochastic gradient descent algorithms reach the asymptotically normal regime.
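To make the setting concrete, here is a minimal sketch of the one-shot parameter-averaging scheme common in the parallel-SGD literature: each worker runs plain SGD on its own data shard with no communication, and the only synchronization is a single average of the learned parameters at the end. This is an illustrative sketch (workers simulated sequentially, a toy 1-D linear model), not the paper's algorithm verbatim.

```python
import random

def sgd_worker(data, lr=0.05, epochs=30, seed=0):
    """Run plain SGD for a 1-D linear model y = w*x + b on one data shard."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y      # prediction error on one sample
            w -= lr * err * x          # stochastic gradient step
            b -= lr * err
    return w, b

# Synthetic data from y = 2x + 1, split round-robin across k "machines".
random.seed(42)
points = [(x, 2.0 * x + 1.0) for x in [random.uniform(-1, 1) for _ in range(400)]]
k = 4
shards = [points[i::k] for i in range(k)]

# Each worker trains independently on its shard; the only communication
# is a single parameter average at the very end (no latency constraints).
results = [sgd_worker(list(shard), seed=i) for i, shard in enumerate(shards)]
w_avg = sum(w for w, _ in results) / k
b_avg = sum(b for _, b in results) / k
```

In a real deployment each call to `sgd_worker` would run on a separate machine or core; the point of the scheme is that the final average behaves nearly as well as SGD on the full data while requiring only one round of communication.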
PAC-Bayes bounds have been proposed to obtain risk estimates based on a training sample. In this paper, the PAC-Bayes approach is combined with the stability of the hypothesis learned by a Hilbert-space-valued algorithm. The PAC-Bayes setting is used with a Gaussian prior centered at the expected output; a novelty of our paper is thus the use of priors defined in terms of the data-generating distribution. Our main result estimates the risk of the randomized algorithm in terms of the hypothesis stability coefficients.
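For context, a standard PAC-Bayes bound (McAllester-style, not this paper's stability-based result) takes the following form: with probability at least $1-\delta$ over a sample $S$ of size $n$, simultaneously for all posteriors $Q$ over hypotheses,

$$
\mathbb{E}_{h \sim Q}\, R(h) \;\le\; \mathbb{E}_{h \sim Q}\, \widehat{R}_S(h) \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
$$

where $R$ and $\widehat{R}_S$ are the true and empirical risks and $P$ is the prior. The bound tightens when $Q$ stays close to $P$ in KL divergence, which is why a prior centered at the algorithm's expected output (as above) can yield small complexity terms for stable algorithms.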
The EM (Expectation-Maximisation) algorithm is the go-to algorithm whenever we have to do parameter estimation with hidden variables, such as in hidden Markov models. For some reason, it is often poorly explained, and students end up confused as to what exactly we are maximising in the E- and M-steps. Here is my attempt at a (hopefully) clear, step-by-step explanation of exactly how the EM algorithm works.
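The two steps are easiest to see on the classic toy case, a two-component 1-D Gaussian mixture, where the hidden variable is which component generated each point. A minimal sketch (crude initialisation, fixed iteration count, assumed two components):

```python
import math
import random

def em_gmm_1d(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture.
    E-step: compute posterior responsibilities given current parameters.
    M-step: re-estimate weights, means, variances from those responsibilities."""
    mu = [min(xs), max(xs)]        # crude init from the data range
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: r[i][k] = P(component k | x_i), the "soft assignment"
        r = []
        for x in xs:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = p[0] + p[1]
            r.append([p[0] / s, p[1] / s])
        # M-step: maximise the expected complete-data log-likelihood,
        # i.e. responsibility-weighted maximum-likelihood estimates
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            pi[k] = nk / len(xs)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, xs)) / nk
            var[k] = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return pi, mu, var

# Two well-separated clusters; EM should recover their means and weights.
random.seed(0)
xs = ([random.gauss(-2, 0.5) for _ in range(200)]
      + [random.gauss(3, 0.8) for _ in range(200)])
pi, mu, var = em_gmm_1d(xs)
```

The E-step does no maximising at all: it only fills in a distribution over the hidden assignments. The M-step then maximises the expected complete-data log-likelihood under that distribution, which is exactly the point students tend to miss.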
A typical question asked by a beginner facing the wide variety of machine learning algorithms is "which algorithm should I use?" Even an experienced data scientist cannot tell which algorithm will perform best before trying several. We are not advocating a one-and-done approach, but we do hope to provide some guidance on which algorithms to try first, depending on some clear factors.