Python's mlrose package provides functionality for implementing some of the most popular randomized optimization and search algorithms, and for applying them to a range of optimization problem domains. In this tutorial, we will discuss how mlrose can be used to find the optimal weights for machine learning models such as neural networks and regression models: that is, to solve the machine learning weight optimization problem. This is the third in a series of three tutorials on using mlrose to solve randomized optimization problems. Part 1 can be found here and Part 2 can be found here.
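Before turning to mlrose itself, it helps to see the core idea it applies: treating a model's weights as a state vector and searching over that vector with a randomized algorithm. Below is a minimal, standard-library-only sketch of random hill climbing over the weights of a tiny threshold model; the data, step size, and iteration count are all hypothetical choices for illustration, not mlrose's internals.

```python
import random

# Hypothetical toy data set (assumed for illustration): the logical AND function.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 0, 0, 1]

def predict(weights, x):
    """Linear threshold model: output 1 if w1*x1 + w2*x2 + b > 0, else 0."""
    w1, w2, b = weights
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def loss(weights):
    """Fitness to minimize: number of misclassified training examples."""
    return sum(predict(weights, x) != t for x, t in zip(X, y))

def random_hill_climb(max_iters=1000, step=0.1, seed=0):
    """Perturb one weight at a time, keeping changes that do not hurt the loss."""
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1) for _ in range(3)]
    best_loss = loss(best)
    for _ in range(max_iters):
        candidate = best[:]
        i = rng.randrange(3)                      # pick one weight at random
        candidate[i] += rng.uniform(-step, step)  # nudge it
        cand_loss = loss(candidate)
        if cand_loss <= best_loss:                # accept equal moves to cross plateaus
            best, best_loss = candidate, cand_loss
    return best, best_loss

weights, final_loss = random_hill_climb()
```

mlrose packages this same search loop (along with simulated annealing and genetic algorithms) behind its model classes, so you never write it by hand; the sketch is only meant to make the "weights as a search state" framing concrete.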

A few weeks ago, I wrote about how and why I was learning Machine Learning, mainly through Andrew Ng's Coursera course. Machine Learning is built on prerequisites, so much so that learning by first principles seems overwhelming. Do you really need to spend a month learning linear algebra? You'll be okay if you have some math and programming experience. You really just have to be familiar with Sigma notation and be able to express it in a for loop. Sure, your assignments will take longer to complete and the first few times you see those giant equations your head will spin, but you can do this! Calculus is not even required.

The advanced feats we've seen machines perform thus far have basically been examples of clever optimization techniques. So what does this learning process look like? First, the inputs are propagated forward through the model to arrive at a predicted output. At each neuron/node, a linear combination of the inputs (weighted by the node's weights, plus a bias) is passed through an activation function, the sigmoid function in our example. This process by which values flow from the inputs to the output is called forward propagation. After arriving at the predicted output, the loss for the training example is calculated.
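The forward pass described above can be sketched for a single neuron in a few lines. The weights, bias, and input below are hypothetical values chosen for illustration; the loss shown is cross-entropy, a common choice for binary classification.

```python
import math

def sigmoid(z):
    """Activation function: squashes the weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights, bias):
    """Forward propagation for one neuron: linear combination, then activation."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def log_loss(y_true, y_pred):
    """Cross-entropy loss for a single training example."""
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

# Hypothetical input, weights, and bias (assumed for illustration).
x = [0.5, -1.0]
pred = forward(x, weights=[0.8, 0.3], bias=0.1)  # z = 0.4 - 0.3 + 0.1 = 0.2
loss = log_loss(1, pred)
```

Here the weighted sum is 0.2, which the sigmoid maps to roughly 0.55; comparing that prediction against the true label of 1 gives the loss that a randomized optimizer would then try to drive down by adjusting the weights and bias.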