Sometimes you see a diagram and it gives you an 'aha' moment. I saw this one on Frederick Kratzert's blog. Using the input variables x and y, the forward pass (left half of the figure) calculates the output z as a function of x and y, i.e. z = f(x, y). The right half of the figure shows the backward pass. Receiving dL/dz (the derivative of the total loss with respect to the output z), we can calculate the individual gradients of the loss with respect to x and y by applying the chain rule, as shown in the figure: dL/dx = dL/dz * dz/dx and dL/dy = dL/dz * dz/dy.

This post is part of my forthcoming book on Mathematical foundations of Data Science. The goal of training a neural network is to minimise the loss function for the whole network of neurons. Hence, the problem of solving the equations represented by the neural network becomes a problem of minimising the loss function for the entire network.
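Below is a minimal sketch of one node's forward and backward pass. It assumes that the node is a multiply gate, f(x, y) = x * y, and that the loss is L = z**2. Both are illustrative choices, since the figure's f is generic. The `backward` function applies the chain rule as described above. A central-difference check then compares the analytic gradients against numerical ones.

```python
def forward(x, y):
    # Forward pass: the node computes z = f(x, y); f is assumed to be x * y here
    return x * y


def backward(x, y, dL_dz):
    # Local gradients of f(x, y) = x * y are dz/dx = y and dz/dy = x.
    # Chain rule: multiply the incoming dL/dz by each local gradient.
    return dL_dz * y, dL_dz * x


def loss(x, y):
    # Assumed loss L = z**2, so dL/dz = 2 * z
    return forward(x, y) ** 2


x, y = 3.0, -2.0
z = forward(x, y)
dL_dx, dL_dy = backward(x, y, dL_dz=2 * z)

# Numerical check with central differences
h = 1e-6
num_dx = (loss(x + h, y) - loss(x - h, y)) / (2 * h)
num_dy = (loss(x, y + h) - loss(x, y - h)) / (2 * h)
print(dL_dx, dL_dy)  # 24.0 -36.0
```

The node itself only needs its local derivatives. Everything it knows about the rest of the network arrives through dL/dz, which is why the same rule can be chained backwards through arbitrarily many nodes.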

Graph-based semi-supervised learning (GSSL) has an intuitive representation and can be improved by exploiting matrix calculation. However, it has to perform iterative optimization to reach a preset objective, which usually leads to low efficiency. Another inconvenience of GSSL is that when new data arrive, the graph construction and the optimization have to be conducted all over again. We propose a sound assumption, arguing that (1) neighboring data points are not in a peer-to-peer relation but in a partial-order relation induced by the local density and distance between the data, and (2) the label of a center can be regarded as the contribution of its followers. Starting from this assumption, we develop a highly efficient non-iterative label propagation algorithm based on a novel data structure, the optimal leading forest (LaPOLeaF). This study addresses the major weaknesses of traditional GSSL. We further scale LaPOLeaF to big data using a block distance matrix technique, parallel computing, and Locality-Sensitive Hashing (LSH). Experiments on large datasets show promising results for the proposed methods.
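The following is a simplified sketch in the spirit of the stated assumption. It is not the LaPOLeaF algorithm itself, and its ingredients are illustrative assumptions:

* Local density uses a Gaussian kernel with scale `dc`.
* Each point links to its nearest higher-density neighbour, its "leader", as in density-peaks clustering.
* Links longer than a hypothetical `max_edge` are cut to form a forest.
* Labels are then propagated in exactly two sweeps, with no iterative optimization:
  * bottom-up, where followers' labels accumulate at their leader (the "label of a center as the contribution of its followers" idea);
  * top-down, where points still unresolved inherit their leader's label.

```python
import math


def local_density(points, dc):
    """Gaussian-kernel local density with scale dc (an assumed choice)."""
    return [sum(math.exp(-(math.dist(p, q) / dc) ** 2)
                for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def leading_forest(points, rho, max_edge):
    """Link each point to its nearest neighbour of higher density (its leader).

    Density ties are broken by index, so the order is strict and no cycles form.
    Links longer than max_edge (hypothetical parameter) are cut, giving a forest.
    """
    order = sorted(range(len(points)), key=lambda i: (-rho[i], i))
    parent = [None] * len(points)
    for rank, i in enumerate(order):
        if rank == 0:
            continue  # the densest point has no leader
        j = min(order[:rank], key=lambda k: math.dist(points[i], points[k]))
        if math.dist(points[i], points[j]) <= max_edge:
            parent[i] = j
    return parent, order


def propagate(parent, order, labels, n_classes):
    """Non-iterative propagation: one bottom-up sweep, then one top-down sweep."""
    votes = [[0.0] * n_classes for _ in parent]
    for i, cls in labels.items():
        votes[i][cls] = 1.0
    for i in reversed(order):  # followers are visited before their leaders
        if parent[i] is not None:
            for c in range(n_classes):
                votes[parent[i]][c] += votes[i][c]
    result = [None] * len(parent)
    for i in order:  # leaders are resolved before their followers
        if i in labels:
            result[i] = labels[i]
        elif any(votes[i]):
            result[i] = max(range(n_classes), key=lambda c: votes[i][c])
        elif parent[i] is not None:
            result[i] = result[parent[i]]
    return result


# Two well-separated clusters of five points, one labeled point in each.
pts = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]
pts += [(x + 10.0, y + 10.0) for x, y in pts]
rho = local_density(pts, dc=1.0)
parent, order = leading_forest(pts, rho, max_edge=2.0)
print(propagate(parent, order, labels={1: 0, 6: 1}, n_classes=2))
# -> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

This sketch computes all pairwise distances. That quadratic cost is exactly what the block distance matrix, parallelism and LSH mentioned above are meant to tame at scale.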

Using high-level frameworks like Keras, TensorFlow or PyTorch allows us to build very complex models quickly. However, it is worth taking the time to look inside and understand the underlying concepts. Not so long ago I published an article explaining, in a simple way, how neural nets work. However, it was a highly theoretical post, dedicated primarily to math, which is the source of neural networks' superpower. From the beginning I was planning to follow up on this topic in a more practical way.

The distinctive driving force of constraint programming in solving combinatorial problems has been privileged access to problem structure through the high-level models it uses. From that exposed structure, in the form of so-called global constraints, powerful inference algorithms have shared information between constraints by propagating it through the domains of shared variables, traditionally by removing unsupported values. In the proposed approach, beliefs about individual variable-value assignments are exchanged between constraints and iteratively adjusted. This exchange of beliefs generalizes standard support propagation and aims to converge to the true marginal distributions of the solutions over individual variables. The necessary architectural changes to a constraint programming solver are described, and an empirical study of the proposal is conducted on its implementation.
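A minimal sketch of belief exchange between constraints follows. It assumes each constraint is given as a plain predicate whose satisfying tuples are enumerated by brute force. That is only feasible for tiny scopes, and it is not the solver architecture the abstract describes.

Each constraint sends each of its variables a belief for every value. The belief is computed by summing over the constraint's satisfying tuples, weighted by what the other constraints currently believe about the remaining variables (sum-product message passing). A value with no supporting tuple ends up with belief 0, which is what support propagation would remove. The nonzero beliefs carry the extra information. All function names are hypothetical.

```python
import itertools


def normalize(dist):
    """Scale a {value: weight} map to sum to 1 (an all-zero map stays zero)."""
    total = sum(dist.values())
    return {v: (w / total if total else 0.0) for v, w in dist.items()}


def product_of_messages(domain, messages):
    """Pointwise product of several {value: weight} messages over one domain."""
    out = {v: 1.0 for v in domain}
    for msg in messages:
        for v in out:
            out[v] *= msg[v]
    return out


def belief_propagation(domains, constraints, iterations=20):
    """domains: {var: values}; constraints: list of (scope tuple, predicate)."""
    # constraint -> variable messages, initialised to uniform beliefs
    c2v = {(k, x): normalize({v: 1.0 for v in domains[x]})
           for k, (scope, _) in enumerate(constraints) for x in scope}
    for _ in range(iterations):
        new = {}
        for k, (scope, pred) in enumerate(constraints):
            # variable -> constraint: what every *other* constraint believes about x
            v2c = {x: normalize(product_of_messages(
                       domains[x],
                       [m for (k2, x2), m in c2v.items() if x2 == x and k2 != k]))
                   for x in scope}
            out = {x: {v: 0.0 for v in domains[x]} for x in scope}
            for tup in itertools.product(*(domains[x] for x in scope)):
                if not pred(*tup):
                    continue  # an unsupported tuple contributes nothing
                for i, x in enumerate(scope):
                    weight = 1.0
                    for j, y in enumerate(scope):
                        if j != i:
                            weight *= v2c[y][tup[j]]
                    out[x][tup[i]] += weight
            for x in scope:
                new[(k, x)] = normalize(out[x])
        c2v = new
    # a variable's belief combines the messages from all of its constraints
    return {x: normalize(product_of_messages(
                dom, [m for (k, x2), m in c2v.items() if x2 == x]))
            for x, dom in domains.items()}


domains = {"x": [1, 2, 3, 4], "y": [1, 2, 3, 4], "z": [1, 2, 3, 4]}
constraints = [(("x", "y"), lambda a, b: a < b),
               (("y", "z"), lambda a, b: a < b)]
beliefs = belief_propagation(domains, constraints)
print(beliefs["x"])
```

The solutions here are (1, 2, 3), (1, 2, 4), (1, 3, 4) and (2, 3, 4). The exact marginal of x is therefore 3/4 for value 1, 1/4 for value 2, and 0 for values 3 and 4. Because this small constraint graph is acyclic, the messages should settle on those values. On cyclic constraint graphs this kind of iteration is not guaranteed to converge.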

Mateescu, R., Kask, K., Gogate, V., Dechter, R.

The paper investigates parameterized approximate message-passing schemes that are based on bounded inference and are inspired by Pearl's belief propagation algorithm (BP). We start with the bounded-inference mini-clustering algorithm and then move to the iterative scheme called Iterative Join-Graph Propagation (IJGP), which combines iteration with bounded inference. Algorithm IJGP belongs to the class of Generalized Belief Propagation algorithms, a framework that allowed connections with approximate algorithms from statistical physics. IJGP is shown empirically to surpass the performance of mini-clustering and belief propagation, as well as a number of other state-of-the-art algorithms, on several classes of networks. We also provide insight into the accuracy of iterative BP and IJGP by relating these algorithms to well-known classes of constraint propagation schemes.
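The bounded-inference idea can be seen on a toy example in the style of mini-bucket elimination, which mini-clustering roughly applies inside the clusters of a tree decomposition. This example rests on three assumptions: two made-up non-negative factors over binary variables, a sum-product query, and an i-bound of 2 variables per mini-bucket. Eliminating x exactly would require a function over all three variables. Splitting the bucket keeps every intermediate function small at the price of an upper bound instead of the exact value.

```python
import itertools

BINARY = (0, 1)

# Two made-up non-negative factors sharing variable x: f(x, y) and g(x, z).
f = dict(zip(itertools.product(BINARY, repeat=2), [1.0, 2.0, 3.0, 4.0]))
g = dict(zip(itertools.product(BINARY, repeat=2), [2.0, 1.0, 1.0, 3.0]))


def exact_sum():
    """Exact sum over x, y, z of f(x, y) * g(x, z)."""
    return sum(f[x, y] * g[x, z] for x, y, z in itertools.product(BINARY, repeat=3))


def mini_bucket_bound():
    """Upper bound with an assumed i-bound of 2 variables per mini-bucket.

    The bucket of x holds f and g, whose product has scope {x, y, z}. Splitting
    it into mini-buckets {f} and {g} keeps each scope at 2 variables: x is summed
    out of f and maximised out of g, which can only overestimate the exact sum.
    """
    h1 = {y: sum(f[x, y] for x in BINARY) for y in BINARY}
    h2 = {z: max(g[x, z] for x in BINARY) for z in BINARY}
    return sum(h1[y] * h2[z] for y, z in itertools.product(BINARY, repeat=2))


print(exact_sum(), mini_bucket_bound())  # 37.0 50.0
```

Raising the i-bound to 3 would keep f and g in one mini-bucket and recover the exact value. This accuracy-versus-cost knob is the kind of parameter such bounded-inference schemes expose.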