This paper considers the computational power of constant-size dynamic Bayesian networks. Although discrete dynamic Bayesian networks are no more powerful than hidden Markov models, dynamic Bayesian networks with continuous random variables and discrete children of continuous parents are capable of performing Turing-complete computation. With modified versions of existing algorithms for belief propagation, such a simulation can be carried out in real time. This result suggests that dynamic Bayesian networks may be more powerful than previously thought. Relationships to causal models and recurrent neural networks are also discussed.
Roughly speaking, my machine learning journey began on Kaggle. "Regression models predict continuous-valued real numbers; classification models predict 'red,' 'green,' 'blue.' Typically, the former employs the mean squared error or mean absolute error; the latter, the cross-entropy loss. Stochastic gradient descent updates the model's parameters to drive these losses down." Furthermore, to fit these models, just import sklearn. Dexterity with the above is often sufficient, at least from a technical standpoint, for both employment and impact as a data scientist. In industry, commonplace prediction and inference problems -- binary churn, credit scoring, product recommendation and A/B testing, for example -- are easily matched with an off-the-shelf algorithm plus a proficient data scientist for a measurable boost to the company's bottom line. In a vacuum I think this is fine: the winning driver does not need to know how to build the car.
Deep learning (DL) algorithms have successfully solved real-world classification problems from a variety of fields, including recognizing handwritten digits and identifying the presence of key diagnostic features in medical images [18, 16]. A typical classification challenge for a DL algorithm consists of training the algorithm on an example data set, then using a separate set of test data to evaluate its performance. The aim is to provide answers that are as accurate as possible, as measured by the true positive rate (TPR) and the true negative rate (TNR). Many DL classifiers, particularly those using a softmax function in the very last layer, yield a continuous score, h. A step function is then used to map this continuous score to one of the possible categories being classified. TPR and TNR scores are generated for each predicted variable by setting a threshold parameter that is applied when mapping h to the decision: values above the threshold are mapped to positive predictions, while values below it are mapped to negative predictions. The ROC curve is then generated from these pairs of TPR/TNR scores, one pair per threshold.
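
The thresholding procedure described above can be illustrated with a minimal sketch in plain Python. The function name `roc_points`, the toy scores and labels, and the chosen thresholds are all assumptions made for illustration; none of them come from the work being described. The sketch also assumes binary labels (1 for positive, 0 for negative) with at least one example of each class.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, TPR, TNR) triples for binary labels (1 = positive).

    Hypothetical helper for illustration; assumes both classes are present.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # Step function: a score h above the threshold is a positive prediction.
        preds = [1 if h > t else 0 for h in scores]
        true_pos = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        true_neg = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
        points.append((t, true_pos / positives, true_neg / negatives))
    return points


# Toy data (made up): continuous scores h and the true binary labels.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

points = roc_points(scores, labels, thresholds=[0.0, 0.3, 0.5, 0.9])
for threshold, tpr, tnr in points:
    # The ROC curve is conventionally drawn as TPR against 1 - TNR.
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  TNR={tnr:.2f}  FPR={1 - tnr:.2f}")
```

Sweeping the threshold from low to high trades TPR for TNR: at the lowest threshold every example is predicted positive, and at the highest every example is predicted negative. Each threshold contributes one point to the curve.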