In this Course you learn k-Nearest Neighbors & Naive Bayes Classification Methods. In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. The k-NN algorithm is among the simplest of all machine learning algorithms. For classification, a useful technique can be to assign weight to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. The neighbors are taken from a set of objects for which the class (for k-NN classification).
We consider the problem of estimating the transition rate matrix of a continuous-time Markov chain from a finite-duration realisation of this process. We approach this problem in an imprecise probabilistic framework, using a set of prior distributions on the unknown transition rate matrix. The resulting estimator is a set of transition rate matrices that, for reasons of conjugacy, is easy to find. To determine the hyperparameters for our set of priors, we reconsider the problem in discrete time, where we can use the well-known Imprecise Dirichlet Model. In particular, we show how the limit of the resulting discrete-time estimators is a continuous-time estimator. It corresponds to a specific choice of hyperparameters and has an exceptionally simple closed-form expression.
Presbyterian reverend Thomas Bayes had no reason to suspect he'd make any lasting contribution to humankind. Born in England at the beginning of the 18th century, Bayes was a quiet and questioning man. He published only two works in his lifetime. In 1731, he wrote a defense of God's--and the British monarchy's--"divine benevolence," and in 1736, an anonymous defense of the logic of Isaac Newton's calculus. Yet an argument he wrote before his death in 1761 would shape the course of history.
One important approach to learning Bayesian networks (BNs) from data uses a scoring metric to evaluate the fitness of any given candidate network for the data base, and applies a search procedure to explore the set of candidate networks. The most usual search methods are greedy hill climbing, either deterministic or stochastic, although other techniques have also been used. In this paper we propose a new algorithm for learning BNs based on a recently introduced metaheuristic, which has been successfully applied to solve a variety of combinatorial optimization problems: ant colony optimization (ACO). We describe all the elements necessary to tackle our learning problem using this metaheuristic, and experimentally compare the performance of our ACO-based algorithm with other algorithms used in the literature. The experimental work is carried out using three different domains: ALARM, INSURANCE and BOBLO.
Suppose that we wish to infer the value of a statistical parameter at a law from which we sample independent observations. Suppose that this parameter is smooth and that we can define two variation-independent, infinite-dimensional features of the law, its so called Q- and G-components (comp.), such that if we estimate them consistently at a fast enough product of rates, then we can build a confidence interval (CI) with a given asymptotic level based on a plain targeted minimum loss estimator (TMLE). The estimators of the Q- and G-comp. would typically be by products of machine learning algorithms. We focus on the case that the machine learning algorithm for the G-comp. is fine-tuned by a real-valued parameter h. Then, a plain TMLE with an h chosen by cross-validation would typically not lend itself to the construction of a CI, because the selection of h would trade-off its empirical bias with something akin to the empirical variance of the estimator of the G-comp. as opposed to that of the TMLE. A collaborative TMLE (C-TMLE) might, however, succeed in achieving the relevant trade-off. We construct a C-TMLE and show that, under high-level empirical processes conditions, and if there exists an oracle h that makes a bulky remainder term asymptotically Gaussian, then the C-TMLE is asymptotically Gaussian hence amenable to building a CI provided that its asymptotic variance can be estimated too. We illustrate the construction and main result with the inference of the average treatment effect, where the Q-comp. consists in a marginal law and a conditional expectation, and the G-comp. is a propensity score (a conditional probability). We also conduct a multi-faceted simulation study to investigate the empirical properties of the collaborative TMLE when the G-comp. is estimated by the LASSO. Here, h is the bound on the l1-norm of the candidate coefficients.
In these notes we describe heuristics to predict computational-to-statistical gaps in certain statistical problems. These are regimes in which the underlying statistical problem is information-theoretically possible although no efficient algorithm exists, rendering the problem essentially unsolvable for large instances. The methods we describe here are based on mature, albeit non-rigorous, tools from statistical physics. These notes are based on a lecture series given by the authors at the Courant Institute of Mathematical Sciences in New York City, on May 16th, 2017.
We propose the use of Bayesian networks, which provide both a mean value and an uncertainty estimate as output, to enhance the safety of learned control policies under circumstances in which a test-time input differs significantly from the training set. Our algorithm combines reinforcement learning and end-to-end imitation learning to simultaneously learn a control policy as well as a threshold over the predictive uncertainty of the learned model, with no hand-tuning required. Corrective action, such as a return of control to the model predictive controller or human expert, is taken when the uncertainty threshold is exceeded. We validate our method on fully-observable and vision-based partially-observable systems using cart-pole and autonomous driving simulations using deep convolutional Bayesian neural networks. We demonstrate that our method is robust to uncertainty resulting from varying system dynamics as well as from partial state observability.
Parametric images provide insight into the spatial distribution of physiological parameters, but they are often extremely noisy, due to low SNR of tomographic data. Direct estimation from projections allows accurate noise modeling, improving the results of post-reconstruction fitting. We propose a method, which we name kinetic compressive sensing (KCS), based on a hierarchical Bayesian model and on a novel reconstruction algorithm, that encodes sparsity of kinetic parameters. Parametric maps are reconstructed by maximizing the joint probability, with an Iterated Conditional Modes (ICM) approach, alternating the optimization of activity time series (OS-MAP-OSL), and kinetic parameters (MAP-LM). We evaluated the proposed algorithm on a simulated dynamic phantom: a bias/variance study confirmed how direct estimates can improve the quality of parametric maps over a post-reconstruction fitting, and showed how the novel sparsity prior can further reduce their variance, without affecting bias. Real FDG PET human brain data (Siemens mMR, 40min) images were also processed. Results enforced how the proposed KCS-regularized direct method can produce spatially coherent images and parametric maps, with lower spatial noise and better tissue contrast. A GPU-based open source implementation of the algorithm is provided.
Due to the intractable partition function, the exact likelihood function for a Markov random field (MRF), in many situations, can only be approximated. Major approximation approaches include pseudolikelihood and Laplace approximation. In this paper, we propose a novel way of approximating the likelihood function through first approximating the marginal likelihood functions of individual parameters and then reconstructing the joint likelihood function from these marginal likelihood functions. For approximating the marginal likelihood functions, we derive a particular likelihood function from a modified scenario of coin tossing which is useful for capturing how one parameter interacts with the remaining parameters in the likelihood function. For reconstructing the joint likelihood function, we use an appropriate copula to link up these marginal likelihood functions. Numerical investigation suggests the superior performance of our approach. Especially as the size of the MRF increases, both the numerical performance and the computational cost of our approach remain consistently satisfactory, whereas Laplace approximation deteriorates and pseudolikelihood becomes computationally unbearable.