A Machine Learning interview calls for a rigorous interview process where the candidates are judged on various aspects such as technical and programming skills, knowledge of methods and clarity of basic concepts. If you aspire to apply for machine learning jobs, it is crucial to know what kind of interview questions generally recruiters and hiring managers may ask. This is an attempt to help you crack the machine learning interviews at major product based companies and start-ups. Usually, machine learning interviews at major companies require a thorough knowledge of data structures and algorithms. In the upcoming series of articles, we shall start from the basics of concepts and build upon these concepts to solve major interview questions. Machine learning interviews comprise of many rounds, which begin with a screening test. This comprises solving questions either on the white-board, or solving it on online platforms like HackerRank, LeetCode etc. Here, we have compiled a list of ...
Approximate Bayesian Computation (ABC) now serves as one of the major strategies to perform model choice and parameter inference on models with intractable likelihoods. An essential component of ABC involves comparing a large amount of simulated data with the observed data through summary statistics. To avoid the curse of dimensionality, summary statistic selection is of prime importance, and becomes even more critical when applying ABC to mechanistic network models. Indeed, while many summary statistics can be used to encode network structures, their computational complexity can be highly variable. For large networks, computation of summary statistics can quickly create a bottleneck, making the use of ABC difficult. To reduce this computational burden and make the analysis of mechanistic network models more practical, we investigated two questions in a model choice framework. First, we studied the utility of cost-based filter selection methods to account for different summary costs during the selection process. Second, we performed selection using networks generated with a smaller number of nodes to reduce the time required for the selection step. Our findings show that computationally inexpensive summary statistics can be efficiently selected with minimal impact on classification accuracy. Furthermore, we found that networks with a smaller number of nodes can only be employed to eliminate a moderate number of summaries. While this latter finding is network specific, the former is general and can be adapted to any ABC application.
The multinomial probit Bayesian additive regression trees (MPBART) framework was proposed by Kindo et al. (KD), approximating the latent utilities in the multinomial probit (MNP) model with BART (Chipman et al. 2010). Compared to multinomial logistic models, MNP does not assume independent alternatives and the correlation structure among alternatives can be specified through multivariate Gaussian distributed latent utilities. We introduce two new algorithms for fitting the MPBART and show that the theoretical mixing rates of our proposals are equal or superior to the existing algorithm in KD. Through simulations, we explore the robustness of the methods to the choice of reference level, imbalance in outcome frequencies, and the specifications of prior hyperparameters for the utility error term. The work is motivated by the application of generating posterior predictive distributions for mortality and engagement in care among HIV-positive patients based on electronic health records (EHRs) from the Academic Model Providing Access to Healthcare (AMPATH) in Kenya. In both the application and simulations, we observe better performance using our proposals as compared to KD in terms of MCMC convergence rate and posterior predictive accuracy.
To perform Bayesian inference for stochastic simulator models for which the likelihood is not accessible, Likelihood-Free Inference (LFI) relies on simulations from the model. Standard LFI methods can be split according to how these simulations are used: to build an explicit Surrogate Likelihood, or to accept/reject parameter values according to a measure of distance from the observations (Approximate Bayesian Computation (ABC)). In both cases, simulations are adaptively tailored to the value of the observation. Here, we generate parameter-simulation pairs from the model independently on the observation, and use them to learn a conditional exponential family likelihood approximation; to parametrize it, we use Neural Networks whose weights are tuned with Score Matching. With our likelihood approximation, we can employ MCMC for doubly intractable distributions to draw samples from the posterior for any number of observations without additional model simulations, with performance competitive to comparable approaches. Further, the sufficient statistics of the exponential family can be used as summaries in ABC, outperforming the state-of-the-art method in five different models with known likelihood. Finally, we apply our method to a challenging model from meteorology.
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods. However, a public benchmark with appropriate performance metrics for such 'likelihood-free' algorithms has been lacking. This has made it difficult to compare algorithms and identify their strengths and weaknesses. We set out to fill this gap: We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms including recent approaches employing neural networks and classical Approximate Bayesian Computation methods. We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency. Neural network-based approaches generally exhibit better performance, but there is no uniformly best algorithm. We provide practical advice and highlight the potential of the benchmark to diagnose problems and improve algorithms. The results can be explored interactively on a companion website. All code is open source, making it possible to contribute further benchmark tasks and inference algorithms.
Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper examples of neurocognitive inspirations and promising directions in this area are given.
Bayesian learning is a powerful learning framework which combines the external information of the data (background information) with the internal information (training data) in a logically consistent way in inference and prediction. By Bayes rule, the external information (prior distribution) and the internal information (training data likelihood) are combined coherently, and the posterior distribution and the posterior predictive (marginal) distribution obtained by Bayes rule summarize the total information needed in the inference and prediction, respectively. In this paper, we study the Bayesian framework of the Tensor Network from two perspective. First, we introduce the prior distribution to the weights in the Tensor Network and predict the labels of the new observations by the posterior predictive (marginal) distribution. Since the intractability of the parameter integral in the normalization constant computation, we approximate the posterior predictive distribution by Laplace approximation and obtain the out-product approximation of the hessian matrix of the posterior distribution of the Tensor Network model. Second, to estimate the parameters of the stationary mode, we propose a stable initialization trick to accelerate the inference process by which the Tensor Network can converge to the stationary path more efficiently and stably with gradient descent method. We verify our work on the MNIST, Phishing Website and Breast Cancer data set. We study the Bayesian properties of the Bayesian Tensor Network by visualizing the parameters of the model and the decision boundaries in the two dimensional synthetic data set. For a application purpose, our work can reduce the overfitting and improve the performance of normal Tensor Network model.
Conditional inference provides a rigorous approach to counter bias when data from automated model selections is reused for inference. We develop in this paper a statistically consistent Bayesian framework to assess uncertainties within linear models that are informed by grouped sparsities in covariates. Finding wide applications when genes, proteins, genetic variants, neuroimaging measurements are grouped respectively by their biological pathways, molecular functions, regulatory regions, cognitive roles, these models are selected through a useful class of group-sparse learning algorithms. An adjustment factor to account precisely for the selection of promising groups, deployed with a generalized version of Laplace-type approximations is the centerpiece of our new methods. Accommodating well known group-sparse models such as those selected by the Group LASSO, the overlapping Group LASSO, the sparse Group LASSO etc., we illustrate the efficacy of our methodology in extensive experiments and on data from a human neuroimaging application.
When the dimension of data is comparable to or larger than the number of available data samples, Principal Components Analysis (PCA) is known to exhibit problematic phenomena of high-dimensional noise. In this work, we propose an Empirical Bayes PCA method that reduces this noise by estimating a structural prior for the joint distributions of the principal components. This EB-PCA method is based upon the classical Kiefer-Wolfowitz nonparametric MLE for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs, and iterative refinement using an Approximate Message Passing (AMP) algorithm. In theoretical "spiked" models, EB-PCA achieves Bayes-optimal estimation accuracy in the same settings as the oracle Bayes AMP procedure that knows the true priors. Empirically, EB-PCA can substantially improve over PCA when there is strong prior structure, both in simulation and on several quantitative benchmarks constructed using data from the 1000 Genomes Project and the International HapMap Project. A final illustration is presented for an analysis of gene expression data obtained by single-cell RNA-seq.
A connection between the General Linear Model (GLM) in combination with classical statistical inference and the machine learning (MLE)-based inference is described in this paper. Firstly, the estimation of the GLM parameters is expressed as a Linear Regression Model (LRM) of an indicator matrix, that is, in terms of the inverse problem of regressing the observations. In other words, both approaches, i.e. GLM and LRM, apply to different domains, the observation and the label domains, and are linked by a normalization value at the least-squares solution. Subsequently, from this relationship we derive a statistical test based on a more refined predictive algorithm, i.e. the (non)linear Support Vector Machine (SVM) that maximizes the class margin of separation, within a permutation analysis. The MLE-based inference employs a residual score and includes the upper bound to compute a better estimation of the actual (real) error. Experimental results demonstrate how the parameter estimations derived from each model resulted in different classification performances in the equivalent inverse problem. Moreover, using real data the aforementioned predictive algorithms within permutation tests, including such model-free estimators, are able to provide a good trade-off between type I error and statistical power.