bmn
Consistency of the $k$-Nearest Neighbor Regressor under Complex Survey Designs
We study the consistency of the $k$-nearest neighbor regressor under complex survey designs. While consistency results for this algorithm are well established for independent and identically distributed data, corresponding results for complex survey data are lacking. We show that the $k$-nearest neighbor regressor is consistent under regularity conditions on the sampling design and the distribution of the data. We derive lower bounds for the rate of convergence and show that these bounds exhibit the curse of dimensionality, as in the independent and identically distributed setting. Empirical studies based on simulated and real data illustrate our theoretical findings.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Texas (0.04)
An Efficient Pseudo-likelihood Method for Sparse Binary Pairwise Markov Network Estimation
Geng, Sinong, Kuang, Zhaobin, Page, David
The pseudo-likelihood method is one of the most popular algorithms for learning sparse binary pairwise Markov networks. In this paper, we formulate the $L_1$ regularized pseudo-likelihood problem as a sparse multiple logistic regression problem. In this way, many insights and optimization procedures for sparse logistic regression can be applied to the learning of discrete Markov networks. Specifically, we use the coordinate descent algorithm for generalized linear models with convex penalties, combined with strong screening rules, to solve the pseudo-likelihood problem with $L_1$ regularization. Therefore a substantial speedup without losing any accuracy can be achieved. Furthermore, this method is more stable than the node-wise logistic regression approach on unbalanced high-dimensional data when penalized by small regularization parameters. Thorough numerical experiments on simulated data and real world data demonstrate the advantages of the proposed method.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.83)
Estimating Well-Performing Bayesian Networks using Bernoulli Mixtures
A novel method for estimating Bayesian network (BN) parameters from data is presented which provides improved performance on test data. Previous research has shown the value of representing conditional probability distributions (CPDs) via neural networks(Neal 1992), noisy-OR gates (Neal 1992, Diez 1993)and decision trees (Friedman and Goldszmidt 1996).The Bernoulli mixture network (BMN) explicitly represents the CPDs of discrete BN nodes as mixtures of local distributions,each having a different set of parents.This increases the space of possible structures which can be considered,enabling the CPDs to have finer-grained dependencies.The resulting estimation procedure induces a modelthat is better able to emulate the underlying interactions occurring in the data than conventional conditional Bernoulli network models.The results for artificially generated data indicate that overfitting is best reduced by restricting the complexity of candidate mixture substructures local to each node. Furthermore, mixtures of very simple substructures can perform almost as well as more complex ones.The BMN is also applied to data collected from an online adventure game with an application to keyhole plan recognition. The results show that the BMN-based model brings a dramatic improvement in performance over a conventional BN model.
- Oceania > Australia > Victoria (0.04)
- Oceania > Australia > South Australia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.34)