bnc
RelationalSelf-Attention: What'sMissinginAttentionforVideoUnderstanding SupplementaryMaterial
Forthebottlenecks including RSAlayers, werandomly initializeweights using MSRA initialization [3] and set the gamma parameter of the last batch normalization layer to zero. We implement our model based on TSN in Pytorch2 under BSD 2-Clause license. All the benchmarks that we used are commonly used datasets for the academic purpose. While specified otherwise, the training and testing details are the sameasthoseinSec.5.1. Since each RSA kernel generated by each query captures a distinct motion pattern, the model can learn diverse motion features(seeFigure3). Inthisexperiment,wechooseL = 8asthedefault.
Towards the Next-generation Bayesian Network Classifiers
Zhang, Huan, Zhang, Daokun, Meng, Kexin, Webb, Geoffrey I.
Bayesian network classifiers provide a feasible solution to tabular data classification, with a number of merits like high time and memory efficiency, and great explainability. However, due to the parameter explosion and data sparsity issues, Bayesian network classifiers are restricted to low-order feature dependency modeling, making them struggle in extrapolating the occurrence probabilities of complex real-world data. In this paper, we propose a novel paradigm to design high-order Bayesian network classifiers, by learning distributional representations for feature values, as what has been done in word embedding and graph representation learning. The learned distributional representations are encoded with the semantic relatedness between different features through their observed co-occurrence patterns in training data, which then serve as a hallmark to extrapolate the occurrence probabilities of new test samples. As a classifier design realization, we remake the K-dependence Bayesian classifier (KDB) by extending it into a neural version, i.e., NeuralKDB, where a novel neural network architecture is designed to learn distributional representations of feature values and parameterize the conditional probabilities between interdependent features. A stochastic gradient descent based algorithm is designed to train the NeuralKDB model efficiently. Extensive classification experiments on 60 UCI datasets demonstrate that the proposed NeuralKDB classifier excels in capturing high-order feature dependencies and significantly outperforms the conventional Bayesian network classifiers, as well as other competitive classifiers, including two neural network based classifiers without distributional representation learning.
- South America > Paraguay > Asunción > Asunción (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
- (2 more...)
Efficient Parameter Estimation for Bayesian Network Classifiers using Hierarchical Linear Smoothing
Cooper, Connor, Webb, Geoffrey I., Schmidt, Daniel F.
Bayesian network classifiers (BNCs) possess a number of properties desirable for a modern classifier: They are easily interpretable, highly scalable, and offer adaptable complexity. However, traditional methods for learning BNCs have historically underperformed when compared to leading classification methods such as random forests. Recent parameter smoothing techniques using hierarchical Dirichlet processes (HDPs) have enabled BNCs to achieve performance competitive with random forests on categorical data, but these techniques are relatively inflexible, and require a complicated, specialized sampling process. In this paper, we introduce a novel method for parameter estimation that uses a log-linear regression to approximate the behaviour of HDPs. As a linear model, our method is remarkably flexible and simple to interpret, and can leverage the vast literature on learning linear models. Our experiments show that our method can outperform HDP smoothing while being orders of magnitude faster, remaining competitive with random forests on categorical data.
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Wisconsin (0.04)
- Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
- Research Report > New Finding (0.47)
- Research Report > Experimental Study (0.47)
Context-Specific Refinements of Bayesian Network Classifiers
Leonelli, Manuele, Varando, Gherardo
Supervised classification is one of the most ubiquitous tasks in machine learning. Generative classifiers based on Bayesian networks are often used because of their interpretability and competitive accuracy. The widely used naive and TAN classifiers are specific instances of Bayesian network classifiers with a constrained underlying graph. This paper introduces novel classes of generative classifiers extending TAN and other famous types of Bayesian network classifiers. Our approach is based on staged tree models, which extend Bayesian networks by allowing for complex, context-specific patterns of dependence. We formally study the relationship between our novel classes of classifiers and Bayesian networks. We introduce and implement data-driven learning routines for our models and investigate their accuracy in an extensive computational study. The study demonstrates that models embedding asymmetric information can enhance classification accuracy.
- North America > United States > Wisconsin (0.05)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
Trained on 100 million words and still in shape: BERT meets British National Corpus
Samuel, David, Kutuzov, Andrey, Øvrelid, Lilja, Velldal, Erik
While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source -- the British National Corpus. We show that pre-training on this carefully curated corpus can reach better performance than the original BERT model. We argue that this type of corpora has great potential as a language modeling benchmark. To showcase this potential, we present fair, reproducible and data-efficient comparative studies of LMs, in which we evaluate several training objectives and model architectures and replicate previous empirical results in a systematic way. We propose an optimized LM architecture called LTG-BERT.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (13 more...)
A new class of generative classifiers based on staged tree models
Carli, Federico, Leonelli, Manuele, Varando, Gherardo
Generative models for classification use the joint probability distribution of the class variable and the features to construct a decision rule. Among generative models, Bayesian networks and naive Bayes classifiers are the most commonly used and provide a clear graphical representation of the relationship among all variables. However, these have the disadvantage of highly restricting the type of relationships that could exist, by not allowing for context-specific independences. Here we introduce a new class of generative classifiers, called staged tree classifiers, which formally account for context-specific independence. They are constructed by a partitioning of the vertices of an event tree from which conditional independence can be formally read. The naive staged tree classifier is also defined, which extends the classic naive Bayes classifier whilst retaining the same complexity. An extensive simulation study shows that the classification accuracy of staged tree classifiers is competitive with those of state-of-the-art classifiers. An applied analysis to predict the fate of the passengers of the Titanic highlights the insights that the new class of generative classifiers can give.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- Europe > Italy (0.04)
- Health & Medicine (0.46)
- Transportation (0.34)
Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes
Petitjean, Francois, Buntine, Wray, Webb, Geoffrey I., Zaidi, Nayyar
This paper introduces a novel parameter estimation method for the probability tables of Bayesian network classifiers (BNCs), using hierarchical Dirichlet processes (HDPs). The main result of this paper is to show that improved parameter estimation allows BNCs to outperform leading learning methods such as Random Forest for both 0-1 loss and RMSE, albeit just on categorical datasets. As data assets become larger, entering the hyped world of "big", efficient accurate classification requires three main elements: (1) classifiers with low-bias that can capture the fine-detail of large datasets (2) out-of-core learners that can learn from data without having to hold it all in main memory and (3) models that can classify new data very efficiently. The latest Bayesian network classifiers (BNCs) satisfy these requirements. Their bias can be controlled easily by increasing the number of parents of the nodes in the graph. Their structure can be learned out of core with a limited number of passes over the data. However, as the bias is made lower to accurately model classification tasks, so is the accuracy of their parameters' estimates, as each parameter is estimated from ever decreasing quantities of data. In this paper, we introduce the use of Hierarchical Dirichlet Processes for accurate BNC parameter estimation. We conduct an extensive set of experiments on 68 standard datasets and demonstrate that our resulting classifiers perform very competitively with Random Forest in terms of prediction, while keeping the out-of-core capability and superior classification time.
- North America > United States > Wisconsin (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.46)