Directed Networks
SPOCC: Scalable POssibilistic Classifier Combination -- toward robust aggregation of classifiers
Albardan, Mahmoud, Klein, John, Colot, Olivier
When several predictors have been trained to solve the same classification task, a second level of algorithmic procedure is necessary to reconcile the classifier predictions and deliver a single one. Such a procedure is known as classifier combination, fusion or aggregation. When each individual classifier is trained using the same training algorithm (but under different circumstances), the aggregation procedure is referred to as an ensemble method. When each classifier may be generated by a different training algorithm, the aggregation procedure is referred to as a multiple classifier system. In both cases, the set of individual classifiers is called a classifier ensemble. Classifier combination is either a choice of the programmer or imposed by context. In the first case, combination is meant to increase classification performance by either increasing the learning capacity or mitigating
arXiv:1908.06475v1
Music Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions
Nakamura, Eita, Yoshii, Kazuyoshi
Most work on models for music transcription has focused on describing local sequential dependence of notes in musical scores and failed to capture their global repetitive structure, which can be a useful guide for transcribing music. Focusing on the rhythm, we formulate several classes of Bayesian Markov models of musical scores that describe repetitions indirectly by sparse transition probabilities of notes or note patterns. This enables us to construct piece-specific models for unseen scores with unfixed repetitive structure and to derive tractable inference algorithms. Moreover, to describe approximate repetitions, we explicitly incorporate a process of modifying the repeated notes/note patterns. We apply these models as a prior music language model for rhythm transcription, where piece-specific score models are inferred from performed MIDI data by unsupervised learning, in contrast to the conventional supervised construction of score models. Evaluations using vocal melodies of popular music showed that the Bayesian models improved the transcription accuracy for most of the tested model types, indicating the universal efficacy of the proposed approach.
Introduction
Music transcription is an actively studied but still unsolved problem in music information processing [1], [2]. One of the goals of music transcription is to convert a music performance signal into a human-readable symbolic musical score. While recent studies have achieved highly accurate pitch detection [3]-[7], it is also necessary to transcribe rhythms in order to obtain a symbolic music representation [8]-[18]. Since there are many logically possible representations of rhythms (including ones that are meaningless for humans) for a given performance [11], using a score model that describes prior knowledge about musical scores is key to solving this problem.
A common approach to music transcription is to integrate a musical score (language) model and a performance/acoustic model to obtain a proper transcription that best fits an input performance signal, similarly to the method of statistical speech recognition. More recently, end-to-end approaches have also been attempted [19]-[21], with limited success so far. This work was supported partially by JSPS KAKENHI (Nos. The work of EN was supported by the JSPS research fellowship (PD).
Prune Sampling: a MCMC inference technique for discrete and deterministic Bayesian networks
Phillipson, Frank, Parie, Jurriaan, Weikamp, Ron
We introduce and characterise the performance of the Markov chain Monte Carlo (MCMC) inference method Prune Sampling for discrete and deterministic Bayesian networks (BNs). We developed a procedure to obtain the performance of an MCMC sampling method in the limit of infinite simulation time, extrapolated from relatively short simulations. This approach was used to conduct a study comparing the accuracy, rate of convergence and time consumption of Prune Sampling with two conventional MCMC sampling methods: Gibbs and Metropolis sampling. We show that Markov chains created by Prune Sampling always converge to the desired posterior distribution, even for networks where conventional Gibbs sampling fails. Besides this, we demonstrate that pruning outperforms Gibbs sampling, at least for a certain class of BNs. This tempting feature comes at a price, however. In the first version of Prune Sampling, the procedure to choose the next iteration step uniformly is rather time intensive for large BNs. Our conclusion is that Prune Sampling is a competitive method for all types of small and medium sized BNs, but (for now) standard methods still perform better for all types of large BNs.
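To make the remark about Gibbs sampling failing on deterministic networks concrete, here is a minimal sketch. The two-node network (X ~ Bernoulli(0.5), Y = X) and the function name `gibbs_chain` are illustrative assumptions, not taken from the paper, and the code does not implement Prune Sampling itself. It only shows how a deterministic relation can trap a Gibbs chain in its starting state.

```python
import random

def gibbs_chain(n_steps, x0, seed=0):
    """Gibbs sampling on a toy network X -> Y with X ~ Bernoulli(0.5), Y = X.

    The network is a hypothetical example chosen for illustration.
    """
    rng = random.Random(seed)
    x, y = x0, x0  # start in a state with non-zero probability
    visited = []
    for _ in range(n_steps):
        # P(X = v | Y = y) is proportional to P(X = v) * P(Y = y | X = v).
        w1 = 0.5 * (1.0 if y == 1 else 0.0)
        w0 = 0.5 * (1.0 if y == 0 else 0.0)
        x = 1 if rng.random() < w1 / (w0 + w1) else 0
        # Y is a deterministic function of X, so P(Y = x | X = x) = 1.
        y = x
        visited.append((x, y))
    return visited

# The true joint puts mass 0.5 on (0, 0) and 0.5 on (1, 1),
# but each chain only ever sees the state it started in.
states_from_0 = set(gibbs_chain(1000, x0=0))
states_from_1 = set(gibbs_chain(1000, x0=1))
print(states_from_0, states_from_1)
```

Because each single-variable update is forced by the other variable, the chain is not irreducible, so its samples depend entirely on the initial state.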
"Conservatives Overfit, Liberals Underfit": The Social-Psychological Control of Affect and Uncertainty
Hoey, Jesse, MacKinnon, Neil J.
The presence of artificial agents in human social networks is growing. From chatbots to robots, human experience in the developed world is moving towards a socio-technical system in which agents can be technological or biological, with increasingly blurred distinctions between them. Given that emotion is a key element of human interaction, endowing artificial agents with the ability to reason about affect is a key stepping stone towards a future in which technological agents and humans can work together. This paper presents work on building intelligent computational agents that integrate both emotion and cognition. These agents are grounded in the well-established social-psychological Bayesian Affect Control Theory (BayesAct). The core idea of BayesAct is that humans are motivated in their social interactions by affective alignment: they strive for their social experiences to be coherent at a deep, emotional level with their sense of identity and general world views as constructed through culturally shared symbols. This affective alignment creates cohesive bonds between group members, and is instrumental for collaborations to solidify as relational group commitments. BayesAct agents are motivated in their social interactions by a combination of affective alignment and decision theoretic reasoning, trading the two off as a function of the uncertainty or unpredictability of the situation. This paper provides a high-level view of dual process theories and advances BayesAct as a plausible, computationally tractable model based in social-psychological theory. We introduce a revised BayesAct model that more deeply integrates social-psychological theorising, and we demonstrate that a component of the model is sufficient to account for cognitive biases about fairness, dissonance and conformity. We show how the model can unify different exploration strategies in reinforcement learning.
Using Wasserstein-2 regularization to ensure fair decisions with Neural-Network classifiers
Risser, Laurent, Vincenot, Quentin, Couellan, Nicolas, Loubes, Jean-Michel
In this paper, we propose a new method to build fair Neural-Network classifiers by using a constraint based on the Wasserstein distance. More specifically, we detail how to efficiently compute the gradients of Wasserstein-2 regularizers for Neural-Networks. The proposed strategy is then used to train Neural-Network decision rules that favor fair predictions. Our method fully takes into account two specificities of Neural-Network training: (1) the network parameters are indirectly learned based on automatic differentiation and on the loss gradients, and (2) batch training is the gold standard to approximate the parameter gradients, as it requires a reasonable amount of computation and can efficiently explore the parameter space. Results are shown on synthetic data, as well as on the UCI Adult Income Dataset. Our method is shown to perform well compared with 'ZafarICWWW17' and linear-regression with Wasserstein-1 regularization, as in 'JiangUAI19', in particular when non-linear decision rules are required for accurate predictions.
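As a rough illustration of what a Wasserstein-2 fairness penalty measures, the sketch below computes the squared Wasserstein-2 distance between the prediction scores of two groups in one batch. It relies on the standard fact that, for two equal-size one-dimensional samples, the optimal transport plan matches sorted values. The function names, the equal-size restriction, and the weight `lam` are assumptions made for this example; the sketch computes only the penalty value, not the gradients the paper describes.

```python
def wasserstein2_squared_1d(a, b):
    """Squared Wasserstein-2 distance between two equal-size 1D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    a_sorted, b_sorted = sorted(a), sorted(b)
    # In 1D the optimal coupling pairs the i-th smallest values.
    return sum((u - v) ** 2 for u, v in zip(a_sorted, b_sorted)) / len(a_sorted)

def penalised_loss(base_loss, scores_group0, scores_group1, lam=1.0):
    """Hypothetical training objective: base loss plus a weighted W2 penalty."""
    return base_loss + lam * wasserstein2_squared_1d(scores_group0, scores_group1)

# Scores that differ by a constant shift of 0.5 between the two groups.
penalty = wasserstein2_squared_1d([0.1, 0.3, 0.2], [0.8, 0.6, 0.7])
print(penalty)
```

The penalty shrinks to zero when both groups receive the same distribution of scores, which is the behaviour a fairness regularizer of this kind is meant to reward.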
Bayesian Generative Models for Knowledge Transfer in MRI Semantic Segmentation Problems
Kuzina, Anna, Egorov, Evgenii, Burnaev, Evgeny
Automatic segmentation methods based on deep learning have recently demonstrated state-of-the-art performance, outperforming the ordinary methods. Nevertheless, these methods are inapplicable to small datasets, which are very common in medical problems. To this end, we propose a knowledge transfer method between diseases via the Generative Bayesian Prior network. Our approach is compared to a pre-training approach and to random initialization, and obtains the best results in terms of the Dice Similarity Coefficient metric for small subsets of the Brain Tumor Segmentation 2018 database (BRATS2018).
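For readers unfamiliar with the evaluation metric, the Dice Similarity Coefficient of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened masks; the function name and the convention of returning 1.0 for two empty masks are assumptions of this example, not details from the paper.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total

# One overlapping voxel, two foreground voxels in each mask.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
print(score)
```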
Mixed pooling of seasonality in time series pallet forecasting
Multiple seasonal patterns play a key role in time series forecasting, especially for business time series where seasonal effects are often dramatic. Previous approaches including Fourier decomposition, exponential smoothing, and seasonal autoregressive integrated moving average (SARIMA) models do not reflect the distinct characteristics of each period in seasonal patterns, such as the unique behavior of specific days of the week in business data. We propose a multi-dimensional hierarchical model. Intermediate parameters for each seasonal period are first estimated, and a mixture of intermediate parameters is then taken, resulting in a model that successfully reflects the interactions between multiple seasonal patterns. Although this process reduces the data available for each parameter, a robust estimation can be obtained through a hierarchical Bayesian model implemented in Stan. Through this model, it becomes possible to consider both the characteristics of each seasonal period and the interactions among characteristics from multiple seasonal periods. Our new model achieved considerable improvements in prediction accuracy compared to previous models, including Fourier decomposition, which Prophet uses to model seasonality patterns. A comparison was performed on a real-world dataset of pallet transport from a national-scale logistic network.
A Deep Evolutionary Approach to Bioinspired Classifier Optimisation for Brain-Machine Interaction
Bird, Jordan J., Faria, Diego R., Manso, Luis J., Ekárt, Anikó, Buckingham, Christopher D.
This study suggests a new approach to EEG data classification by exploring the idea of using evolutionary computation to both select useful discriminative EEG features and optimise the topology of Artificial Neural Networks. An evolutionary algorithm is applied to select the most informative features from an initial set of 2550 EEG statistical features. Optimisation of a Multilayer Perceptron (MLP) is performed with an evolutionary approach before classification to estimate the best hyperparameters of the network. Deep learning and tuning with Long Short-Term Memory (LSTM) are also explored, and Adaptive Boosting of the two types of models is tested for each problem. Three experiments are provided for comparison using different classifiers: one for attention state classification, one for emotional sentiment classification, and a third experiment in which the goal is to guess the number a subject is thinking of. The obtained results show that an Adaptive Boosted LSTM can achieve an accuracy of 84.44%, 97.06%, and 9.94% on the attentional, emotional, and number datasets, respectively. An evolutionary-optimised MLP achieves results close to the Adaptive Boosted LSTM for the first two experiments and significantly higher for the number-guessing experiment, with an Adaptive Boosted DEvo MLP reaching 31.35%, while being significantly quicker to train and classify. In particular, the accuracy of the nonboosted DEvo MLP was 79.81%, 96.11%, and 27.07% on the same benchmarks. Two datasets for the experiments were gathered using a Muse EEG headband with four electrodes corresponding to the TP9, AF7, AF8, and TP10 locations of the international EEG placement standard. The EEG MindBigData digits dataset was gathered from the TP9, FP1, FP2, and TP10 locations.
Least Squares Approximation for a Distributed System
Zhu, Xuening, Li, Feng, Wang, Hansheng
In this work we develop a distributed least squares approximation (DLSA) method, which is able to solve a large family of regression problems (e.g., linear regression, logistic regression, Cox's model) on a distributed system. By approximating the local objective function using a local quadratic form, we are able to obtain a combined estimator by taking a weighted average of local estimators. The resulting estimator is proved to be statistically as efficient as the global estimator. Meanwhile, it requires only one round of communication. We further conduct shrinkage estimation based on the DLSA estimation by using an adaptive Lasso approach. The solution can be easily obtained by using the LARS algorithm on the master node. It is theoretically shown that the resulting estimator enjoys the oracle property and is selection consistent by using a newly designed distributed Bayesian Information Criterion (DBIC). The finite sample performance as well as the computational efficiency are further illustrated by an extensive numerical study and an airline dataset. The airline dataset is 52GB in memory size. The entire methodology has been implemented in Python for a de facto standard Spark system. By using the proposed DLSA algorithm on the Spark system, it takes 26 minutes to obtain a logistic regression estimator, whereas a full likelihood algorithm takes 15 hours to reach an inferior result.
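The idea of combining local estimators by a weighted average in a single communication round can be sketched for the simplest case, a one-parameter linear regression y ≈ beta * x. The sketch assumes each worker reports its local least-squares slope together with the curvature of its local objective (the sum of x squared) as the weight. The function names, the simulated data, and the three-way split are illustrative assumptions and do not reproduce the authors' Spark implementation.

```python
import random

def local_fit(xs, ys):
    """Least-squares slope on one worker, plus its curvature weight."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx, sxx

def combine(local_results):
    """Weighted average of local estimators (one round of communication)."""
    total_weight = sum(w for _, w in local_results)
    return sum(b * w for b, w in local_results) / total_weight

# Simulated data with true slope 2.0, split across three hypothetical workers.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(300)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]
shards = [(xs[i::3], ys[i::3]) for i in range(3)]

local_results = [local_fit(sx, sy) for sx, sy in shards]
beta_combined = combine(local_results)
beta_global, _ = local_fit(xs, ys)  # estimator computed on all data at once
print(beta_combined, beta_global)
```

In this linear case the objective is exactly quadratic, so the weighted average reproduces the global estimator up to floating-point error; for models such as logistic regression the quadratic form is only an approximation of the local objective.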
Distributionally Robust Optimization: A Review
Rahimian, Hamed, Mehrotra, Sanjay
The concepts of risk-aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has also witnessed rapid theoretical and applied growth by relying on these concepts. A modeling framework, called distributionally robust optimization (DRO), has recently received significant attention in both the operations research and statistical learning communities. This paper surveys the main concepts and contributions to DRO, and its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization.
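For orientation, a DRO problem is commonly written as a worst-case expectation over a set of candidate distributions. The notation below is an assumption of this note rather than the survey's own: $x$ is the decision, $\mathcal{X}$ the feasible set, $\xi$ the uncertain parameter, $h$ the cost function, and $\mathcal{P}$ the ambiguity set of distributions.

```latex
\min_{x \in \mathcal{X}} \; \sup_{P \in \mathcal{P}} \; \mathbb{E}_{P}\left[ h(x, \xi) \right]
```

When $\mathcal{P}$ contains a single distribution this reduces to ordinary stochastic optimization, and when $\mathcal{P}$ contains every distribution supported on an uncertainty set it reduces to classical robust optimization over that set.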