 Bayesian Learning


Bayesian Network Based Label Correlation Analysis For Multi-label Classifier Chain

arXiv.org Machine Learning

Ran Wang 1,2, Suhe Ye 1,2, Ke Li 3 and Sam Kwong 4. 1 College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China. 2 Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen 518060, China.

Abstract: Classifier chain (CC) is a multi-label learning approach that constructs a sequence of binary classifiers according to a label order. Each classifier in the sequence is responsible for predicting the relevance of one label. When training the classifier for a label, the preceding labels are taken as extended features. If the extended features are highly correlated with the label, performance improves; otherwise, performance is unaffected or may even degrade. Discovering label correlations and determining the label order are therefore critical for the CC approach. This paper employs a Bayesian network (BN) to model the label correlations and proposes a new BN-based CC method (BNCC). First, conditional entropy is used to describe the dependency relations among labels. Then, a BN is built by taking labels as nodes and their dependency relations as edge weights. A new scoring function is proposed to evaluate a BN structure, and a heuristic algorithm is introduced to optimize the BN. Finally, by applying topological sorting to the nodes of the optimized BN, the label order for constructing the CC model is derived. Experimental comparisons demonstrate the feasibility and effectiveness of the proposed method.
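Two building blocks named in the abstract, conditional entropy between labels and topological sorting of a label graph, can be sketched as follows. This is a minimal illustration, not the authors' BNCC implementation: the function names and the toy label columns are hypothetical, and the scoring function and heuristic search that produce the optimized BN are not reproduced here.

```python
import math
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+


def conditional_entropy(target, given):
    """Empirical H(target | given) in bits for two equal-length label columns."""
    n = len(target)
    joint = Counter(zip(given, target))   # counts of (given, target) pairs
    marginal = Counter(given)             # counts of the conditioning label
    h = 0.0
    for (g, _t), count in joint.items():
        # p(g, t) * log2(p(g) / p(g, t)) == -p(g, t) * log2 p(t | g)
        h += (count / n) * math.log2(marginal[g] / count)
    return h


def chain_order(parents):
    """Topologically sort a label DAG given as {label: set of parent labels}.

    Parents always come before their children, so each classifier in the
    chain can use its parent labels as extended features.
    """
    return list(TopologicalSorter(parents).static_order())


# Hypothetical toy data: y2 copies y1, y3 is independent of y1.
y1 = [0, 0, 1, 1]
y2 = [0, 0, 1, 1]
y3 = [0, 1, 0, 1]
print(conditional_entropy(y2, y1))  # low value: y1 is informative about y2
print(conditional_entropy(y3, y1))  # high value: y1 tells us nothing about y3
print(chain_order({"y1": set(), "y2": {"y1"}, "y3": {"y2"}}))
```

A low conditional entropy H(target | given) means the conditioning label is a useful extended feature for the target, which is the intuition behind using it to weight edges.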


Bayesian Incremental Inference Update by Re-using Calculations from Belief Space Planning: A New Paradigm

arXiv.org Artificial Intelligence

Inference and decision making under uncertainty are key processes in every autonomous system and in numerous robotic problems. In recent years, the similarities between inference and decision making have triggered much work, from developing unified computational frameworks to pondering the duality between the two. In spite of these efforts, inference and control, as well as inference and belief space planning (BSP), are still treated as two separate processes. In this paper we propose a paradigm shift: a novel approach that deviates from conventional Bayesian inference and utilizes the similarities between inference and BSP. We make the key observation that inference can be efficiently updated using predictions made during the decision making stage, even in light of inconsistent data association between the two. We developed a two-stage process that implements our novel approach and updates inference using calculations from the precursory planning phase. Using autonomous navigation in an unknown environment along with iSAM2 efficient methodologies as a test case, we benchmarked our novel approach against standard Bayesian inference, with both synthetic and real-world data (KITTI dataset). Results indicate that our approach not only improves running time by at least a factor of two while providing the same estimation accuracy, but also alleviates the computational burden of state dimensionality and loop closures.


A Deep Learning Approach for Tweet Classification and Rescue Scheduling for Effective Disaster Management

arXiv.org Machine Learning

It is a challenging and complex task to acquire information from different regions of a disaster-affected area in a timely fashion. The extensive spread and reach of social media and networks allow people to share information in real time. However, processing social media data and gathering valuable information require a series of operations, such as (1) processing each specific tweet for text classification, (2) determining the possible location of people needing help based on tweets, and (3) calculating the priority of rescue tasks based on the classification of tweets. These are three primary challenges in developing an effective rescue scheduling operation using social media data. In this paper, first, we propose a deep learning model combining attention-based Bi-directional Long Short-Term Memory (BLSTM) and Convolutional Neural Network (CNN) to classify the tweets under different categories. We use pre-trained crisis word vectors and global vectors for word representation (GloVe) to capture semantic meaning from tweets. Next, we perform feature engineering to create an auxiliary feature map which dramatically increases the model accuracy. In our experiments using real data sets from Hurricanes Harvey and Irma, our proposed approach outperforms other classification methods on Precision, Recall, F1-score, and Accuracy, and is highly effective at determining the correct priority of a tweet. Furthermore, to evaluate the effectiveness and robustness of the proposed classification model, we use a merged dataset comprising 4 different datasets from CrisisNLP and data on another 15 different disasters from CrisisLex. Finally, we develop an adaptive multitask hybrid scheduling algorithm that considers resource constraints and different rescue priorities to perform an effective rescue scheduling operation.


Dueling Posterior Sampling for Preference-Based Reinforcement Learning

arXiv.org Artificial Intelligence

In preference-based reinforcement learning (RL), an agent interacts with the environment while receiving preferences instead of absolute feedback. While there is increasing research activity in preference-based RL, the design of formal frameworks that admit tractable theoretical analysis remains an open challenge. Building upon ideas from preference-based bandit learning and posterior sampling in RL, we present Dueling Posterior Sampling (DPS), which employs preference-based posterior sampling to learn both the system dynamics and the underlying utility function that governs the user's preferences. Because preference feedback is provided on trajectories rather than individual state/action pairs, we develop a Bayesian approach to solving the credit assignment problem, translating user preferences to a posterior distribution over state/action reward models. We prove an asymptotic no-regret rate for DPS with a Bayesian logistic regression credit assignment model; to our knowledge, this is the first regret guarantee for preference-based RL. We also discuss possible avenues for extending this proof methodology to analyze other credit assignment models. Finally, we evaluate the approach empirically, showing competitive performance against existing baselines.


The Flawed Reasoning Behind the Replication Crisis - Issue 74: Networks

Nautilus

Suppose we scan 1 million similar women, and we tell everyone who tests positive that they have cancer. Then we will have correctly told all 10,000 women with cancer that they have it. Of the remaining 990,000 women whose lumps were benign, we will incorrectly tell 49,500 women that they have cancer. Therefore, of the women we identify as having cancer, about 83 percent will have been incorrectly diagnosed. Imagine you or a loved one received a positive test result.
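The arithmetic in this excerpt can be checked with a short sketch. The function name is hypothetical; the 5 percent false-positive rate is not stated in the excerpt but is implied by its numbers (49,500 of 990,000), and the excerpt itself assumes every woman with cancer tests positive.

```python
def share_of_positives_that_are_wrong(population, with_cancer, false_positive_rate):
    """Fraction of positive test results that belong to people without cancer."""
    # Assumption taken from the excerpt: every true case tests positive.
    true_positives = with_cancer
    # People without cancer who nevertheless test positive.
    false_positives = (population - with_cancer) * false_positive_rate
    return false_positives / (true_positives + false_positives)


share = share_of_positives_that_are_wrong(
    population=1_000_000,
    with_cancer=10_000,
    false_positive_rate=0.05,  # implied by 49,500 / 990,000
)
print(f"{share:.1%}")  # roughly 83 percent, matching the excerpt
```

The result is driven by the low base rate: with only 1 percent of women having cancer, even a modest false-positive rate produces far more false alarms than true detections.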


Ensemble Neural Networks (ENN): A gradient-free stochastic method

arXiv.org Machine Learning

Abstract: In this study, an efficient stochastic gradient-free method, the ensemble neural networks (ENN), is developed. In the ENN, the optimization process relies on covariance matrices rather than derivatives. The covariance matrices are calculated by the ensemble randomized maximum likelihood algorithm (EnRML), which is an inverse modeling method. The ENN is able to simultaneously provide estimations and perform uncertainty quantification since it is built under the Bayesian framework. The ENN is also robust to small training data size because the ensemble of stochastic realizations essentially enlarges the training dataset. This constitutes a desirable characteristic, especially for real-world engineering applications. In addition, the ENN does not require the calculation of gradients, which enables the use of complicated neuron models and loss functions in neural networks. We experimentally demonstrate benefits of the proposed model, in particular showing that the ENN performs much better than the traditional Bayesian neural networks (BNN). The EnRML in ENN is a substitution of gradient-based optimization algorithms, which means that it can be directly combined with the feed-forward process in other existing (deep) neural networks, such as convolutional neural networks (CNN) and recurrent neural networks (RNN), broadening future applications of the ENN.

Keywords: Inverse modeling, Gradient-free, Uncertainty quantification, Robust to small data size, Stochastic method

1. Introduction

Artificial neural networks (ANN) are computing systems inspired by biological neural networks that constitute animal brains. ANN is capable of approximating nonlinear functional relationships between input and output variables (Kim et al., 2018). From a mathematical perspective, a neural network can model any function up to any given precision with a sufficiently large number of basis functions (Cybenko, 1989; Hornik, 1991). In addition, we can even use much smaller models by constructing hierarchy neural networks (Delalleau & Bengio, 2011; Gal, 2016). The basic processing elements of neural networks are neurons. A collection of neurons is referred to as a layer, and the collection of interconnected layers forms the neural networks (Kim et al., 2018). A four-layer neural network is illustrated in Figure 1 as an example. In a neuron, the output is calculated by a nonlinear function of the sum of its inputs. The connections between different neurons from adjacent layers are represented by the weights in a model. The weights adjust as learning proceeds, and they represent the strength of the signal at a connection. The nonlinear function is also called the activation function, and the most popular choices are sigmoid, tansig, and ReLU (Li et al., 2015).

ANN has been widely applied to solving real-world engineering problems, and the following three topics are significant for effective applications.
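The neuron and layer described in the introduction above can be sketched in a few lines. This is a generic illustration of the feed-forward computation only, not the ENN or EnRML update; the function names are hypothetical, and the bias term is an assumption that the excerpt does not mention.

```python
import math


def sigmoid(z):
    """Sigmoid activation, one of the choices named in the text."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """ReLU activation, another choice named in the text."""
    return max(0.0, z)


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Apply a nonlinear activation to the weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # bias is an assumption
    return activation(z)


def layer(inputs, weight_rows, biases, activation=sigmoid):
    """A layer is a collection of neurons that share the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


# Hypothetical weights: two inputs feeding a layer of three neurons.
hidden = layer([1.0, 2.0],
               weight_rows=[[0.5, -0.25], [1.0, 1.0], [-1.0, 0.0]],
               biases=[0.0, 0.0, 0.0])
print(hidden)
```

Stacking several such layers, with the output of one used as the input to the next, gives the multi-layer network the text refers to.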


A Hierarchical Bayesian Model for Size Recommendation in Fashion

arXiv.org Machine Learning

We introduce a hierarchical Bayesian approach to tackle the challenging problem of size recommendation in e-commerce fashion. Our approach jointly models the size purchased by a customer and its possible return event: (1) no return, (2) returned as too small, or (3) returned as too big. These events are drawn from a multinomial distribution parameterized by the joint probability of each event, built following a hierarchy combining priors. Such a model allows us to incorporate extended domain expertise and article characteristics as prior knowledge, while still letting the underlying parameters emerge from the data when enough of it is available. Experiments are presented on real (anonymized) data from millions of customers, along with a detailed discussion of the efficiency of such an approach within a large-scale production system.
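How prior knowledge can give way to data in a three-outcome multinomial model can be sketched with a conjugate update. The Dirichlet prior, the function name, and all numbers below are assumptions made for illustration; the abstract does not specify the form of its priors, and this is not the authors' hierarchical model.

```python
EVENTS = ("no return", "returned too small", "returned too big")


def posterior_event_probabilities(prior_pseudo_counts, observed_counts):
    """Posterior mean of multinomial event probabilities under a Dirichlet prior.

    prior_pseudo_counts encode domain expertise as pseudo-observations;
    observed_counts are the return events actually seen for one size.
    """
    totals = [a + n for a, n in zip(prior_pseudo_counts, observed_counts)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Hypothetical prior: experts expect this size to fit most customers.
prior = [8, 1, 1]

# With no data, the estimate is just the prior.
print(dict(zip(EVENTS, posterior_event_probabilities(prior, [0, 0, 0]))))

# With enough data, the observations dominate the prior.
print(dict(zip(EVENTS, posterior_event_probabilities(prior, [10, 70, 10]))))
```

In the second call the estimate shifts toward "returned too small", which illustrates how parameters can emerge from data even when the prior pointed elsewhere.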


Inferring linear and nonlinear Interaction networks using neighborhood support vector machines

arXiv.org Machine Learning

In this paper, we consider modelling interactions between a set of variables in the context of high-dimensional time series. We suggest two approaches. The first is similar to the neighborhood lasso, except that the lasso model is replaced by a support vector machine (SVM). The second is a restricted Bayesian network adapted for time series. We show the efficiency of our approaches by simulations using linear data, nonlinear data, and a mixture of both.


Probabilistic Residual Learning for Aleatoric Uncertainty in Image Restoration

arXiv.org Machine Learning

Aleatoric uncertainty is an intrinsic property of ill-posed inverse and imaging problems. Its quantification is vital for assessing the reliability of relevant point estimates. In this paper, we propose an efficient framework for quantifying aleatoric uncertainty for deep residual learning and showcase its significant potential on image restoration. In the framework, we divide the conditional probability modeling for the residual variable into a deterministic homo-dimensional level, a stochastic low-dimensional level and a merging level. The low-dimensionality is especially suitable for sparse correlation between image pixels, enables efficient sampling for high dimensional problems and acts as a regularizer for the distribution. Preliminary numerical experiments show that the proposed method can give not only state-of-the-art point estimates of image restoration but also useful associated uncertainty information.


Uncertainty Quantification in Deep Learning for Safer Neuroimage Enhancement

arXiv.org Machine Learning

Deep learning (DL) has shown great potential in medical image enhancement problems, such as super-resolution or image synthesis. However, to date, little consideration has been given to uncertainty quantification over the output image. Here we introduce methods to characterise different components of uncertainty in such problems and demonstrate the ideas using diffusion MRI super-resolution. Specifically, we propose to account for intrinsic uncertainty through a heteroscedastic noise model and for parameter uncertainty through approximate Bayesian inference, and integrate the two to quantify predictive uncertainty over the output image. Moreover, we introduce a method to propagate the predictive uncertainty on a multi-channelled image to derived scalar parameters, and separately quantify the effects of intrinsic and parameter uncertainty therein. The methods are evaluated for super-resolution of two different signal representations of diffusion MR images (DTIs and Mean Apparent Propagator MRI) and their derived quantities such as MD and FA, on multiple datasets of both healthy and pathological human brains. Results highlight three key benefits of uncertainty modelling for improving the safety of DL-based image enhancement systems. Firstly, incorporating uncertainty improves the predictive performance even when test data departs from training data. Secondly, the predictive uncertainty highly correlates with errors, and is therefore capable of detecting predictive "failures". Results demonstrate that such an uncertainty measure enables subject-specific and voxel-wise risk assessment of the output images. Thirdly, we show that the method for decomposing predictive uncertainty into its independent sources provides high-level "explanations" for the performance by quantifying how much uncertainty arises from the inherent difficulty of the task or the limited training examples.
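One common way to combine intrinsic and parameter uncertainty is the law of total variance, sketched below for a single output voxel. This is an assumption about how such a decomposition can be computed, not necessarily the authors' method: the abstract gives no formulas, and the function name and sample values are hypothetical. Each sampled set of network parameters is assumed to output a predicted mean and a predicted noise variance.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(sampled_means, sampled_variances):
    """Split predictive variance for one voxel into two sources.

    sampled_means[i] and sampled_variances[i] are the mean and noise variance
    predicted by the i-th sampled set of network parameters.
    """
    # Intrinsic: average of the noise variances the model itself predicts.
    intrinsic = fmean(sampled_variances)
    # Parameter: spread of the predicted means across parameter samples.
    parameter = pvariance(sampled_means)
    return {
        "intrinsic": intrinsic,
        "parameter": parameter,
        "predictive": intrinsic + parameter,
    }


# Hypothetical outputs from two parameter samples for a single voxel.
print(decompose_uncertainty(sampled_means=[1.0, 3.0],
                            sampled_variances=[0.5, 1.5]))
```

A large parameter term relative to the intrinsic term suggests the model would benefit from more training examples, while a large intrinsic term points to the inherent difficulty of the task.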