Directed Networks
A Particle Filter based Multi-Objective Optimization Algorithm: PFOPS
This letter is concerned with a recently developed paradigm of population-based optimization, termed particle filter optimization (PFO). In contrast with commonly used metaheuristic methods, the PFO paradigm is attractive for its theoretical coherence and its ease of mathematical analysis and interpretation. However, current PFO algorithms only work for single-objective optimization, while many real-life problems involve multiple objectives to be optimized simultaneously. To this end, we extend the scope of the PFO paradigm to multi-objective optimization (MOO). An idea called path sampling is adopted within the PFO scheme to balance the different objectives to be optimized. The resulting algorithm is thus termed PFO with Path Sampling (PFOPS). Experimental results show that the proposed algorithm works consistently well for three different types of MOO problems, which are characterized by a convex, a concave, and a discontinuous Pareto front, respectively.
The Bayesian Probability: Basis and Particular Utility in AI
Probability was initially, and for quite a long time, called the doctrine of chances. It was the mathematical description of games of chance (dice, cards, and so on) and was used to describe and quantify randomness, or aleatory uncertainty. Statisticians use it to describe uncertainty. How can you use probability to describe learning? How can you use it to describe an accumulation of information over time, so that you can modify a probability based on additional knowledge? However, using Bayes' theorem is one thing, and being Bayesian is something else.
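The question of how probability can describe learning can be made concrete with a small worked example. The sketch below is an illustration only and is not taken from the article: the two hypotheses ("fair" and "loaded"), their likelihoods, and the function name `bayes_update` are all assumptions chosen for this example. It applies Bayes' theorem once per observation, so the posterior after one roll becomes the prior for the next.

```python
from fractions import Fraction


def bayes_update(prior, likelihoods):
    """Return the posterior over hypotheses after one observation.

    prior:       dict mapping hypothesis -> P(hypothesis)
    likelihoods: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(observation)
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical setup: a die is either fair or loaded toward six.
prior = {"fair": Fraction(1, 2), "loaded": Fraction(1, 2)}
p_six = {"fair": Fraction(1, 6), "loaded": Fraction(1, 2)}

# Each observed six shifts belief toward the loaded hypothesis.
after_one = bayes_update(prior, p_six)      # loaded: 3/4
after_two = bayes_update(after_one, p_six)  # loaded: 9/10
```

Exact fractions are used so that the accumulation of evidence can be checked by hand: the probability of "loaded" moves from 1/2 to 3/4 to 9/10 as sixes are observed.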
Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study
Siddhant, Aditya, Lipton, Zachary C.
Several recent papers investigate Active Learning (AL) for mitigating the data dependence of deep learning for natural language processing. However, the applicability of AL to real-world problems remains an open question. While in supervised learning practitioners can try many different methods, evaluating each against a validation set before selecting a model, AL affords no such luxury. Over the course of one AL run, an agent annotates its dataset, exhausting its labeling budget. Thus, given a new task, an active learner has no opportunity to compare models and acquisition functions. This paper provides a large-scale empirical study of deep active learning, addressing multiple tasks and, for each, multiple datasets, multiple models, and a full suite of acquisition functions. We find that across all settings, Bayesian active learning by disagreement, using uncertainty estimates provided either by Dropout or Bayes-by-Backprop, significantly improves over i.i.d. baselines and usually outperforms classic uncertainty sampling.
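To illustrate how Bayesian active learning by disagreement differs from classic uncertainty sampling, the following minimal sketch computes the usual disagreement score: the entropy of the averaged prediction minus the average entropy of the individual predictions. It is not the authors' implementation. The input format (a list of class-probability vectors, one per stochastic forward pass such as a Dropout sample) and the names `entropy` and `bald_score` are assumptions made for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(mc_probs):
    """Disagreement score for one unlabeled example.

    mc_probs: list of class-probability vectors, one per stochastic
              forward pass (hypothetical input format).
    Returns H[mean prediction] - mean H[individual prediction].
    """
    n_samples = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean_probs = [
        sum(sample[k] for sample in mc_probs) / n_samples
        for k in range(n_classes)
    ]
    mean_entropy = sum(entropy(sample) for sample in mc_probs) / n_samples
    return entropy(mean_probs) - mean_entropy


# Both examples have the same averaged prediction [0.5, 0.5], so plain
# predictive entropy ranks them equally; the disagreement score does not.
passes_agree = [[0.5, 0.5], [0.5, 0.5]]     # every pass is unsure
passes_disagree = [[1.0, 0.0], [0.0, 1.0]]  # passes contradict each other

score_agree = bald_score(passes_agree)        # 0: no disagreement
score_disagree = bald_score(passes_disagree)  # log(2): maximal for 2 classes
```

An acquisition step would label the examples with the highest scores first; how scores are aggregated over tokens or sequences in NLP tasks is left out of this sketch.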
Adaptive Structural Learning of Deep Belief Network for Medical Examination Data and Its Knowledge Extraction by using C4.5
Kamada, Shin, Ichimura, Takumi, Harada, Toshihide
Deep Learning has a hierarchical network architecture to represent the complicated features of input patterns. The adaptive structural learning method of Deep Belief Network (DBN) has been developed. The method can discover an optimal number of hidden neurons for given input data in a Restricted Boltzmann Machine (RBM) by a neuron generation-annihilation algorithm, and generate a new hidden layer in the DBN by an extension of the algorithm. In this paper, the proposed adaptive structural learning of DBN was applied to comprehensive medical examination data for cancer prediction. The prediction system shows higher classification accuracy (99.8% for training and 95.5% for test) than the traditional DBN. Moreover, explicit knowledge about the relation between input and output patterns was extracted from the trained DBN network by C4.5. Some characteristics, extracted in the form of IF-THEN rules for finding cancer at an early stage, are reported in this paper.
Task adapted reconstruction for inverse problems
Adler, Jonas, Lunz, Sebastian, Verdier, Olivier, Schönlieb, Carola-Bibiane, Öktem, Ozan
The paper considers the problem of performing a task defined on a model parameter that is only observed indirectly through noisy data in an ill-posed inverse problem. A key aspect is to formalize the steps of reconstruction and task as appropriate estimators (non-randomized decision rules) in statistical estimation problems. The implementation makes use of (deep) neural networks to provide a differentiable parametrization of the family of estimators for both steps. These networks are combined and jointly trained against suitable supervised training data in order to minimize a joint differentiable loss function, resulting in an end-to-end task adapted reconstruction method. The suggested framework is generic, yet adaptable, with a plug-and-play structure for adjusting both the inverse problem and the task at hand. More precisely, the data model (forward operator and statistical model of the noise) associated with the inverse problem is exchangeable, e.g., by using a neural network architecture given by a learned iterative method. Furthermore, any task that is encodable as a trainable neural network can be used. The approach is demonstrated on joint tomographic image reconstruction and classification, and on joint tomographic image reconstruction and segmentation.
Water Disaggregation via Shape Features based Bayesian Discriminative Sparse Coding
Wang, Bingsheng, Zhang, Xuchao, Lu, Chang-Tien, Chen, Feng
As the issue of freshwater shortage is increasing daily, it is critical to take effective measures for water conservation. According to previous studies, device-level consumption could lead to significant freshwater conservation. Existing water disaggregation methods focus on learning the signatures of appliances; however, they lack a mechanism to accurately discriminate the consumption of appliances running in parallel. In this paper, we propose a Bayesian Discriminative Sparse Coding model using Laplace Prior (BDSC-LP) to extensively enhance the disaggregation performance. To derive discriminative basis functions, shape features are presented to describe the low-sampling-rate water consumption patterns. A Gibbs sampling based inference method is designed to extend the discriminative capability of the disaggregation dictionaries. Extensive experiments were performed to validate the effectiveness of the proposed model using both real-world and synthetic datasets.
An Intersectional Definition of Fairness
With the rising influence of machine learning algorithms on many important aspects of our daily lives, there are growing concerns that biases inherent in data can lead the behavior of these algorithms to discriminate against certain populations [1, 2, 4, 6, 8, 28, 29, 15]. In recent years, substantial research effort has been devoted to the development of mathematical definitions of bias, or its opposite, fairness, in algorithms and in data [15, 18, 26, 23, 19, 32]. In this work, we focus on the fairness scenario where there are multiple protected attributes for which we aim to ensure fairness, and which may potentially overlap with each other, such as gender, race, and sexual orientation. Our guiding principle is intersectionality, the core theoretical framework underlying the third-wave feminist movement [13]. The principle of intersectionality states that racism, sexism, and other social systems which harm marginalized groups are interlocking in their effects, such that the lived experience of, e.g., black women, is very different than that of, e.g., white women. Intersectionality was defined by Kimberlé Crenshaw in the 1980s [13] and popularized in the 1990s, e.g., by Patricia Hill Collins [10], although the ideas are much older [11, 35]. In the context of machine learning and fairness, intersectionality was recently considered by [7], who studied the impact of the intersection of gender and skin color on computer vision performance, and by [23, 19], who aimed to protect certain subgroups in order to prevent "fairness gerrymandering."
Probabilistic Graphical Modeling approach to dynamic PET direct parametric map estimation and image reconstruction
Scipioni, Michele, Pedemonte, Stefano, Santarelli, Maria Filomena, Landini, Luigi
In the context of dynamic emission tomography, the conventional processing pipeline consists of independent image reconstruction of single time frames, followed by the application of a suitable kinetic model to time activity curves (TACs) at the voxel or region-of-interest level. The relatively new field of 4D PET direct reconstruction, by contrast, seeks to move beyond this scheme and incorporate information from multiple time frames within the reconstruction task. Existing 4D direct models are based on a deterministic description of voxels' TACs, captured by the chosen kinetic model, considering the photon counting process the only source of uncertainty. In this work, we introduce a new probabilistic modeling strategy based on the key assumption that the activity time course would be subject to uncertainty even if the parameters of the underlying dynamic process were known. This leads to a hierarchical Bayesian model, which we formulate using the formalism of Probabilistic Graphical Modeling (PGM). The inference of the joint probability density function arising from PGM is addressed using a new gradient-based iterative algorithm, which presents several advantages compared to existing direct methods: it accommodates an arbitrary choice of linear or nonlinear kinetic model; it enables the inclusion of arbitrary (sub)differentiable priors for parametric maps; it is simpler to implement and suitable for integration in computing frameworks for machine learning. Computer simulations and an application to a real patient scan showed how the proposed approach allows us to weight the importance of the kinetic model, providing a bridge between indirect and deterministic direct methods.
Unknown Examples & Machine Learning Model Generalization
Chung, Yeounoh, Haas, Peter J., Upfal, Eli, Kraska, Tim
Over the past decades, researchers and ML practitioners have come up with better and better ways to build, understand and improve the quality of ML models, but mostly under the key assumption that the training data is distributed identically to the testing data. In many real-world applications, however, some potential training examples are unknown to the modeler, due to sample selection bias or, more generally, covariate shift, i.e., a distribution shift between the training and deployment stage. The resulting discrepancy between training and testing distributions leads to poor generalization performance of the ML model and hence biased predictions. We provide novel algorithms that estimate the number and properties of these unknown training examples (unknown unknowns). This information can then be used to correct the training set, prior to seeing any test data. The key idea is to combine species-estimation techniques with data-driven methods for estimating the feature values for the unknown unknowns. Experiments on a variety of ML models and datasets indicate that taking the unknown examples into account can yield a more robust ML model that generalizes better.
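As a rough illustration of what a species-estimation technique does, the sketch below implements the classical Chao1 richness estimator, which uses the number of items seen exactly once and exactly twice to estimate how many distinct items were never observed. This is a swapped-in textbook estimator chosen for illustration; the abstract does not say which estimator the authors use, and the function name `chao1_estimate` and the toy data are assumptions.

```python
from collections import Counter


def chao1_estimate(observations):
    """Estimate the total number of distinct items, seen and unseen.

    Chao1: S_obs + f1^2 / (2 * f2), where f1 and f2 are the numbers of
    distinct items observed exactly once and exactly twice. When f2 is
    zero, the bias-corrected form S_obs + f1 * (f1 - 1) / 2 is used.
    """
    item_counts = Counter(observations)
    freq_of_freq = Counter(item_counts.values())
    s_obs = len(item_counts)
    f1 = freq_of_freq.get(1, 0)
    f2 = freq_of_freq.get(2, 0)
    if f2 > 0:
        unseen = f1 * f1 / (2.0 * f2)
    else:
        unseen = f1 * (f1 - 1) / 2.0
    return s_obs + unseen


# Toy sample: 5 distinct items, 3 seen once (b, d, e), 2 seen twice (a, c).
sample = list("aabccde")
estimated_total = chao1_estimate(sample)  # 5 + 3*3 / (2*2) = 7.25
```

Many singletons relative to doubletons suggest that a large part of the population is still unobserved. Estimating the feature values of the missing examples, the other half of the approach described above, is not covered by this sketch.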
Analysis of Noise Contrastive Estimation from the Perspective of Asymptotic Variance
Uehara, Masatoshi, Matsuda, Takeru, Komaki, Fumiyasu
There are many models, often called unnormalized models, whose normalizing constants are not available in closed form. Maximum likelihood estimation is not directly applicable to unnormalized models. Score matching, contrastive divergence, pseudo-likelihood, Monte Carlo maximum likelihood, and noise contrastive estimation (NCE) are popular methods for estimating the parameters of such models. In this paper, we focus on NCE. The estimator derived from NCE is consistent and asymptotically normal because it is an M-estimator. NCE characteristically uses an auxiliary distribution to calculate the normalizing constant, in the same spirit as importance sampling. In addition, there are several candidate objective functions for NCE. We focus on how to reduce asymptotic variance. First, we propose a method for reducing asymptotic variance by estimating the parameters of the auxiliary distribution. Then, we determine the form of the objective functions for which the asymptotic variance takes the smallest values in the original estimator class and the proposed estimator classes. We further analyze the robustness of the estimator.
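The role of the auxiliary distribution can be sketched with the standard logistic form of NCE on a one-dimensional toy problem. This is a minimal illustration under stated assumptions, not the estimators proposed in the paper: the unnormalized model is exp(-x^2/2) with a single free parameter c standing in for the log normalizing constant, the noise distribution is a zero-mean Gaussian with a fixed, known standard deviation, and data and noise samples are equal in number. The function names and the bisection bounds are also assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nce_log_normalizer(data, noise, noise_sd, lo=-10.0, hi=10.0, iters=40):
    """Estimate c in log p_model(x) = c - x^2/2 by logistic NCE.

    The NCE objective is the log-likelihood of a classifier that tells
    data from noise using the log-ratio log p_model - log p_noise. It is
    concave in c, so its maximizer is found by bisecting on the gradient.
    """
    assert len(data) == len(noise), "sketch assumes equal sample sizes"
    log_noise_const = -math.log(noise_sd * math.sqrt(2.0 * math.pi))

    def log_ratio(u, c):
        log_model = c - 0.5 * u * u
        log_noise = log_noise_const - 0.5 * (u / noise_sd) ** 2
        return log_model - log_noise

    def gradient(c):
        from_data = sum(1.0 - sigmoid(log_ratio(x, c)) for x in data)
        from_noise = sum(sigmoid(log_ratio(y, c)) for y in noise)
        return (from_data - from_noise) / len(data)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gradient(mid) > 0.0:
            lo = mid  # objective still increasing, move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(5000)]   # true density N(0, 1)
noise = [random.gauss(0.0, 1.5) for _ in range(5000)]  # auxiliary density
c_hat = nce_log_normalizer(data, noise, noise_sd=1.5)
c_true = -0.5 * math.log(2.0 * math.pi)  # exact log normalizer of N(0, 1)
```

With these sample sizes `c_hat` should land close to `c_true`. How close it lands depends on the choice of auxiliary distribution and objective, which is the asymptotic-variance question the paper studies.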