Directed Networks
FRAME Revisited: An Interpretation View Based on Particle Evolution
Cai, Xu, Wu, Yang, Li, Guanbin, Chen, Ziliang, Lin, Liang
FRAME (Filters, Random fields, And Maximum Entropy) is an energy-based descriptive model that synthesizes visually realistic images by capturing shared patterns from structural input signals. Maximum likelihood estimation (MLE) is applied by default, but it conventionally causes unstable training energy that wrecks the generated structures, a phenomenon that has so far remained unexplained. In this paper, we provide a new theoretical analysis of FRAME from the perspective of particle physics, attributing this phenomenon to a KL-vanishing issue. To stabilize energy dissipation, we propose an alternative discrete-time Wasserstein distance, based on the result that the Jordan-Kinderlehrer-Otto (JKO) discrete flow approximates the KL discrete flow as the time step size tends to 0. This metric also preserves the model's statistical consistency. Quantitative and qualitative experiments on several widely used datasets demonstrate the effectiveness and superiority of our method.
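For reference, the JKO scheme mentioned above is usually written as a proximal step in Wasserstein space. The block below states the standard form for a generic energy functional F with step size tau; the abstract does not give FRAME's specific objective, so taking F to be a KL divergence to a target density pi is an illustrative assumption.

```latex
\rho_{k+1} \in \arg\min_{\rho} \left\{ \frac{1}{2\tau} W_2^2(\rho, \rho_k) + \mathcal{F}(\rho) \right\},
\qquad \mathcal{F}(\rho) = \mathrm{KL}(\rho \,\|\, \pi)
```

As tau tends to 0, the piecewise-constant interpolation of the iterates converges to the Wasserstein-2 gradient flow of F, which is the sense in which the discrete JKO flow approximates the continuous KL flow.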
Designing quantum experiments with a genetic algorithm
Nichols, Rosanna, Mineh, Lana, Rubio, Jesús, Matthews, Jonathan C. F., Knott, Paul A.
We introduce a genetic algorithm that designs quantum optics experiments for engineering quantum states with specific properties. Our algorithm is powerful and flexible, and can easily be modified to find methods of engineering states for a range of applications. Here we focus on quantum metrology. First, we consider the noise-free case, and use the algorithm to find quantum states with a large quantum Fisher information (QFI). We find methods, which only involve experimental elements that are available with current technology, for engineering quantum states with up to a 100-fold improvement over the best classical state, and a 20-fold improvement over the optimal Gaussian state. Such states are a superposition of the vacuum with a large number of photons (around 80), and can hence be seen as Schrödinger-cat-like states. We then apply the two dominant noise sources in our setting (photon loss and imperfect heralding) and use the algorithm to find quantum states that still improve over the optimal Gaussian state with realistic levels of noise. This will open up experimental and technological work in using exotic non-Gaussian states for quantum-enhanced phase measurements. Finally, we use the Bayesian mean square error to look beyond the regime of validity of the QFI, finding quantum states with precision enhancements over the alternatives even when the experiment operates in the regime of limited data.
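To make the QFI figure of merit concrete, here is a minimal sketch (not the authors' genetic algorithm or their fitness code) under two stated assumptions: the state is pure and single-mode, and the phase is encoded as exp(-i phi n) with n the photon-number operator. For a pure state the QFI is then 4 Var(n), which explains why a superposition of the vacuum with many photons scores so well. The function names are hypothetical.

```python
import math


def pure_state_phase_qfi(amplitudes):
    """QFI of a pure single-mode state for a phase shift exp(-i*phi*n).

    amplitudes[n] is the (possibly complex) amplitude of the n-photon Fock
    state. For a pure state, QFI = 4 * Var(n).
    """
    probs = [abs(a) ** 2 for a in amplitudes]
    norm = sum(probs)
    probs = [p / norm for p in probs]  # normalize in case the input is not
    mean_n = sum(n * p for n, p in enumerate(probs))
    mean_n2 = sum(n * n * p for n, p in enumerate(probs))
    return 4.0 * (mean_n2 - mean_n ** 2)


def vacuum_plus_n(n_photons):
    """Equal superposition (|0> + |N>)/sqrt(2) of vacuum and N photons."""
    amps = [0.0] * (n_photons + 1)
    amps[0] = amps[n_photons] = 1 / math.sqrt(2)
    return amps


def coherent_state(mean_photons, cutoff):
    """Coherent-state Fock amplitudes for |alpha|^2 = mean_photons (> 0),
    keeping Fock states 0..cutoff-1; computed in log space to avoid overflow."""
    log_alpha = 0.5 * math.log(mean_photons)
    return [math.exp(-mean_photons / 2 + n * log_alpha - 0.5 * math.lgamma(n + 1))
            for n in range(cutoff)]
```

In this idealized, noise-free comparison, `(|0> + |80>)/sqrt(2)` (mean photon number 40) gives a QFI of 80^2 = 6400, while a coherent state with the same mean photon number gives 4 x 40 = 160. The paper's quoted improvement factors come from its own constraints and baselines, not from this calculation.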
That's Mine! Learning Ownership Relations and Norms for Robots
Tan, Zong Xuan, Brawer, Jake, Scassellati, Brian
The ability of autonomous agents to learn and conform to human norms is crucial for their safety and effectiveness in social environments. While recent work has led to frameworks for the representation and inference of simple social rules, research into norm learning remains at an exploratory stage. Here, we present a robotic system capable of representing, learning, and inferring ownership relations and norms. Ownership is represented as a graph of probabilistic relations between objects and their owners, along with a database of predicate-based norms that constrain the actions permissible on owned objects. To learn these norms and relations, our system integrates (i) a novel incremental norm learning algorithm capable of both one-shot learning and induction from specific examples, (ii) Bayesian inference of ownership relations in response to apparent rule violations, and (iii) percept-based prediction of an object's likely owners. Through a series of simulated and real-world experiments, we demonstrate the competence and flexibility of the system in performing object manipulation tasks that require a variety of norms to be followed, laying the groundwork for future research into the acquisition and application of social norms.
Network Compression via Recursive Bayesian Pruning
Zhou, Yuefu, Zhang, Ya, Wang, Yanfeng, Tian, Qi
Compression and acceleration of deep neural networks are in critical need. Bayesian generalizations of structured pruning represent an important research direction for addressing this problem. However, existing Bayesian methods ignore the dependency among neurons and filters for computational simplicity. In this study, we explore, under a Bayesian framework, a structured pruning method that assumes layer-wise sequential dependency, a more general learning setting. Based on a property of the Dirac distribution, we derive a new dropout noise that makes it possible to approximate the posterior of the dropout noise given that of the previous layer. With this Dirac-like dropout noise, we propose a recursive strategy, named \emph{Recursive Bayesian Pruning} (RBP), to train and prune networks layer by layer. Unimportant neurons and filters are directly targeted and removed, taking into account the influence of the previous layer. Experiments on the typical neural networks LeNet-300-100, LeNet-5 and VGG-16 demonstrate that the proposed method is competitive with, or even outperforms, state-of-the-art methods on several compression and acceleration metrics.
Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale
Bach, Stephen H., Rodriguez, Daniel, Liu, Yintao, Luo, Chong, Shao, Haidong, Xia, Cassandra, Sen, Souvik, Ratner, Alexander, Hancock, Braden, Alborzi, Houman, Kuchhal, Rahul, Ré, Christopher, Malkin, Rob
Labeling training data is one of the most costly bottlenecks in developing or modifying machine learning-based applications. We survey how resources from across an organization can be used as weak supervision sources for three classification tasks at Google, in order to bring development time and cost down by an order of magnitude. We build on the Snorkel framework, extending it as a new system, Snorkel DryBell, which integrates with Google's distributed production systems and enables engineers to develop and execute weak supervision strategies over millions of examples in less than thirty minutes. We find that Snorkel DryBell creates classifiers of comparable quality to ones trained using up to tens of thousands of hand-labeled examples, in part by leveraging organizational resources that are not servable in production, which contribute an average 52% performance improvement to the weakly supervised classifiers.
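The core idea of weak supervision is to combine several noisy labeling sources into one training label per example. Snorkel itself learns a generative label model that estimates each source's accuracy without ground truth; the sketch below shows only the simpler majority-vote baseline, as an illustration. It is not the Snorkel or Snorkel DryBell API, and the `ABSTAIN` convention and function name are hypothetical.

```python
from collections import Counter

ABSTAIN = None  # hypothetical marker for a labeling function that did not fire


def majority_vote(votes):
    """Combine labeling-function outputs for one example by majority vote.

    votes: list of labels, with ABSTAIN for sources that abstained.
    Returns the most common label, or ABSTAIN if no source voted or the
    top two labels are tied.
    """
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie: no confident label
    return ranked[0][0]
```

A learned label model improves on this by down-weighting sources that disagree with the consensus, which matters when some organizational resources are much noisier than others.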
Generalization in anti-causal learning
Kilbertus, Niki, Parascandolo, Giambattista, Schölkopf, Bernhard
The ability to learn and act in novel situations is still a prerogative of animate intelligence, as current machine learning methods mostly fail when moving beyond the standard i.i.d. setting. What is the reason for this discrepancy? Most machine learning tasks are anti-causal, i.e., we infer causes (labels) from effects (observations). Typically, in supervised learning we build systems that try to directly invert causal mechanisms. Instead, in this paper we argue that strong generalization capabilities crucially hinge on searching and validating meaningful hypotheses, requiring access to a causal model. In such a framework, we want to find a cause that leads to the observed effect. Anti-causal models are used to drive this search, but a causal model is required for validation. We investigate the fundamental differences between causal and anti-causal tasks, discuss implications for topics ranging from adversarial attacks to disentangling factors of variation, and provide extensive evidence from the literature to substantiate our view. We advocate for incorporating causal models in supervised learning to shift the paradigm from inference only to search and validation.
Knowledge-driven generative subspaces for modeling multi-view dependencies in medical data
Pillai, Parvathy Sudhir, Leong, Tze-Yun
Early detection of Alzheimer's disease (AD) and identification of potential risk/beneficial factors are important for planning and administering timely interventions or preventive measures. In this paper, we learn a disease model for AD that combines genotypic and phenotypic profiles, and cognitive health metrics of patients. We propose a probabilistic generative subspace that describes the correlative, complementary and domain-specific semantics of the dependencies in multi-view, multi-modality medical data. Guided by domain knowledge and using the latent consensus between abstractions of multi-view data, we model the fusion as a data generating process. We show that our approach can potentially lead to i) explainable clinical predictions and ii) improved AD diagnoses.
GAN-EM: GAN based EM learning framework
Zhao, Wentian, Wang, Shaojie, Xie, Zhihuai, Shi, Jing, Xu, Chenliang
The expectation-maximization (EM) algorithm finds maximum likelihood solutions for models with latent variables. A typical example is the Gaussian Mixture Model (GMM), which relies on a Gaussian assumption; however, natural images are highly non-Gaussian, so a GMM cannot be applied to clustering in pixel space. To overcome this limitation, we propose a GAN-based EM learning framework that can maximize the likelihood of images and estimate the latent variables under only the constraint of L-Lipschitz continuity. We call this model GAN-EM; it is a framework for image clustering, semi-supervised classification and dimensionality reduction. In the M-step, we design a novel loss function for the GAN discriminator to perform maximum likelihood estimation (MLE) on data with soft class-label assignments. Specifically, a conditional generator captures the data distribution for $K$ classes, and a discriminator tells whether a sample is real or fake for each class. Since our model is unsupervised, the class label of real data is treated as a latent variable, which is estimated by an additional network (E-net) in the E-step. The proposed GAN-EM achieves state-of-the-art clustering and semi-supervised classification results on MNIST, SVHN and CelebA, and generates images of quality comparable to other recently developed generative models.
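For contrast, here is the classical EM loop for a one-dimensional GMM, i.e. the Gaussian-assumption baseline that GAN-EM replaces. The E-step computes soft class assignments (responsibilities), which GAN-EM instead obtains from its E-net; the M-step re-estimates each component by weighted maximum likelihood, which GAN-EM instead performs with a conditional GAN. This is a generic textbook sketch, not code from the paper; the function names and the small variance floor are illustrative choices.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iters=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k_count = len(means)
    for _ in range(iters):
        # E-step: responsibilities resp[i][k] = p(z_i = k | x_i)
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(k_count)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: weighted maximum likelihood for each component
        for k in range(k_count):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid collapse
            weights[k] = nk / len(data)
    return means, variances, weights


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(-3, 1) for _ in range(200)] + [rng.gauss(3, 1) for _ in range(200)]
    print(em_gmm_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]))
```

The Gaussian density in `normal_pdf` is exactly the assumption the abstract says breaks down for natural images in pixel space.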
Verifying Fairness Properties via Concentration
Bastani, Osbert, Zhang, Xin, Solar-Lezama, Armando
As machine learning systems are increasingly used to make real-world legal and financial decisions, it is of paramount importance that we develop algorithms to verify that these systems do not discriminate against minorities. We design a scalable algorithm for verifying fairness specifications. Our algorithm obtains strong correctness guarantees based on adaptive concentration inequalities; such inequalities enable our algorithm to adaptively take samples until it has enough data to make a decision. We implement our algorithm in a tool called VeriFair, and show that it scales to large machine learning models, including a deep recurrent neural network that is more than five orders of magnitude larger than the largest previously verified neural network. While our technique only gives probabilistic guarantees due to the use of random samples, we show that we can choose the probability of error to be extremely small.
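Why can the error probability be made extremely small at low cost? The sketch below uses the ordinary fixed-sample Hoeffding bound for [0,1]-valued samples, not the adaptive concentration inequality VeriFair relies on (which additionally permits a data-dependent stopping time). Even in this simpler setting, the required sample count grows only with log(1/delta). The function names are hypothetical.

```python
import math


def hoeffding_samples(eps, delta):
    """Smallest n such that P(|mean_hat - mean| >= eps) <= delta for i.i.d.
    samples in [0, 1], from the two-sided bound 2 * exp(-2 * n * eps**2) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def hoeffding_radius(n, delta):
    """Half-width eps of the (1 - delta) confidence interval after n samples."""
    return math.sqrt(math.log(2 / delta) / (2 * n))
```

With `eps = 0.1`, going from `delta = 0.05` (185 samples) to `delta = 1e-10` (1186 samples) costs only about 6.4 times as many samples for an error probability that is smaller by a factor of 500 million.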
Efficiency and robustness in Monte Carlo sampling of 3-D geophysical inversions with Obsidian v0.1.2: Setting up for success
Scalzo, Richard, Kohn, David, Olierook, Hugo, Houseman, Gregory, Chandra, Rohitash, Girolami, Mark, Cripps, Sally
The rigorous quantification of uncertainty in geophysical inversions is a challenging problem. Inversions are often ill-posed and the likelihood surface may be multi-modal; properties of any single mode become inadequate uncertainty measures, and sampling methods become inefficient for irregular posteriors or high-dimensional parameter spaces. We explore the influences of different choices made by the practitioner on the efficiency and accuracy of Bayesian geophysical inversion methods that rely on Markov chain Monte Carlo sampling to assess uncertainty, using a multi-sensor inversion of the three-dimensional structure and composition of a region in the Cooper Basin of South Australia as a case study. The inversion is performed using an updated version of the Obsidian distributed inversion software. We find that the posterior for this inversion has complex local covariance structure, hindering the efficiency of adaptive sampling methods that adjust the proposal based on the chain history. Within the context of a parallel-tempered Markov chain Monte Carlo scheme for exploring high-dimensional multi-modal posteriors, a preconditioned Crank-Nicolson proposal outperforms more conventional forms of random walk. Aspects of the problem setup, such as priors on petrophysics or on 3-D geological structure, affect the shape and separation of posterior modes, influencing sampling performance as well as the inversion results. Use of uninformative priors on sensor noise can improve inversion results by enabling optimal weighting among multiple sensors even if noise levels are uncertain. Efficiency could be further increased by using posterior gradient information within proposals, which Obsidian does not currently support, but which could be emulated using posterior surrogates.
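The preconditioned Crank-Nicolson (pCN) proposal differs from a plain random walk in that it shrinks the current state toward the prior mean before adding noise, so the proposal is reversible with respect to a Gaussian prior and the acceptance test involves only the likelihood. The sketch below assumes a whitened standard normal prior N(0, I), a single untempered chain, and a user-supplied negative log-likelihood; it is not Obsidian's implementation, and the function name is hypothetical.

```python
import math
import random


def pcn_sample(neg_log_lik, dim, beta, n_steps, seed=0):
    """Preconditioned Crank-Nicolson MCMC under an assumed N(0, I) prior.

    Proposal: v = sqrt(1 - beta**2) * u + beta * xi, with xi ~ N(0, I).
    Since this proposal preserves the prior, the acceptance probability is
    min(1, exp(Phi(u) - Phi(v))), where Phi is the negative log-likelihood.
    Returns the chain and the acceptance rate.
    """
    rng = random.Random(seed)
    u = [0.0] * dim
    phi_u = neg_log_lik(u)
    scale = math.sqrt(1 - beta ** 2)
    samples, accepted = [], 0
    for _ in range(n_steps):
        v = [scale * ui + beta * rng.gauss(0.0, 1.0) for ui in u]
        phi_v = neg_log_lik(v)
        # accept with probability min(1, exp(phi_u - phi_v))
        if rng.random() < math.exp(min(0.0, phi_u - phi_v)):
            u, phi_u = v, phi_v
            accepted += 1
        samples.append(list(u))
    return samples, accepted / n_steps
```

As a check, with a scalar observation y = 1 and unit noise, `neg_log_lik = lambda u: 0.5 * (u[0] - 1.0) ** 2` gives the exact posterior N(0.5, 0.5), so the chain mean should settle near 0.5. Because the prior term cancels in the acceptance ratio, acceptance rates tend not to collapse as the parameter dimension grows, which is one commonly cited reason to prefer pCN for high-dimensional inversions.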