Uncertainty
Is strong AI inevitable? Look for yourself! – Towards Data Science
Any question about the future is susceptible to unknown unknowns, futile speculations about things undiscovered. We need examples we can observe and interrogate now. While we don't have strong AI, we do have rigorous examples of weak and strong intelligence. A comparison of the nature of knowledge in its weak and strong forms offers a penetrating and non-technical look into the prospects for strong AI. Our last stop on this tour of the AI landscape introduced induction as the prevailing theory of knowledge creation, and the central role that explanations play in workable inductive systems. Here, I'll apply that framework to shed some light on one of the most contentious and important debates in AI: Are we on the path to artificial general intelligence? Is tomorrow's strong AI the inevitable extension of today's weaker examples? Here's the plan: We'll examine two points along the knowledge hierarchy, one associated with weak AI, the other a much stronger form. I've labelled these points predictions and explanations, respectively, and I'll make these terms more precise as we go. Through concrete examples, you can evaluate the quality of each intelligence yourself, and decide whether the path from weak to strong seems smooth and incremental, or perilous and disjoint.
Latent heterogeneous multilayer community detection
Ali, Hafiz Tiomoko, Liu, Sijia, Yilmaz, Yasin, Hero, Alfred, Couillet, Romain, Rajapakse, Indika
We propose a method for simultaneously detecting shared and unshared communities in heterogeneous multilayer weighted and undirected networks. The multilayer network is assumed to follow a generative probabilistic model that takes into account the similarities and dissimilarities between the communities. We make use of a variational Bayes approach for jointly inferring the shared and unshared hidden communities from multilayer network observations. We show the robustness of our approach compared to state-of-the-art algorithms in detecting disparate (shared and private) communities on synthetic data as well as on a real genome-wide fibroblast proliferation dataset.
Binary Classification in Unstructured Space With Hypergraph Case-Based Reasoning
Binary classification is one of the most common problems in machine learning. It consists of predicting whether a given element belongs to a particular class. In this paper, a new algorithm for binary classification is proposed using a hypergraph representation. Each element to be classified is partitioned according to its interactions with the training set. For each class, the total support is calculated as a convex combination of the evidence strength of the elements of the partition. The evidence measure is pre-computed using the hypergraph induced by the training set and iteratively adjusted through a training phase. The method does not require structured information, each case being represented by a set of agnostic information atoms. Empirical validation demonstrates its high potential on a wide range of well-known datasets, and the results are compared to the state of the art. The time complexity is given and empirically validated. Its capacity to provide good performance without hyperparameter tuning, compared to standard classification methods, is studied. Finally, the limitation of the model space is discussed and some potential solutions are proposed.
How I Learned to Stop Worrying and Love Uncertainty
Since their early days, humans have had an important, often antagonistic relationship with uncertainty; we try to kill it everywhere we find it. Without an explanation for many natural phenomena, humans invented gods to explain them, and without certainty about the future, they consulted oracles. It was precisely the oracles' role to reduce uncertainty for their fellow humans, predicting their future and giving counsel according to their gods' will, and even though their accuracy left much to be desired, they were believed, for any measure of certainty is better than none. As society grew more sophisticated, oracles were (not completely) displaced by empirical thought, which proved much more successful at prediction and counsel. Empiricism itself evolved into the collection of techniques we call the scientific method, which has proven to be much more effective at reducing uncertainty, and is modern society's most trustworthy way of producing predictions.
Minibatch Gibbs Sampling on Large Graphical Models
De Sa, Christopher, Chen, Vincent, Wong, Wing
Gibbs sampling is a Markov chain Monte Carlo method that is one of the most widespread techniques used with graphical models [7]. Gibbs sampling is an iterative method that repeatedly resamples a variable in the model from its conditional distribution, a process that is guaranteed to converge asymptotically to the desired distribution. Since these updates are typically simple and fast to run, Gibbs sampling can be applied to a variety of problems, and has been used for inference on large-scale graphical models in many systems [11, 13, 14, 19, 20, 21]. Unfortunately, for large graphical models with many factors, the computational cost of running an iteration of Gibbs sampling can become prohibitive. Even though Gibbs sampling is a graph-local algorithm, in the sense that each update only needs to reference data associated with a local neighborhood of the factor graph, as graphs become large and highly connected, even these local neighborhoods can become huge.
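To make the resampling step concrete, here is a minimal sketch of plain Gibbs sampling on a toy Ising model (binary variables in {-1, +1} with one pairwise factor per edge). It is not the minibatch algorithm of the paper; the function name `gibbs_ising`, the model, and all parameters are assumptions chosen for illustration. Note how each update reads only the neighbors of the variable being resampled, which is the graph-local property described above and also the cost that grows when neighborhoods become large.

```python
import math
import random


def gibbs_ising(n_vars, edges, coupling, n_sweeps, seed=0):
    """Plain Gibbs sampling for a toy Ising model, p(x) ~ exp(coupling * sum x_i x_j).

    Hypothetical helper for illustration only; states take values in {-1, +1}.
    """
    rng = random.Random(seed)
    # Adjacency list: each update touches only a variable's local neighborhood.
    neighbors = [[] for _ in range(n_vars)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    state = [rng.choice((-1, 1)) for _ in range(n_vars)]
    samples = []
    for _ in range(n_sweeps):
        for i in range(n_vars):
            # Local field from the neighbors; cost is proportional to the degree of i.
            field = coupling * sum(state[j] for j in neighbors[i])
            # Conditional distribution: P(x_i = +1 | rest) = 1 / (1 + exp(-2 * field)).
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
            state[i] = 1 if rng.random() < p_plus else -1
        samples.append(list(state))
    return samples


if __name__ == "__main__":
    chain_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    draws = gibbs_ising(n_vars=5, edges=chain_edges, coupling=1.0, n_sweeps=2000)
    agree = sum(s[0] == s[1] for s in draws) / len(draws)
    print(f"fraction of sweeps where x0 == x1: {agree:.2f}")
```

With a positive coupling, neighboring variables should agree in most sweeps. On a densely connected graph the inner `sum` over neighbors dominates the running time, which is the bottleneck the abstract refers to.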
Supervised learning with generalized tensor networks
Glasser, Ivan, Pancotti, Nicola, Cirac, J. Ignacio
Tensor networks have found wide use in a variety of applications in physics and computer science, recently leading to both theoretical insights and practical algorithms in machine learning. In this work we explore the connection between tensor networks and probabilistic graphical models, and show that it motivates the definition of generalized tensor networks where information from a tensor can be copied and reused in other parts of the network. We discuss the relationship between generalized tensor network architectures used in quantum physics, such as String-Bond States and Entangled Plaquette States, and architectures commonly used in machine learning. We provide an algorithm to train these networks in a supervised learning context and show that they overcome the limitations of regular tensor networks in higher dimensions, while keeping the computation efficient. A method to combine neural networks and tensor networks as part of a common deep learning architecture is also introduced. We benchmark our algorithm for several generalized tensor network architectures on the task of classifying images and sounds, and show that they outperform previously introduced tensor network algorithms. Some of the models we consider can be realized on a quantum computer and may guide the development of near-term quantum machine learning architectures.
An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method
Joseph, Ajin George, Bhatnagar, Shalabh
In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ the multi-timescale stochastic approximation variant of the very popular cross entropy (CE) optimization method, which is a model-based search method to find the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regard to computational efficiency, accuracy and stability.
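For readers unfamiliar with the cross entropy method, the following is a minimal sketch of its basic batch form for maximizing a one-dimensional real-valued function. It is not the multi-timescale stochastic approximation variant used in the paper, and it does not touch value-function estimation; the function name `cross_entropy_maximize`, the Gaussian sampling model, and the default parameters are all assumptions made for illustration.

```python
import random
import statistics


def cross_entropy_maximize(objective, mean=0.0, std=5.0, n_samples=100,
                           elite_frac=0.1, n_iters=50, seed=0):
    """Basic batch cross entropy method in one dimension (illustrative sketch)."""
    rng = random.Random(seed)
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # 1. Sample candidates from the current Gaussian model.
        candidates = [rng.gauss(mean, std) for _ in range(n_samples)]
        # 2. Keep the best-scoring fraction (the "elite" samples).
        candidates.sort(key=objective, reverse=True)
        elites = candidates[:n_elite]
        # 3. Refit the model to the elites; a small floor keeps std positive.
        mean = statistics.fmean(elites)
        std = statistics.pstdev(elites) + 1e-6
    return mean


if __name__ == "__main__":
    # Toy objective with a single maximum at x = 3.
    best = cross_entropy_maximize(lambda x: -(x - 3.0) ** 2)
    print(f"estimated maximizer: {best:.3f}")
```

The method is model-based in the sense that it maintains and updates a sampling distribution over candidate solutions rather than following gradients of the objective.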
Robust Bayesian Model Selection for Variable Clustering with the Gaussian Graphical Model
Andrade, Daniel, Takeda, Akiko, Fukumizu, Kenji
Variable clustering is important for explanatory analysis. However, only a few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even more severely, small insignificant partial correlations due to noise can dramatically change the clustering result when evaluating, for example, with the Bayesian Information Criterion (BIC). In this work, we try to address this issue by proposing a Bayesian model that accounts for negligibly small, but not necessarily zero, partial correlations. Based on our model, we propose to evaluate a variable clustering result using the marginal likelihood. To address the intractable calculation of the marginal likelihood, we propose two solutions: one based on a variational approximation, and another based on MCMC. Experiments on simulated data show that the proposed method is as accurate as BIC in the no-noise setting, but considerably more accurate when there are noisy partial correlations. Furthermore, on real data the proposed method provides clustering results that are intuitively sensible, which is not always the case when using BIC or its extensions.
Parameter Learning and Change Detection Using a Particle Filter With Accelerated Adaptation
Indeed, the bulk of the empirical academic literature in finance takes this approach. However, practitioners' use of models, in particular for the pricing and risk management of derivative financial products relative to observed prices for liquidly traded market instruments, typically tends to depart from this ideal. Primacy is accorded to model "calibration" over empirical consistency, i.e., choosing a set of liquidly traded market instruments (which may include liquidly traded derivatives) as "calibration instruments", model parameters are determined so as to match model prices of these instruments as closely as possible to observed market prices at a given point in time. Once these market prices have changed, the model parameters (which were assumed to be constant, or at most time-varying in a known deterministic fashion) are recalibrated, thereby contradicting the model assumptions. "Legalising" these parameter changes by expanding the state space (e.g. via regime-switching or stochastic volatility models) shifts, rather than resolves, the problem: for example in the case of stochastic volatility, volatility becomes a state variable rather than a model parameter, and can evolve stochastically, but the parameters of the stochastic volatility process itself are assumed to be time-invariant.
Monaural source enhancement maximizing source-to-distortion ratio via automatic differentiation
Nakajima, Hiroaki, Takahashi, Yu, Kondo, Kazunobu, Hisaminato, Yuji
Recently, deep neural networks (DNNs) have made a breakthrough in monaural source enhancement. Through a training step that uses a large amount of data, a DNN estimates a mapping between mixed signals and clean signals. Training relies on an objective function that numerically expresses the quality of the mapping learned by the DNN. In conventional methods, the L1 norm, the L2 norm, and the Itakura-Saito divergence are often used as objective functions. Recently, an objective function based on short-time objective intelligibility (STOI) has also been proposed. However, these functions only indicate similarity between the clean signal and the signal estimated by the DNN. In other words, they do not show the quality of noise reduction or source enhancement. Motivated by this fact, this paper adopts the signal-to-distortion ratio (SDR) as the objective function. Since SDR virtually shows the signal-to-noise ratio (SNR), maximizing SDR solves the above problem. The experimental results revealed that the proposed method achieved better performance than the conventional methods.
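As a rough illustration of what an SDR-style objective measures, the sketch below computes the simplest energy-ratio form, 10 log10(||s||^2 / ||s - s_hat||^2), in decibels. This is an assumption: the abstract does not give the exact SDR definition used in the paper, which may include additional terms, and the function name `sdr_db` and the `eps` stabilizer are hypothetical. The paper maximizes SDR through automatic differentiation in a DNN framework; this plain Python version only shows the quantity itself.

```python
import math


def sdr_db(clean, estimate, eps=1e-12):
    """Simplified signal-to-distortion ratio in dB (illustrative assumption).

    clean and estimate are equal-length sequences of samples.
    """
    if len(clean) != len(estimate):
        raise ValueError("clean and estimate must have the same length")
    # Energy of the clean reference signal.
    signal_power = sum(s * s for s in clean)
    # Energy of the residual between the reference and the estimate.
    distortion_power = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((signal_power + eps) / (distortion_power + eps))


if __name__ == "__main__":
    clean = [1.0, 0.0, -1.0, 0.0]
    rough = [0.5, 0.0, -0.5, 0.0]
    close = [0.9, 0.0, -0.9, 0.0]
    print(f"rough estimate: {sdr_db(clean, rough):.2f} dB")
    print(f"close estimate: {sdr_db(clean, close):.2f} dB")
```

A closer estimate leaves less residual energy and therefore scores a higher SDR, so using the negative of this value as a training loss pushes the network toward reducing distortion relative to signal energy.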