Mathematical & Statistical Methods
On solutions of the distributional Bellman equation
Gerstenberg, Julian, Neininger, Ralph, Spiegel, Denis
In distributional reinforcement learning not only expected returns but the complete return distributions of a policy are taken into account. The return distribution for a fixed policy is given as the solution of an associated distributional Bellman equation. In this note we consider general distributional Bellman equations and study existence and uniqueness of their solutions as well as tail properties of return distributions. We give necessary and sufficient conditions for existence and uniqueness of return distributions and identify cases of regular variation. We link distributional Bellman equations to multivariate affine distributional equations. We show that any solution of a distributional Bellman equation can be obtained as the vector of marginal laws of a solution to a multivariate affine distributional equation. This makes the general theory of such equations applicable to the distributional reinforcement learning setting.
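For orientation, the two kinds of equation named in the abstract can be written in their commonly used textbook forms. This is a sketch, not the notation of the paper itself: the symbols $G^{\pi}$, $\gamma$, $\mathbf{A}$ and $\mathbf{B}$ are assumptions chosen for illustration, and the paper's general setting may differ (for example, by allowing random discount factors).

```latex
% Distributional Bellman equation for a fixed policy \pi (standard discounted form, assumed):
% R is the reward and S' the next state obtained by following \pi from state s,
% and the family (G^{\pi}(s'))_{s'} is independent of (R, S').
G^{\pi}(s) \overset{d}{=} R + \gamma \, G^{\pi}(S')

% Multivariate affine distributional equation:
% \mathbf{A} is a random matrix and \mathbf{B} a random vector,
% with (\mathbf{A}, \mathbf{B}) independent of X on the right-hand side.
X \overset{d}{=} \mathbf{A} X + \mathbf{B}
```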
Probability and Statistics for Business and Data Science
Welcome to Probability and Statistics for Business and Data Science! In this course we cover what you need to know about probability and statistics to succeed in business and the data science field! This practical course covers both the theory of statistics and its application to real-world problems. Each section has example problems, in-course quizzes, and assessment tests. We'll start with the basics of data: how to examine it with measures of central tendency and dispersion, and how bivariate data sources can relate to each other.
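As a rough illustration of the measures mentioned above, the following sketch computes central tendency, dispersion, and a bivariate correlation using only the Python standard library. The function names `summarize` and `pearson_r` and the sample data are hypothetical, invented for this example rather than taken from the course.

```python
import math
import statistics


def summarize(values):
    """Return central-tendency and dispersion measures for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical data, made up purely for illustration.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 5.0, 4.0, 5.0]

print(summarize(ad_spend))
print(round(pearson_r(ad_spend, sales), 3))
```

A correlation near 1 or -1 suggests a strong linear relationship between the two variables, while a value near 0 suggests little linear relationship.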
Linear transformations and matrices - Master Data Science
This post will be quite an interesting one. We will show how one 2D plane can be transformed into another. Understanding these concepts is a crucial step toward some more advanced linear algebra and machine learning methods (e.g. ...). So, let's proceed: we will introduce linear transformations and learn how to connect matrix-vector multiplication with them. A linear transformation can also be seen as a simple function.
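The link between matrix-vector multiplication and a linear transformation can be sketched in a few lines of plain Python. The helper `apply_matrix` and the example matrix `ROTATE_90` are assumptions made for illustration and do not come from the post itself.

```python
def apply_matrix(matrix, vector):
    """Apply a 2x2 matrix (given as two rows) to a 2D vector."""
    (a, b), (c, d) = matrix
    x, y = vector
    return (a * x + b * y, c * x + d * y)


# Rotation by 90 degrees counter-clockwise.
# The columns of the matrix are the images of the basis vectors (1, 0) and (0, 1).
ROTATE_90 = ((0, -1),
             (1, 0))

print(apply_matrix(ROTATE_90, (1, 0)))  # (0, 1)
print(apply_matrix(ROTATE_90, (0, 1)))  # (-1, 0)

# Linearity: T(2u + 3v) equals 2*T(u) + 3*T(v).
u, v = (1, 2), (3, -1)
combined = (2 * u[0] + 3 * v[0], 2 * u[1] + 3 * v[1])
tu, tv = apply_matrix(ROTATE_90, u), apply_matrix(ROTATE_90, v)
lhs = apply_matrix(ROTATE_90, combined)
rhs = (2 * tu[0] + 3 * tv[0], 2 * tu[1] + 3 * tv[1])
print(lhs == rhs)  # True
```

Viewed this way, the matrix is the function: it takes a point of the plane as input and returns the transformed point as output.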
Graphon-aided Joint Estimation of Multiple Graphs
Navarro, Madeline, Segarra, Santiago
We consider the problem of estimating the topology of multiple networks from nodal observations, where these networks are assumed to be drawn from the same (unknown) random graph model. We adopt a graphon as our random graph model, which is a nonparametric model from which graphs of potentially different sizes can be drawn. The versatility of graphons allows us to tackle the joint inference problem even for the cases where the graphs to be recovered contain different numbers of nodes and lack precise alignment across the graphs. Our solution is based on combining a maximum likelihood penalty with graphon estimation schemes and can be used to augment existing network inference methods. We validate our proposed approach by comparing its performance against competing ...
For instance, one would expect certain levels of similarities between the brain networks of different healthy individuals or between the same social network observed at different points in time. Prominent methods for multiple network inference include statistical approaches, primarily consisting of the joint estimation of Gaussian graphical models [13-17]. These methods typically involve modifications on the graphical lasso formulation with additional encouragement of structural similarity. Estimation of time-varying graphs is widely popular, as the relationship between graphs is typically straightforward to implement by considering that graph variation is smooth across time [18, 19]. The above methods for estimating multiple networks typically enforce similar structure, such as promoting similar sparsity patterns [20].
The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance
Faw, Matthew, Tziotis, Isidoros, Caramanis, Constantine, Mokhtari, Aryan, Shakkottai, Sanjay, Ward, Rachel
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (SGD), where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non-adaptive methods in this setting. Specifically, all prior works rely on some subset of the following assumptions: (i) uniformly bounded gradient norms, (ii) uniformly bounded stochastic gradient variance (or even noise support), (iii) conditional independence between the step size and stochastic gradient. In this work, we show that AdaGrad-Norm exhibits an order-optimal convergence rate of $\mathcal{O}\left(\frac{\mathrm{poly}\log(T)}{\sqrt{T}}\right)$ after $T$ iterations under the same assumptions as optimally tuned non-adaptive SGD (unbounded gradient norms and affine noise variance scaling), and crucially, without needing any tuning parameters. We thus establish that adaptive gradient methods exhibit order-optimal convergence in much broader regimes than previously understood.
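To make the step-size rule concrete, here is a minimal sketch of AdaGrad-Norm in its commonly stated form: accumulate $b_t^2 = b_{t-1}^2 + \|g_t\|^2$ and update $x_{t+1} = x_t - (\eta / b_t)\, g_t$. The function names, the defaults `eta=1.0` and `b0=0.1`, the toy objective $f(x) = \tfrac{1}{2}\|x\|^2$, and the particular noise model are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def adagrad_norm(grad_fn, x0, steps, eta=1.0, b0=0.1, seed=0):
    """Run AdaGrad-Norm: a single scalar step size eta / b_t shared by all coordinates."""
    if b0 <= 0:
        raise ValueError("b0 must be positive")
    rng = random.Random(seed)
    x = list(x0)
    b_sq = b0 ** 2
    for _ in range(steps):
        g = grad_fn(x, rng)
        b_sq += sum(gi * gi for gi in g)  # accumulate squared gradient norms
        step = eta / math.sqrt(b_sq)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def make_noisy_quadratic_grad(sigma0, sigma1):
    """Stochastic gradient of f(x) = 0.5 * ||x||^2 whose noise scale grows with ||grad f||.

    The noise variance is bounded by an affine function of ||grad f(x)||^2
    (a hypothetical noise model chosen for this sketch).
    """
    def grad_fn(x, rng):
        true_grad = list(x)  # the gradient of 0.5 * ||x||^2 is x itself
        scale = sigma0 + sigma1 * math.sqrt(sum(v * v for v in true_grad))
        return [gi + scale * rng.gauss(0.0, 1.0) for gi in true_grad]
    return grad_fn


def norm(x):
    return math.sqrt(sum(v * v for v in x))


start = [5.0, -3.0]
noiseless = adagrad_norm(make_noisy_quadratic_grad(0.0, 0.0), start, steps=200)
noisy = adagrad_norm(make_noisy_quadratic_grad(0.1, 0.5), start, steps=200, seed=1)
print(norm(start), norm(noiseless), norm(noisy))
```

Note that the step size is determined entirely by the gradients observed so far; no smoothness constant or noise level is passed in.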
Peng
Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost for big data. In this paper, we propose a new Bayesian approach, EigenGP, that learns both basis dictionary elements -- eigenfunctions of a GP prior -- and prior precisions in a sparse finite model. It is well known that, among all orthogonal basis functions, eigenfunctions can provide the most compact representation. Unlike other sparse Bayesian finite models where the basis function has a fixed form, our eigenfunctions live in a reproducing kernel Hilbert space as a finite linear combination of kernel functions. We learn the dictionary elements -- eigenfunctions -- and the prior precisions over these elements as well as all the other hyperparameters from data by maximizing the model marginal likelihood. We explore computational linear algebra to simplify the gradient computation significantly. Our experimental results demonstrate improved predictive performance of EigenGP over alternative sparse GP methods as well as relevance vector machines.
Piacentini
Compilation techniques in planning reformulate a problem into an alternative encoding for which efficient, off-the-shelf solvers are available. In this work, we present a novel mixed-integer linear programming (MILP) compilation for cost-optimal numeric planning with instantaneous actions. While recent works on the problem are restricted to actions that modify variables present in simple numeric conditions, our MILP formulation, in addition, handles linear conditions and linear action effects on numeric state variables. Such problems are particularly challenging due to the state-dependency of the action effects. Experiments show that our approach, in addition to being the state of the art for the more general problem class, is competitive with heuristic search-based planners on domains with only simple numeric conditions.
Azad
A live interactive narrative (LIN) is an experience where multiple players take on fictional roles and interact with real-world objects and actors to participate in a pre-authored narrative. Temporal properties of LINs are important to their viability and aesthetic quality and hence deserve special design consideration. In this paper, we tackle the largely overlooked problem of scheduling a multiplayer interactive narrative and propose the Live Interactive Narrative Scheduling Problem (LINSP), which handles reasoning under temporal uncertainty, resource scheduling, and non-linear plot choices. We present a mixed-integer linear programming formulation of the problem and empirically evaluate its scalability over large narrative instances.
Spectral embedding and the latent geometry of multipartite networks
Modell, Alexander, Gallagher, Ian, Cape, Joshua, Rubin-Delanchy, Patrick
Spectral embedding finds vector representations of the nodes of a network, based on the eigenvectors of its adjacency or Laplacian matrix, and has found applications throughout the sciences. Many such networks are multipartite, meaning their nodes can be divided into partitions and nodes of the same partition are never connected. When the network is multipartite, this paper demonstrates that the node representations obtained via spectral embedding live near partition-specific low-dimensional subspaces of a higher-dimensional ambient space. For this reason we propose a follow-on step after spectral embedding, to recover node representations in their intrinsic rather than ambient dimension, proving uniform consistency under a low-rank, inhomogeneous random graph model. Our method naturally generalizes bipartite spectral embedding, in which node representations are obtained by singular value decomposition of the biadjacency or bi-Laplacian matrix.
Probability and Statistics for Business and Data Science
Probability for improved business decisions: Introduction, Combinatorics, Bayesian Inference, Distributions. Welcome to Probability and Statistics for Business and Data Science! In this course we cover what you need to know about probability and statistics to succeed in business and the data science field! This practical course covers both the theory of statistics and its application to real-world problems. Each section has example problems, in-course quizzes, and assessment tests.