 Mathematical & Statistical Methods


Connecting actuarial judgment to probabilistic learning techniques with graph theory

arXiv.org Artificial Intelligence

Efforts to improve data-driven exercises in insurance have led to a desire to gather more data than was traditionally available. In addition to underwriting characteristics such as age, gender and address, technology now allows the collection of many more variables. Examples include dynamic data from sensors for driving behaviour in vehicles and for appliance and electrical usage in homes, and static data from external databases on traffic violations, crime scores or credit scores. High-dimensional models arise when modelling sensor data at multiple time points together with the individual variables that comprise summary scores. Reasoning with a large number of variables can become unnecessarily complex without actuarial judgment. For example, it may not be necessary to include hundreds of rating factors as predictors if many of them are known to be related or unnecessary. This discussion proposes the use of graph theory as a means of translating intuitive reasoning into mathematical properties. This is done via graphical models, which use graph theory to formulate probabilistic models (Lauritzen, 1996). The approach has been used in applications such as medical expert systems (Franklin et al., 1989), natural language processing (Blei et al., 2003), image processing, bioinformatics and others (Wainwright and Jordan, 2008).


Linear transformations and matrices

#artificialintelligence

This post will be quite an interesting one. We will show how a 2D plane can be transformed into another one. Understanding these concepts is a crucial step toward some more advanced linear algebra and machine learning methods. So, let's proceed and learn how to connect a matrix-vector multiplication with a linear transformation. A linear transformation can also be seen as a simple function.
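To make the link between matrix-vector multiplication and a linear transformation concrete, here is a minimal sketch in plain Python. The helper `apply` and the 90-degree rotation matrix `ROTATE_90` are illustrative choices, not taken from the post itself.

```python
def apply(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical example: rotate the 2D plane by 90 degrees counter-clockwise.
ROTATE_90 = [[0, -1],
             [1, 0]]

e1, e2 = [1, 0], [0, 1]

# The columns of the matrix are the images of the basis vectors.
print(apply(ROTATE_90, e1))  # [0, 1]
print(apply(ROTATE_90, e2))  # [-1, 0]

# Linearity: T(a*u + b*v) equals a*T(u) + b*T(v).
u, v, a, b = [2, 3], [-1, 4], 5, -2
combined = [a * ui + b * vi for ui, vi in zip(u, v)]
left = apply(ROTATE_90, combined)
right = [a * tu + b * tv
         for tu, tv in zip(apply(ROTATE_90, u), apply(ROTATE_90, v))]
print(left == right)  # True
```

Seen this way, the matrix is just a compact description of the function: once you know where the basis vectors land, every other vector's image follows by linearity.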


Machine-Learning-Tokyo/Math_resources

#artificialintelligence

This is a collection of pages demonstrating the use of the interact command in Sage. It should be easy to just scroll through and copy/paste examples into Sage notebooks. Examples include Algebra, Bioinformatics, Calculus, Cryptography, Differential Equations, Drawing Graphics, Dynamical Systems, Fractals, Games and Diversions, Geometry, Graph Theory, Linear Algebra, Loop Quantum Gravity, Number Theory, Statistics/Probability, Topology, Web Applications.


Graph Gamma Process Generalized Linear Dynamical Systems

arXiv.org Machine Learning

We introduce graph gamma process (GGP) linear dynamical systems to model real-valued multivariate time series. For temporal pattern discovery, the latent representation under the model is used to decompose the time series into a parsimonious set of multivariate sub-sequences. In each sub-sequence, different data dimensions often share similar temporal patterns but may exhibit distinct magnitudes, which allows the superposition of all sub-sequences to exhibit diverse behaviors at different data dimensions. We further generalize the proposed model by replacing the Gaussian observation layer with the negative binomial distribution to model multivariate count time series. Generated from the proposed GGP is an infinite-dimensional directed sparse random graph, which is constructed by taking the logical OR of countably infinite binary adjacency matrices that share the same set of countably infinite nodes. Each of these adjacency matrices is associated with a weight that indicates its activation strength, and places a finite number of edges between a finite subset of nodes belonging to the same node community. We use the generated random graph, whose number of nonzero-degree nodes is finite, to define both the sparsity pattern and the dimension of the latent state transition matrix of a (generalized) linear dynamical system. The activation strength of each node community relative to the overall activation strength is used to extract a multivariate sub-sequence, revealing the data pattern captured by the corresponding community. On both synthetic and real-world time series, the proposed nonparametric Bayesian dynamic models, which are initialized at random, consistently exhibit good predictive performance in comparison to a variety of baseline models, revealing interpretable latent state transition patterns and decomposing the time series into distinctly behaved sub-sequences.


Off-Policy Evaluation via the Regularized Lagrangian

arXiv.org Machine Learning

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data. While these estimators all perform some form of stationary distribution correction, they arise from different derivations and objective functions. In this paper, we unify these estimators as regularized Lagrangians of the same linear program. The unification allows us to expand the space of DICE estimators to new alternatives that demonstrate improved performance. More importantly, by analyzing the expanded space of estimators both mathematically and empirically we find that dual solutions offer greater flexibility in navigating the tradeoff between optimization stability and estimation bias, and generally provide superior estimates in practice.


Scalable Linear Algebra on a Relational Database System

Communications of the ACM

As data analytics has become an important application for modern data management systems, a new category of data management system has appeared recently: the scalable linear algebra system. We argue that a parallel or distributed database system is actually an excellent platform upon which to build such functionality. Most relational systems already have support for cost-based optimization--which is vital to scaling linear algebra computations--and it is well known how to make relational systems scalable. We show that by making just a few changes to a parallel/distributed relational database system, such a system can become a competitive platform for scalable linear algebra. Taken together, our results should at least raise the possibility that brand new systems designed from the ground up to support scalable linear algebra are not absolutely necessary, and that such systems could instead be built on top of existing relational technology. Data analytics, such as machine learning and large-scale statistical processing, is an important application domain, and such computations often require linear algebra. As such, many recent efforts have targeted building distributed linear algebra systems, with the goal of supporting large-scale data analytics. Unlike classical efforts in high-performance computing such as ScaLAPACK [6], such systems may include support for storage/retrieval of data to/from disk, buffering/caching of data, and automatic logical/physical optimizations of computations (automatic rewriting of queries, pipelining, etc.). Such systems also typically offer some form of recovery, as well as a domain-specific language. One example of such a system is SystemML, developed at IBM [12]. Given deep learning's reliance on arrays and array-based operations such as matrix multiply, systems facilitating distributed deep learning, such as TensorFlow [3], can also be included among such efforts. In the database area, there has long been interest in building array database systems [17, 5].
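As a small illustration of why a relational engine can express linear algebra at all, the sketch below multiplies two matrices with an ordinary SQL join and aggregate. It is not the design described in the article: the coordinate-style tables `a(i, k, val)` and `b(k, j, val)`, the helper `matmul_sql`, and the use of an in-memory SQLite database are all assumptions made for this example.

```python
import sqlite3


def matmul_sql(a, b):
    """Multiply two dense matrices (lists of rows) with a SQL join and SUM."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per matrix entry.
    conn.execute("CREATE TABLE a (i INTEGER, k INTEGER, val REAL)")
    conn.execute("CREATE TABLE b (k INTEGER, j INTEGER, val REAL)")
    conn.executemany(
        "INSERT INTO a VALUES (?, ?, ?)",
        [(i, k, v) for i, row in enumerate(a) for k, v in enumerate(row)],
    )
    conn.executemany(
        "INSERT INTO b VALUES (?, ?, ?)",
        [(k, j, v) for k, row in enumerate(b) for j, v in enumerate(row)],
    )
    # C[i][j] = sum over k of A[i][k] * B[k][j]:
    # the join pairs entries on k, the GROUP BY sums each (i, j) cell.
    rows = conn.execute(
        "SELECT a.i, b.j, SUM(a.val * b.val) "
        "FROM a JOIN b ON a.k = b.k "
        "GROUP BY a.i, b.j"
    ).fetchall()
    conn.close()

    result = [[0.0] * len(b[0]) for _ in a]
    for i, j, v in rows:
        result[i][j] = v
    return result


product = matmul_sql([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

Because the multiplication is an ordinary query, the engine's existing optimizer chooses the join strategy, which is the kind of cost-based optimization the article highlights.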


Technical Perspective: Supporting Linear Algebra Operations in SQL

Communications of the ACM

Linear algebra operations are at the core of machine learning. Multiple specialized systems have emerged for the scalable, distributed execution of matrix and vector operations. The relationship of such computations to data management and databases, however, brings friction. It is well known that a great deal of human time and machine time is being spent nowadays on fetching data out of the database and performing a computation on a specialized system. One answer to the issue is that we truly need a new kind of non-SQL database that is tuned to these computations.


On Learned Sketches for Randomized Numerical Linear Algebra

arXiv.org Machine Learning

We study "learning-based" sketching approaches for diverse tasks in numerical linear algebra: least-squares regression, $\ell_p$ regression, Huber regression, low-rank approximation (LRA), and $k$-means clustering. Sketching methods are used to quickly and approximately compute properties of large matrices. Linear maps called "sketches" are applied to compress data, and these concise representations are used to compute the desired properties. Specifically, we consider sparse sketches (such as CountSketch). Recent works have dealt with optimizing sketches for data distributions to perform better than their random counterparts. We extend this theme to several important and ubiquitous tasks, each of which requires a new analysis and novel practical methods. Specifically, our contributions are: 1) For all tasks, we introduce fast algorithms using learned sketches with worst-case guarantees. We give a simple task-agnostic method for retaining the worst-case guarantees of randomized sketching, which yields time-optimal algorithms for LRA and least-squares regression. Also, for $k$-means clustering, we give a faster alternative for retaining worst-case guarantees. 2) We show empirically that learned sketches are reliable in improving approximation accuracy, with comparison against "non-learned" sketching baselines. 3) We introduce a greedy algorithm for optimizing the location of the nonzero entries of a sparse sketch and prove guarantees for certain distributions on the LRA task. Previous work only looked at optimizing the values rather than the locations. Also, we show empirically that it further improves learned sketch performance.
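For readers unfamiliar with the sparse sketches mentioned above, here is a minimal sketch of a classical, randomly drawn CountSketch applied to a matrix. It shows only the non-learned baseline: each column of the sketch matrix S has a single nonzero entry, at a random row and with a random sign. The function name `count_sketch` and its parameters are illustrative assumptions. A learned variant would, as the abstract describes, optimize the values (and in this paper also the locations) of those nonzeros instead of drawing them at random.

```python
import random


def count_sketch(a, m, seed=0):
    """Return S @ a, where S is an m x n CountSketch matrix and a is n x d.

    S is never formed explicitly: row r of `a` is added, with a random
    sign, into one randomly chosen row (bucket) of the output.
    """
    rng = random.Random(seed)
    d = len(a[0])
    sketched = [[0.0] * d for _ in range(m)]
    for row in a:
        bucket = rng.randrange(m)       # location of the nonzero in this column of S
        sign = rng.choice((-1.0, 1.0))  # value of that nonzero
        for j, x in enumerate(row):
            sketched[bucket][j] += sign * x
    return sketched


# Compress a 6 x 2 matrix down to 3 rows.
A = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]
SA = count_sketch(A, m=3, seed=42)
print(len(SA), len(SA[0]))  # 3 2
```

Applying the sketch costs time proportional to the number of entries of the input, which is why sparse sketches are attractive as a compression step before regression or low-rank approximation.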


On the Theoretical Properties of the Exchange Algorithm

arXiv.org Machine Learning

The exchange algorithm is one of the most popular extensions of the Metropolis-Hastings algorithm for sampling from doubly-intractable distributions. However, theoretical exploration of the exchange algorithm is very limited. For example, natural questions like "Does the exchange algorithm converge at a geometric rate?" or "Does the exchange algorithm admit a Central Limit Theorem?" have not been answered. In this paper, we study the theoretical properties of the exchange algorithm, in terms of asymptotic variance and convergence speed. We compare the exchange algorithm with the original Metropolis-Hastings algorithm and provide both necessary and sufficient conditions for geometric ergodicity of the exchange algorithm, which can be applied to various practical applications such as exponential random graph models and Ising models. A central limit theorem for the exchange algorithm is also established. Meanwhile, a concrete example, involving the Binomial model with conjugate and non-conjugate priors, is treated in detail with sharp convergence rates. Our results justify the theoretical usefulness of the exchange algorithm.
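The following is a minimal sketch of a single-chain exchange sampler on a toy problem, intended only to show the mechanics. The setup is an assumption for illustration and is not the example analyzed in the paper: the data x follow an unnormalized model f(x | theta) = C(n, x) exp(theta * x), the prior on theta is standard normal, and the proposal is a symmetric Gaussian random walk. At each step an auxiliary draw w is simulated from the model at the proposed parameter, and the normalizing constants cancel in the acceptance ratio, so they are never evaluated.

```python
import math
import random


def exchange_sampler(x, n, steps=5000, step_size=0.5, seed=0):
    """Toy exchange algorithm for theta given x, never evaluating Z(theta)."""
    rng = random.Random(seed)

    def log_prior(t):
        return -0.5 * t * t  # N(0, 1) prior, up to an additive constant

    theta, samples = 0.0, []
    for _ in range(steps):
        proposal = theta + rng.gauss(0.0, step_size)  # symmetric random walk

        # Auxiliary variable: an exact draw from the model at the proposal,
        # which here is Binomial(n, sigmoid(proposal)).
        p = 1.0 / (1.0 + math.exp(-proposal))
        w = sum(rng.random() < p for _ in range(n))

        # log of [prior(prop) f(x|prop) f(w|theta)] / [prior(theta) f(x|theta) f(w|prop)];
        # the binomial coefficients cancel, leaving (prop - theta) * (x - w).
        log_ratio = (log_prior(proposal) - log_prior(theta)
                     + (proposal - theta) * (x - w))

        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        samples.append(theta)
    return samples


samples = exchange_sampler(x=15, n=20)
kept = samples[1000:]  # discard an arbitrary burn-in
print(sum(kept) / len(kept))
```

In this toy model the normalizing constant is actually available in closed form, so plain Metropolis-Hastings would also work; the point of the sketch is that the exchange step never uses it.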


Statistical Inference for Networks of High-Dimensional Point Processes

arXiv.org Machine Learning

Fueled in part by recent applications in neuroscience, the multivariate Hawkes process has become a popular tool for modeling the network of interactions among high-dimensional point process data. While evaluating the uncertainty of the network estimates is critical in scientific applications, existing methodological and theoretical work has primarily addressed estimation. To bridge this gap, this paper develops a new statistical inference procedure for high-dimensional Hawkes processes. The key ingredient for this inference procedure is a new concentration inequality on the first- and second-order statistics for integrated stochastic processes, which summarize the entire history of the process. Combining recent results on martingale central limit theory with the new concentration inequality, we then characterize the convergence rate of the test statistics. We illustrate finite sample validity of our inferential tools via extensive simulations and demonstrate their utility by applying them to a neuron spike train data set.