Mathematical & Statistical Methods
Spark Technology Center
The Best Paper award at this year's International Conference on Very Large Data Bases (VLDB) goes to "Compressed Linear Algebra for Large-Scale Machine Learning", authored by a PhD candidate at the University of Maryland and four senior researchers from IBM. Their method compresses matrices so that linear algebra operations can run directly on the compressed data, promising users significant speedups with a smaller memory footprint. In particular, the compression technology provides benefits at two different stages of the data science process. Before training a model, a data scientist typically goes through multiple iterations of feature engineering. Common feature engineering tasks include examining the data with descriptive statistics and transforming the values in columns to better suit the assumptions built into different types of machine learning models.
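To make the idea of computing on compressed matrices concrete, here is a minimal Python sketch. It is not the paper's actual scheme. It assumes a simplified per-column encoding in which each distinct nonzero value stores the list of row offsets where it occurs, loosely in the spirit of value-based column encodings; the encodings in the paper are more elaborate. The function names `compress`, `matvec` and `column_sums` are hypothetical. Both a matrix-vector product and the per-column sums used in descriptive statistics are computed without decompressing.

```python
import numpy as np


def compress_column(col):
    """Map each distinct nonzero value of a column to the row offsets where it occurs."""
    values, inverse = np.unique(col, return_inverse=True)
    groups = {}
    for k, value in enumerate(values):
        if value != 0:  # zeros are implicit and cost nothing to store
            groups[float(value)] = np.flatnonzero(inverse == k)
    return groups


def compress(X):
    """Compress a dense 2-D array column by column (hypothetical toy format)."""
    return [compress_column(X[:, j]) for j in range(X.shape[1])], X.shape[0]


def matvec(compressed, v):
    """Compute q = X @ v directly from the compressed representation."""
    columns, n_rows = compressed
    q = np.zeros(n_rows)
    for j, groups in enumerate(columns):
        for value, offsets in groups.items():
            # One multiplication per distinct value, then a scatter-add to its rows;
            # offsets within a group are unique, so fancy-index += is safe here.
            q[offsets] += value * v[j]
    return q


def column_sums(compressed):
    """Per-column sums (a basic descriptive statistic) without decompressing."""
    columns, _ = compressed
    return np.array([sum(value * len(offsets) for value, offsets in groups.items())
                     for groups in columns])


X = np.array([[1.0, 0.0, 3.0],
              [1.0, 2.0, 3.0],
              [0.0, 2.0, 3.0],
              [1.0, 0.0, 5.0]])
v = np.array([0.5, -1.0, 2.0])
cx = compress(X)
print(matvec(cx, v), X @ v)
print(column_sums(cx), X.sum(axis=0))
```

The gain comes from columns with few distinct values: the work per column scales with the number of distinct values plus the number of nonzero rows, and repeated values are multiplied only once.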
A possible implementation of an Intelligent Agent using graph theory to crawl Reddit (RedditSharp, QuickGraph, MongoDB)
I cannot think about something for more than two hours without wondering how to introduce AI techniques into it. The last time it happened was super interesting, so stay with me to see how I used graph theory to crawl Reddit and build a knowledge base of Magic: The Gathering card relations. Long story short, I was browsing magiccardmarket.eu to check which cards to buy when I found a guy selling a €9 card for €6. The card had spiked over the weekend, so I jumped on Reddit to find out why. Is there a new deck using it?
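This excerpt does not spell out the graph model, so the following Python sketch is only an assumption about what a "card relations" knowledge base could look like: a weighted co-occurrence graph in which two cards are linked whenever a post mentions both, with the edge weight counting such posts. It uses plain dictionaries instead of QuickGraph and MongoDB, and the posts and card names are hypothetical placeholders, not crawled Reddit data.

```python
from collections import defaultdict
from itertools import combinations


def build_card_graph(posts, card_names):
    """Weighted co-occurrence graph: graph[a][b] = number of posts mentioning both a and b."""
    graph = defaultdict(lambda: defaultdict(int))
    for text in posts:
        lowered = text.lower()
        # Naive substring matching; a real crawler would need proper card-name parsing.
        mentioned = sorted({name for name in card_names if name.lower() in lowered})
        for a, b in combinations(mentioned, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def related_cards(graph, card, top=3):
    """Cards most often mentioned together with `card`, strongest links first."""
    neighbours = graph.get(card, {})
    return sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical posts and card names, for illustration only.
posts = [
    "Alpha and Bravo make a great combo",
    "Trying Bravo with Charlie this weekend",
    "My new deck: Alpha, Bravo, Charlie",
]
graph = build_card_graph(posts, ["Alpha", "Bravo", "Charlie"])
print(related_cards(graph, "Bravo"))
```

Under this model, a sudden price spike could be investigated by looking up the spiking card's strongest neighbours, which would hint at the deck it is being played in.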
Cox process representation and inference for stochastic reaction-diffusion processes
Schnoerr, David, Grima, Ramon, Sanguinetti, Guido
Complex behaviour in many systems arises from the stochastic interactions of spatially distributed particles or agents. Stochastic reaction-diffusion processes are widely used to model such behaviour in disciplines ranging from biology to the social sciences, yet they are notoriously difficult to simulate and calibrate to observational data. Here we use ideas from statistical physics and machine learning to provide a solution to the inverse problem of learning a stochastic reaction-diffusion process from data. Our solution relies on a nontrivial connection between stochastic reaction-diffusion processes and spatiotemporal Cox processes, a well-studied class of models from computational statistics. This connection leads to an efficient and flexible algorithm for parameter inference and model selection. Our approach shows excellent accuracy on numerical and real data examples from systems biology and epidemiology. Our work provides both insights into spatiotemporal stochastic systems and a practical solution to a longstanding problem in computational modelling.
Many complex behaviours in several disciplines originate from a common mechanism: the dynamics of locally interacting, spatially distributed agents. Examples arise at all spatial scales and in a wide range of scientific fields, from microscopic interactions of low-abundance molecules within cells to ecological and epidemic phenomena at the continental scale. Frequently, stochasticity and spatial heterogeneity play a crucial role in determining the process dynamics and the emergence of collective behaviour [1-8]. Stochastic reaction-diffusion processes (SRDPs) constitute a convenient mathematical framework to model such systems. SRDPs were originally introduced in statistical physics [10, 11] to describe the collective behaviour of populations of point-wise agents performing Brownian diffusion in space and stochastically interacting with other, nearby agents according to predefined rules.
The flexibility afforded by the local interaction rules has led to a wide application of SRDPs in many different scientific disciplines where complex spatiotemporal behaviours arise, from molecular biology [4, 9, 12], to ecology [13], to the social sciences [14]. Despite their popularity, SRDPs pose considerable challenges, as analytical computations are only possible for a handful of systems [8].
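As an illustration of the class of models described above (not of the authors' Cox-process inference method), the following Python sketch simulates a toy SRDP: point particles of a single species diffuse on the periodic unit square and annihilate in pairs (A + A → ∅) with a fixed probability per time step whenever they come within a reaction radius. The time-stepping scheme, the reaction rule and all parameter values are assumptions chosen for simplicity; each Brownian increment has standard deviation $\sqrt{2 D \Delta t}$ per coordinate.

```python
import numpy as np


def simulate_srdp(n0=200, D=1e-3, dt=0.01, steps=500, radius=0.01, p_react=0.5, seed=0):
    """Toy particle-based SRDP on the periodic unit square.

    Particles of one species perform Brownian motion with diffusion coefficient D and
    annihilate pairwise (A + A -> 0) with probability p_react per step when closer
    than `radius`. Returns the particle count after every step.
    """
    rng = np.random.default_rng(seed)
    pos = rng.random((n0, 2))
    counts = [n0]
    sigma = np.sqrt(2.0 * D * dt)  # std of the Brownian increment per coordinate
    for _ in range(steps):
        # Diffusion step, wrapped back onto the torus
        pos = (pos + sigma * rng.standard_normal(pos.shape)) % 1.0
        n = len(pos)
        if n >= 2:
            # Pairwise distances using the minimum-image convention
            diff = pos[:, None, :] - pos[None, :, :]
            diff -= np.round(diff)
            dist = np.sqrt((diff ** 2).sum(axis=-1))
            alive = np.ones(n, dtype=bool)
            i_idx, j_idx = np.nonzero(np.triu(dist < radius, k=1))
            for i, j in zip(i_idx, j_idx):
                # Each particle can take part in at most one reaction per step
                if alive[i] and alive[j] and rng.random() < p_react:
                    alive[i] = alive[j] = False
            pos = pos[alive]
        counts.append(len(pos))
    return np.array(counts)


counts = simulate_srdp()
print(counts[::100])
```

Even this toy model has no closed-form solution for the full distribution of particle positions, which is the kind of difficulty the abstract refers to.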
A Master of Umbral Moonshine Toys With String Theory
After the Eyjafjallajökull volcano erupted in Iceland in 2010, flight cancellations left Miranda Cheng stranded in Paris. While waiting for the ash to clear, Cheng, then a postdoctoral researcher at Harvard University studying string theory, got to thinking about a paper that had recently been posted online. Its three coauthors had pointed out a numerical coincidence connecting far-flung mathematical objects. "That smells like another moonshine," Cheng recalled thinking. "Could it be another moonshine?" She happened to have read a book about the "monstrous moonshine," a mathematical structure that unfolded out of a similar bit of numerology: In the late 1970s, the mathematician John McKay noticed that 196,884, the first important coefficient of an object called the j-function, was the sum of one and 196,883, the first two dimensions in which a giant collection of symmetries called the monster group could be represented.
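McKay's observation can be checked numerically. The Python sketch below uses the standard identity $j = E_4^3/\Delta$, with $E_4(q) = 1 + 240\sum_{n\ge1}\sigma_3(n)q^n$ and $\Delta(q) = q\prod_{n\ge1}(1-q^n)^{24}$, and expands it as a truncated power series with exact integer arithmetic. The helper names are illustrative; the returned list holds the coefficients of $q^{-1}, q^0, q^1, \dots$

```python
def mul(a, b, n):
    """Product of two power series, truncated to the first n coefficients."""
    out = [0] * n
    for i in range(min(n, len(a))):
        if a[i]:
            for k in range(min(n - i, len(b))):
                out[i + k] += a[i] * b[k]
    return out


def inverse(a, n):
    """Power-series reciprocal 1/a, assuming a[0] == 1."""
    out = [1] + [0] * (n - 1)
    for k in range(1, n):
        out[k] = -sum(a[i] * out[k - i] for i in range(1, min(k, len(a) - 1) + 1))
    return out


def sigma3(m):
    """Sum of the cubes of the divisors of m."""
    return sum(d ** 3 for d in range(1, m + 1) if m % d == 0)


def j_coefficients(n):
    """First n coefficients of j(q), starting from the q^-1 term."""
    e4 = [1] + [240 * sigma3(m) for m in range(1, n)]
    # P(q) = prod_{m>=1} (1 - q^m)^24, so that Delta(q) = q * P(q)
    p = [1] + [0] * (n - 1)
    for m in range(1, n):
        factor = [0] * n
        factor[0], factor[m] = 1, -1
        for _ in range(24):
            p = mul(p, factor, n)
    e4_cubed = mul(mul(e4, e4, n), e4, n)
    # j = E4^3 / (q P), so the coefficients of E4^3 / P are those of q^-1, q^0, q^1, ...
    return mul(e4_cubed, inverse(p, n), n)


print(j_coefficients(4))  # [1, 744, 196884, 21493760]
```

The third entry, 196884, is the coefficient of $q$ that McKay compared with $1 + 196{,}883$.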
Chi-Squared Test
Before we build statistical or machine learning models, it is good practice to understand which predictors are significant and have an impact on the response variable. In this post we deal with the particular case where both the response and the predictor are categorical variables. By the end of the post you will understand what predictive modelling is and the significance and purpose of the chi-squared statistic. We will work through a hypothetical case study to understand the math behind it, then implement a chi-squared test in R and learn to interpret the results.
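The post implements the test in R; to keep all examples here in one language, the following is an independent Python sketch of Pearson's chi-squared test of independence for a contingency table of two categorical variables. It computes expected counts from the row and column totals, the statistic $\chi^2 = \sum_{ij} (O_{ij} - E_{ij})^2 / E_{ij}$, the degrees of freedom $(r-1)(c-1)$, and a p-value from the chi-squared survival function via a series for the regularized lower incomplete gamma function. The table is hypothetical, and no Yates continuity correction is applied (R's `chisq.test` applies one by default to 2×2 tables, so its numbers would differ there).

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total  # regularized P(s, z)
    return max(0.0, 1.0 - lower)


def chi2_independence(table):
    """Pearson chi-squared test of independence (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical 2x2 table: rows = predictor levels, columns = response levels.
observed = [[10, 20],
            [30, 40]]
stat, df, p = chi2_independence(observed)
print(f"chi2 = {stat:.4f}, df = {df}, p = {p:.4f}")
```

A large p-value, as here, means the table gives no evidence against independence, so this predictor would not be flagged as significant for the response.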
Selection of resources to learn Artificial Intelligence / Machine Learning / Statistical Inference… -- Artists and Machine Intelligence
This is a very incomplete and subjective selection of resources to learn about the algorithms and maths of Artificial Intelligence (AI) / Machine Learning (ML) / Statistical Inference (SI) / Deep Learning (DL) / Reinforcement Learning (RL). It is aimed at beginners (those without a computer science background and with no prior knowledge of these subjects) and hopes to take them to quite advanced levels (able to read and understand DL papers). It is not an exhaustive list and only contains learning materials that I have personally completed, so that I can include brief personal comments on them. It is also by no means the best path to follow (nowadays most MOOCs offer full paths all the way from basic statistics and linear algebra to ML/DL). But this is the path I took, and in a sense it is a partial documentation of my personal journey into DL (actually I bounced back and forth between all of these like crazy).
The Spectral Condition Number Plot for Regularization Parameter Determination
Peeters, Carel F. W., van de Wiel, Mark A., van Wieringen, Wessel N.
Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employ regularization to cope with the resulting singularity of the sample covariance matrix. These estimators depend on a penalty parameter, and choosing its value can be hard: selection procedures are either computationally infeasible or tenable only for a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter selection. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators.
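A minimal Python sketch of the quantity behind the plot, under a simplifying assumption: it uses the basic ridge-type estimator $\hat{\Sigma}(\lambda) = S + \lambda I$ (one member of the broad class mentioned above, not necessarily the estimator the authors focus on) and evaluates its spectral condition number $\kappa(\lambda) = \mu_{\max}(\hat\Sigma(\lambda)) / \mu_{\min}(\hat\Sigma(\lambda))$ over a logarithmic grid of penalty values. For this estimator every eigenvalue is shifted by $\lambda$, so $\kappa(\lambda) = (\mu_{\max} + \lambda)/(\mu_{\min} + \lambda)$ needs only one eigendecomposition of the sample covariance $S$. The simulated data and the grid are hypothetical.

```python
import numpy as np


def condition_numbers(S, lambdas):
    """Spectral condition number of S + lambda*I for each penalty in `lambdas`."""
    mu = np.linalg.eigvalsh(S)  # eigenvalues of the symmetric sample covariance, ascending
    mu_min = max(mu[0], 0.0)  # clip tiny negative round-off: S is positive semidefinite
    return (mu[-1] + lambdas) / (mu_min + lambdas)


rng = np.random.default_rng(1)
n, p = 20, 50  # fewer observations than variables, so S is singular
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)
lambdas = np.logspace(-4, 2, 61)
kappa = condition_numbers(S, lambdas)

# Plotting log(kappa) against log(lambda) gives a spectral condition number plot.
# A heuristic reading (an assumption here) is to look for where the curve levels off.
for lam, k in zip(lambdas[::10], kappa[::10]):
    print(f"lambda = {lam:9.4f}   kappa = {k:12.2f}")
```

For small penalties the condition number is huge because $S$ is singular; as $\lambda$ grows it falls towards 1, and the informative region lies in between.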
Kernel Density Estimation for Dynamical Systems
Hang, Hanyuan, Steinwart, Ingo, Feng, Yunlong, Suykens, Johan A. K.
We study the density estimation problem with observations generated by certain dynamical systems that admit a unique underlying invariant Lebesgue density. Observations drawn from dynamical systems are not independent; moreover, the usual mixing concepts may not be appropriate for measuring the dependence among these observations. By employing the $\mathcal{C}$-mixing concept to measure the dependence, we conduct a statistical analysis of the consistency and convergence of the kernel density estimator. Our main results are as follows. First, we show that with a properly chosen bandwidth, the kernel density estimator is universally consistent under the $L_1$-norm. Second, we establish convergence rates for the estimator with respect to several classes of dynamical systems under the $L_1$-norm. In the analysis, the density function $f$ is only assumed to be Hölder continuous, which is a weak assumption in the nonparametric density estimation literature and also more realistic in the dynamical system context. Last but not least, we prove that the same convergence rates of the estimator under the $L_\infty$-norm and the $L_1$-norm can be achieved when the density function is Hölder continuous, compactly supported and bounded. The bandwidth selection problem of the kernel density estimator for dynamical systems is also discussed in our study via numerical simulations.
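To illustrate density estimation from a dependent, deterministic sequence, the following Python sketch applies a Gaussian kernel density estimator to an orbit of the logistic map $x_{t+1} = 4x_t(1-x_t)$, whose invariant density on $(0,1)$ is $f(x) = 1/(\pi\sqrt{x(1-x)})$. This choice of system is an assumption for illustration only: this $f$ is unbounded near 0 and 1, so it does not satisfy the Hölder continuity assumption on the whole interval, and the comparison is made at interior points. The bandwidth `h`, orbit length and starting point are arbitrary choices, not the bandwidth selection procedure studied in the paper.

```python
import numpy as np


def logistic_orbit(n, x0=0.1234, burn_in=1000):
    """n consecutive iterates of x -> 4x(1 - x) after discarding a burn-in segment."""
    x = x0
    for _ in range(burn_in):
        x = 4.0 * x * (1.0 - x)
    orbit = np.empty(n)
    for t in range(n):
        x = 4.0 * x * (1.0 - x)
        orbit[t] = x
    return orbit


def kde(samples, points, h):
    """Gaussian kernel density estimate at `points` with bandwidth h."""
    u = (points[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (samples.size * h * np.sqrt(2.0 * np.pi))


orbit = logistic_orbit(20000)  # each observation is a deterministic function of the previous one
points = np.array([0.3, 0.5, 0.7])
estimate = kde(orbit, points, h=0.02)
true_density = 1.0 / (np.pi * np.sqrt(points * (1.0 - points)))
for x, est, f in zip(points, estimate, true_density):
    print(f"x = {x:.1f}   estimate = {est:.3f}   invariant density = {f:.3f}")
```

The estimator itself is the ordinary i.i.d. formula; what changes for dynamical systems is the analysis of how fast it converges given the dependence between observations.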