 Mathematical & Statistical Methods


A Causal Research Pipeline and Tutorial for Psychologists and Social Scientists

arXiv.org Machine Learning

Causality is a fundamental part of the scientific endeavour to understand the world. Unfortunately, causality is still taboo in much of psychology and social science. Motivated by a growing number of recommendations to adopt causal approaches to research, we reformulate the typical approach to research in psychology to harmonize inevitably causal theories with the rest of the research pipeline. We present a new process which begins with the incorporation of techniques from the confluence of causal discovery and machine learning for the development, validation, and transparent formal specification of theories. We then present methods for reducing the complexity of the fully specified theoretical model into the fundamental submodel relevant to a given target hypothesis. From here, we establish whether or not the quantity of interest is estimable from the data, and if so, propose the use of semi-parametric machine learning methods for the estimation of causal effects. The overall goal is the presentation of a new research pipeline which can (a) facilitate scientific inquiry compatible with the desire to test causal theories, (b) encourage transparent representation of our theories as unambiguous mathematical objects, (c) tie our statistical models to specific attributes of the theory, thus reducing under-specification problems frequently resulting from the theory-to-model gap, and (d) yield results and estimates which are causally meaningful and reproducible. The process is demonstrated through didactic examples with real-world data, and we conclude with a summary and discussion of limitations.



Smart Biostatistics

#artificialintelligence

The registration fee is 49 € per person. Don't hesitate to contact us directly using the form on this page.


Hurwitz-Riemann Zeta And Other Special Probability Distributions - AI Summary

#artificialintelligence

All the solutions were probability distributions, and in this article we introduce an even larger, generic class of problems (chaotic discrete dynamical systems) with known solutions. Each dynamical system discussed here (or in my previous article) comes with two distributions: The name Hurwitz and Riemann-Zeta is just a reminder of their strong connection to number theory problems such as continued fractions, approximation of irrational numbers by rational ones, the construction and distribution of the digits of random numbers in various numeration systems, and the famous Riemann Hypothesis that has a one million dollar prize attached to it. The best-known probability distribution related to these functions is the discrete Zipf distribution. The author defines a family of distributions that generalizes the exponential power, normal, gamma, Weibull, Rayleigh, Maxwell-Boltzmann and chi-squared distributions, with applications in actuarial sciences. Our Hurwitz-Riemann Zeta distribution is yet another example, arising this time from discrete dynamical systems, continuous on [0, 1].


Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning

arXiv.org Machine Learning

Continuous-time reinforcement learning offers an appealing formalism for describing control problems in which the passage of time is not naturally divided into discrete increments. Here we consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time, stochastic environment. Accurate return predictions have proven useful for determining optimal policies for risk-sensitive control, learning state representations, multiagent coordination, and more. We begin by establishing the distributional analogue of the Hamilton-Jacobi-Bellman (HJB) equation for Itô diffusions and the broader class of Feller-Dynkin processes. We then specialize this equation to the setting in which the return distribution is approximated by $N$ uniformly-weighted particles, a common design choice in distributional algorithms. Our derivation highlights additional terms due to statistical diffusivity which arise from the proper handling of distributions in the continuous-time setting. Based on this, we propose a tractable algorithm for approximately solving the distributional HJB based on a JKO scheme, which can be implemented in an online control algorithm. We demonstrate the effectiveness of such an algorithm in a synthetic control problem.


Planning Courses for Student Success at the American College of Greece

arXiv.org Artificial Intelligence

We model the problem of optimizing the schedule of courses a student at the American College of Greece will need to take to complete their studies. We model all constraints set forth by the institution and the department, so that we guarantee the validity of all produced schedules. We formulate several different objectives to optimize in the resulting schedule, including fastest completion time, course difficulty balance, and so on. A very important objective our model is capable of capturing is the maximization of the expected student GPA given their performance on passed courses, using Machine Learning and Data Mining techniques. All resulting problems are Mixed Integer Linear Programming problems with a number of binary variables on the order of the maximum number of terms times the number of courses available for the student to take. The resulting Mathematical Programming problem is always solvable by the GUROBI solver in less than 10 seconds on a modern commercial off-the-shelf PC, whereas the manual process previously in place took the department heads designated as student advisors more than one hour for every student and resulted in sub-optimal schedules as measured by the objectives set forth.


Classification of Stochastic Processes with Topological Data Analysis

#artificialintelligence

We used raw, statistical, and topological features to classify time series sampled from different stochastic processes. In our simulation experiments we sampled time series from Wiener and Cauchy processes in both balanced and unbalanced sampling schemes. We then compared machine learning classification models built on topological features and statistical features we engineered on the sampled time series. The results show that the engineered topological features perform consistently better than statistical or raw features in building machine learning classification models, even when a given dataset is unbalanced. Our experimental results show that the topologically engineered features alone can distinguish between different stochastic processes, even when statistical or raw features do not.
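As a rough illustration of the kind of data described above, the following sketch draws discretized sample paths from a Wiener process and a Cauchy process using only the Python standard library. The function name `sample_path`, the step size `dt`, and the inverse-CDF sampling of Cauchy increments are assumptions made for this example; the summary does not say how the authors generated their series, and no topological or statistical feature extraction is attempted here.

```python
import math
import random


def sample_path(kind, n_steps, dt=0.01, seed=None):
    """Return a discretized path of length n_steps + 1 starting at 0.

    kind="wiener": independent Gaussian increments with variance dt.
    kind="cauchy": independent Cauchy increments with scale dt.
    (Hypothetical helper for illustration, not the authors' code.)
    """
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_steps):
        if kind == "wiener":
            increment = rng.gauss(0.0, math.sqrt(dt))
        elif kind == "cauchy":
            # Inverse-CDF sampling of a Cauchy variate with scale dt.
            increment = dt * math.tan(math.pi * (rng.random() - 0.5))
        else:
            raise ValueError(f"unknown process kind: {kind}")
        path.append(path[-1] + increment)
    return path


# A small labelled dataset; an unbalanced scheme would simply draw
# more paths from one process than from the other.
dataset = [(sample_path("wiener", 200, seed=s), "wiener") for s in range(5)]
dataset += [(sample_path("cauchy", 200, seed=s), "cauchy") for s in range(5)]
```

Cauchy increments are heavy-tailed, so the Cauchy paths show occasional large jumps that the Gaussian-driven Wiener paths do not, which is one intuitive reason the two classes can be told apart.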


Linear Algebra and Optimization for Machine Learning: A Textbook: Aggarwal, Charu C.: 9783030403461: Books - Amazon

#artificialintelligence

PDF has better equation formatting than Kindle. Charu Aggarwal is a Distinguished Research Staff Member (DRSM) at the IBM T. J. Watson Research Center in Yorktown Heights, New York. He has worked extensively in the field of data mining, with particular interests in data streams, privacy, uncertain data and social network analysis. He has published 19 (8 authored and 11 edited) books, over 400 papers in refereed venues, and has applied for or been granted over 80 patents. Because of the commercial value of the above-mentioned patents, he has received several invention achievement awards and has thrice been designated a Master Inventor at IBM.


Classification of Stochastic Processes with Topological Data Analysis

arXiv.org Artificial Intelligence

In this study, we examine if engineered topological features can distinguish time series sampled from different stochastic processes with different noise characteristics, in both balanced and unbalanced sampling schemes. We compare our classification results against the results of the same classification tasks built on statistical and raw features. We conclude that in classification tasks of time series, different machine learning models built on engineered topological features perform consistently better than those built on standard statistical and raw features.


Computing the Variance of Shuffling Stochastic Gradient Algorithms via Power Spectral Density Analysis

arXiv.org Machine Learning

When solving finite-sum minimization problems, two common alternatives to stochastic gradient descent (SGD) with theoretical benefits are random reshuffling (SGD-RR) and shuffle-once (SGD-SO), in which functions are sampled in cycles without replacement. Under a convenient stochastic noise approximation which holds experimentally, we study the stationary variances of the iterates of SGD, SGD-RR and SGD-SO, whose leading terms decrease in this order, and obtain simple approximations. To obtain our results, we study the power spectral density of the stochastic gradient noise sequences. Our analysis extends beyond SGD to SGD with momentum and to the stochastic Nesterov's accelerated gradient method. We perform experiments on quadratic objective functions to test the validity of our approximation and the correctness of our findings.
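The three sampling schemes named in this abstract differ only in how component indices are drawn in each pass over the data. The sketch below makes that concrete on a one-dimensional quadratic finite sum with components 0.5 * (x - a_i)^2. The names `index_stream` and `run`, the choice of objective, and the constant step size are assumptions for illustration; this is not the paper's experimental setup and it does not reproduce the variance analysis.

```python
import random


def index_stream(scheme, n, epochs, rng):
    """Yield component indices for `epochs` passes over n functions.

    "sgd": sample with replacement at every step.
    "rr":  random reshuffling, a fresh permutation each epoch.
    "so":  shuffle-once, one permutation reused in every epoch.
    """
    if scheme == "so":
        fixed = list(range(n))
        rng.shuffle(fixed)
    for _ in range(epochs):
        if scheme == "sgd":
            order = [rng.randrange(n) for _ in range(n)]
        elif scheme == "rr":
            order = list(range(n))
            rng.shuffle(order)
        elif scheme == "so":
            order = fixed
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        yield from order


def run(scheme, targets, step, epochs, seed=0):
    """Minimize the average of 0.5 * (x - a_i)^2 and return all iterates."""
    rng = random.Random(seed)
    x = 0.0
    iterates = []
    for i in index_stream(scheme, len(targets), epochs, rng):
        x -= step * (x - targets[i])  # gradient of the i-th component
        iterates.append(x)
    return iterates


targets = [1.0, -2.0, 0.5, 3.0]
paths = {s: run(s, targets, step=0.1, epochs=50) for s in ("sgd", "rr", "so")}
```

With a constant step size the iterates keep fluctuating around the minimizer (the mean of `targets`), and the spread of those fluctuations under each scheme is the quantity the abstract refers to as the stationary variance.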