I discuss here off-the-beaten-path, beautiful, even spectacular results from number theory: not just about prime numbers, but also about related problems such as integers that are sums of two squares. The connection between these numbers and prime numbers will appear later in this article. A few important unsolved mathematical conjectures are presented in a unified approach, and some new research material is also introduced, especially an attempt at generalizing and unifying concepts related to data set density and limiting distributions. The approach is very applied, focusing on algorithms, simulations, and big data, to help discover fascinating results. Even though some of the most exciting topics of mathematics are discussed here (including fundamental, century-old problems still unresolved as well as brand new hypotheses), most of the article can be understood by the layman.

Random walks are also called drunken walks, as they represent the path of a drunken guy moving left and right seemingly randomly, and getting lost over time. Here the process is called a self-correcting random walk, or also a reflective random walk, and is related to controlled random walks and constrained random walks in the sense that the walker, less drunk than in a random walk, is able to correct any departure from a straight path, more and more over time, by either slightly over- or under-correcting at each step. One of the two model parameters (the positive parameter a) represents how drunk the walker is, with a = 0 being the worst. So it could be used as a statistical model in clustering problems, each component of the mixture representing a cluster.

Interestingly, I started to research this topic by trying to apply the notorious central limit theorem (CLT) to non-random (static) variables -- that is, to fixed sequences of numbers that look chaotic enough to simulate randomness. While this function produces a sequence of numbers that seems fairly random, there are major differences with truly random numbers, to the point that the CLT is no longer valid. Note that oscillations are expected (after all, U(n) is supposed to converge to a statistical distribution, possibly the bell curve, even though we are dealing with non-random sequences), but such large-scale, smooth oscillations are suspicious. Confidence intervals (CIs) can be empirically derived to test a number of assumptions, as illustrated in figure 1: in this example, based on 8 measurements, it is clear that maximum gap CIs for a-sequences are very different from those for random numbers, meaning that a-sequences do not behave like random numbers.

Have you ever thought about how strong a prior is compared to observed data? It features a cyclic process with one event represented by the variable d. There is only one observation of that event, so maximum likelihood will always assign to this variable everything that cannot be explained by other data. In the plot below you will see the truth, which is y, and 3 lines corresponding to 3 independent samples from the fitted posterior distribution. Before you start to argue with my reasoning, take a look at the plots where we plot the last prior vs the posterior and the point estimate from our generating process.

While I have a strong mathematical background myself, I have developed an entire data science and machine learning framework (mostly for data science automation) that is almost free of mathematics, and known as deep data science. You will see that you can learn serious statistical concepts (including limit theorems) without knowing mathematics, much less probabilities or random variables. Anyway, for algorithms processing large volumes of data in nearly real-time, computational complexity is still very important: read my article about how bad so many modern algorithms are and how they could benefit from some lifting, with faster processing times making it possible to take into account more metrics, more data, and more complicated metrics, to provide better results. It looks like f(n), as n tends to infinity, is infinitely smaller than log n, log(log n), log(log(log n)), and so on, no matter how many (finite number of) nested logs you have.

In this episode, gnomes collect underpants and make a profit. The business plan is revealed via a slide, of course. AI offers something similar: (1) Collect data, (2) AI, (3) Profit! In an earlier article, I talked through the holy trinity of AI: the chicken (algorithms), eggs (data), and bacon (results). Think of this as a food chain: software is eating the world; software is fed by AI; and AI is fed by data.


It turned out that putting more weight on close neighbors, and increasingly lower weight on far away neighbors (with weights slowly decaying to zero based on the distance to the neighbor in question) was the solution to the problem. For those interested in the theory, the fact that cases 1, 2 and 3 yield convergence to the Gaussian distribution is a consequence of the Central Limit Theorem under the Liapounov condition. More specifically, and because the samples produced here come from uniformly bounded distributions (we use a random number generator to simulate uniform deviates), all that is needed for convergence to the Gaussian distribution is that the sum of the squares of the weights -- and thus Stdev(S) as n tends to infinity -- must be infinite. More generally, we can work with more complex auto-regressive processes with a covariance matrix as general as possible, then compute S as a weighted sum of the X(k)'s, and find a relationship between the weights and the covariance matrix, to eventually identify conditions on the covariance matrix that guarantee convergence to the Gaussian distribution.
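The role of the weights can be checked with a small simulation. The sketch below is only an illustration under my own assumptions, not the cases 1, 2 and 3 mentioned above: it uses centered uniform deviates, weights 1/sqrt(k) as an example where the sum of squared weights diverges, weights 2^(-k) as an example where it converges, and excess kurtosis as a rough (hypothetical) indicator of closeness to the Gaussian distribution. All function names are made up for this example.

```python
import random
import statistics


def weighted_sum_samples(weights, n_samples, rng):
    """Draw n_samples values of S = sum_k w_k * X_k with X_k ~ Uniform(-0.5, 0.5)."""
    return [
        sum(w * (rng.random() - 0.5) for w in weights)
        for _ in range(n_samples)
    ]


def excess_kurtosis(values):
    """Sample excess kurtosis: near 0 for Gaussian data, -1.2 for uniform data."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    fourth = statistics.fmean((v - mean) ** 4 for v in values)
    return fourth / var ** 2 - 3.0


def compare_weightings(n_terms=100, n_samples=5000, seed=0):
    """Return (kurtosis with slowly decaying weights, kurtosis with geometric weights)."""
    rng = random.Random(seed)
    # Assumption: 1/sqrt(k) weights, whose squares sum to the (divergent) harmonic series.
    slow = [k ** -0.5 for k in range(1, n_terms + 1)]
    # Assumption: 2^(-k) weights, whose squares sum to a finite value.
    fast = [2.0 ** -k for k in range(1, n_terms + 1)]
    return (
        excess_kurtosis(weighted_sum_samples(slow, n_samples, rng)),
        excess_kurtosis(weighted_sum_samples(fast, n_samples, rng)),
    )


if __name__ == "__main__":
    slow_k, fast_k = compare_weightings()
    print("slowly decaying weights:", round(slow_k, 3))
    print("geometric weights:      ", round(fast_k, 3))
```

With the slowly decaying weights, the excess kurtosis should be close to zero (though convergence is slow, since the sum of squared weights grows only logarithmically), while with the geometric weights it should stay clearly negative: the first few terms dominate S, and the limit is not Gaussian.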


Some leading scientists like Sir Roger Penrose even argue that Goedel showed with his Incompleteness Theorem that today's computers can never reach human level intelligence or consciousness, that humans will always be smarter than current computers or any computer algorithm can ever be, and that computers will never in the true sense of the word "understand" anything like higher level mathematics, especially not mathematics that deals with transfinite sets and numbers. Many famous mathematicians (and physicists) created fascinating new theories and discovered deep and far reaching mathematical results. Cantor proved this by using his famous "diagonal" construction (see pic below), which showed that any supposedly complete enumerated list of the real numbers R will always miss some real number, thereby proving that a complete enumeration of the real numbers by the natural numbers is impossible. Cantor has actually shown that there is even an infinite number of ever bigger infinities, by showing that the set of all subsets of any given infinite set is always strictly bigger than the set itself (the two cannot be put into a 1-1 relation).
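The diagonal construction can be illustrated on a finite prefix of a list. The sketch below is a toy version under my own assumptions: each number in [0, 1) is represented by a string of decimal digits, and the function name diagonal_escape is hypothetical. It builds a digit string that differs from the k-th entry at the k-th digit, so it cannot be equal to any entry of the list.

```python
def diagonal_escape(rows):
    """Return a digit string that differs from rows[k] at position k, for every k."""
    if any(len(row) < len(rows) for row in rows):
        raise ValueError("each row needs at least as many digits as there are rows")
    # Use only the digits 4 and 5, so the result never ends in repeating 0s or 9s
    # (this avoids the double-representation issue, e.g. 0.4999... = 0.5000...).
    return "".join("4" if row[k] == "5" else "5" for k, row in enumerate(rows))


if __name__ == "__main__":
    # Hypothetical first digits of five numbers in a supposed enumeration.
    listed = ["14159", "71828", "41421", "55555", "00000"]
    print(diagonal_escape(listed))
```

A finite list proves nothing by itself, of course; the point of Cantor's argument is that the same rule applies to every position of an infinite list, whatever the list is.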