A U.S. company will deploy the world's most advanced undersea search vessels in a renewed bid to find Malaysia Airlines Flight MH370, which disappeared on March 8, 2014, with 239 people on board while en route from Kuala Lumpur to Beijing. Texas-based Ocean Infinity -- which has signed a "no cure, no fee" deal with the Malaysian government, meaning it will be paid only if it locates the jetliner -- will for the first time use a swarm of eight drone-like autonomous underwater vehicles (AUVs) to scour a remote part of the southern Indian Ocean where the ill-fated plane is believed to have gone down. According to the Daily Beast, Ocean Infinity will conduct the new search, with the latest technology, north of the original search area, where an underwater operation lasting more than three years yielded no concrete clues. The Daily Beast also reported that the system is being used for the first time and that, while en route from the Caribbean to the search site, the command ship, Seabed Constructor, paused several times to carry out trials at depths similar to those at the Indian Ocean search site.

I discuss here off-the-beaten-path, beautiful, even spectacular results from number theory: not just about prime numbers, but also about related problems such as integers that are sums of two squares. The connection between these numbers and prime numbers will appear later in this article. A few important unsolved mathematical conjectures are presented in a unified approach, and some new research material is also introduced, especially an attempt at generalizing and unifying concepts related to data set density and limiting distributions. The approach is very applied, focusing on algorithms, simulations, and big data, to help discover fascinating results. Even though some of the most exciting topics of mathematics are discussed here (including fundamental, century-old problems still unresolved, as well as brand new hypotheses), most of the article can be understood by the layman.

Random walks are also called drunken walks, as they represent the path of a drunkard moving left and right seemingly at random, getting lost over time. The process discussed here is called a self-correcting random walk, or reflective random walk, and is related to controlled random walks and constrained random walks (see also here) in the sense that the walker, less drunk than in an ordinary random walk, is able to correct any departure from a straight path, more and more over time, by either slightly over- or under-correcting at each step. One of the two model parameters (the positive parameter a) represents how drunk the walker is, with a = 0 being the worst. The process could also be used as a statistical model in clustering problems, with each component of the mixture representing a cluster.
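To make the idea concrete, here is a minimal simulation sketch. The article's walker corrects more and more over time; for simplicity, this sketch assumes a constant linear correction of strength a, which is my illustrative choice, not the article's exact model:

```python
import random

def self_correcting_walk(n_steps, a, seed=42):
    """Toy self-correcting (reflective) random walk.

    Each step combines a random +/-1 move (the drunken part) with a
    pull back toward the straight path at the origin, of strength a.
    With a = 0 there is no correction and we recover the ordinary
    drunken walk.  The linear correction rule below is an illustrative
    assumption, not the exact model from the article.
    """
    random.seed(seed)
    x, path = 0.0, [0.0]
    for _ in range(n_steps):
        step = random.choice([-1.0, 1.0])  # random left/right move
        x = (1.0 - a) * x + step           # correction pulls x toward 0
        path.append(x)
    return path

plain = self_correcting_walk(10_000, a=0.0)      # ordinary drunken walk
corrected = self_correcting_walk(10_000, a=0.5)  # self-correcting walker
print(max(abs(x) for x in plain), max(abs(x) for x in corrected))
```

With a = 0.5 the walker never strays more than two units from the straight path, while the uncorrected walker wanders off on the order of the square root of the number of steps.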

Interestingly, I started to research this topic by trying to apply the celebrated central limit theorem (CLT) to non-random (static) variables -- that is, to fixed sequences of numbers that look chaotic enough to simulate randomness. While such a function produces a sequence of numbers that seems fairly random, there are major differences from truly random numbers, to the point that the CLT is no longer valid. Note that oscillations are expected (after all, U(n) is supposed to converge to a statistical distribution, possibly the bell curve, even though we are dealing with non-random sequences), but such large-scale, smooth oscillations are suspicious. Confidence intervals (CIs) can be empirically derived to test a number of assumptions, as illustrated in figure 1: in this example, based on 8 measurements, it is clear that the maximum-gap CIs for a-sequences are very different from those for random numbers, meaning that a-sequences do not behave like random numbers.
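The empirical-CI idea can be sketched for the random-number baseline. The a-sequences themselves are not defined in this excerpt, so the sketch below only shows how one would simulate a percentile confidence interval for the maximum-gap statistic of truly uniform deviates; the sample size and confidence level are my assumptions:

```python
import random

def max_gap(points):
    """Largest spacing between consecutive points of a sample in [0, 1]."""
    s = sorted(points)
    return max(b - a for a, b in zip(s, s[1:]))

def empirical_ci(n_points, n_trials=2000, level=0.90, seed=1):
    """Empirical percentile confidence interval for the maximum gap of
    n_points uniform deviates, obtained by brute-force simulation."""
    random.seed(seed)
    stats = sorted(max_gap([random.random() for _ in range(n_points)])
                   for _ in range(n_trials))
    lo = stats[int((1 - level) / 2 * n_trials)]
    hi = stats[int((1 + level) / 2 * n_trials)]
    return lo, hi

lo, hi = empirical_ci(100)
print(f"90% CI for the max gap of 100 uniform points: [{lo:.3f}, {hi:.3f}]")
```

Computing the same statistic on an a-sequence and checking whether it falls inside this interval is the test described above: a value far outside the CI means the sequence does not behave like random numbers.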

Have you ever thought about how strong a prior is compared to observed data? The model features a cyclic process with one event represented by the variable d. Since there is only one observation of that event, maximum likelihood will always assign to this variable everything that cannot be explained by the other data. In the plot below you will see the truth, y, and three lines corresponding to three independent samples from the fitted posterior distribution. Before you start to argue with my reasoning, take a look at the plots, where we compare the last prior with the posterior and the point estimate from our generating process.
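The prior-versus-data question can be illustrated with a minimal conjugate sketch. This Beta-Binomial example is a generic illustration of my own choosing (the article's cyclic model and the variable d are not reproduced here); it shows how little a single observation moves a reasonably strong prior:

```python
# Beta-Binomial sketch: how much can one observation move a prior?
# (Generic illustration; not the model from the article.)

def posterior(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior + Binomial data."""
    return alpha + successes, beta + trials - successes

prior = (50.0, 50.0)           # fairly strong prior centred on 0.5
a, b = posterior(*prior, successes=1, trials=1)  # one positive observation

prior_mean = prior[0] / sum(prior)
post_mean = a / (a + b)
print(prior_mean, post_mean)   # the single data point barely moves the mean
```

With a prior worth 100 pseudo-observations, one real observation shifts the posterior mean by less than half a percent, which is exactly the effect the plots below are meant to dramatize.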

While I have a strong mathematical background myself, I have developed an entire data science and machine learning framework (mostly for data science automation) that is almost free of mathematics, known as deep data science. You will see that you can learn serious statistical concepts (including limit theorems) without knowing mathematics, much less probability theory or random variables. That said, for algorithms processing large volumes of data in near real time, computational complexity is still very important: read my article about how inefficient many modern algorithms are and how they could benefit from some lifting, with faster processing times making it possible to take into account more metrics, more data, and more complicated metrics, to provide better results. It looks like f(n), as n tends to infinity, is infinitely smaller than log n, log(log n), log(log(log n)), and so on, no matter how many (finitely many) nested logs you have.

In this episode, gnomes collect underpants and make a profit. The business plan is revealed via a slide, of course. AI offers something similar: (1) Collect data, (2) AI, (3) Profit! In an earlier article, I talked through the holy trinity of AI: the chicken (algorithms), eggs (data), and bacon (results). Think of this as a food chain: software is eating the world; software is fed by AI; and AI is fed by data.

It turned out that putting more weight on close neighbors, and increasingly lower weight on faraway neighbors (with weights slowly decaying to zero based on the distance to the neighbor in question), was the solution to the problem. For those interested in the theory, the fact that cases 1, 2, and 3 yield convergence to the Gaussian distribution is a consequence of the central limit theorem under the Liapounov condition. More specifically, because the samples produced here come from uniformly bounded distributions (we use a random number generator to simulate uniform deviates), all that is needed for convergence to the Gaussian distribution is that the sum of the squares of the weights -- and thus Stdev(S) as n tends to infinity -- be infinite. More generally, we can work with more complex auto-regressive processes with a covariance matrix as general as possible, then compute S as a weighted sum of the X(k)'s, find a relationship between the weights and the covariance matrix, and eventually identify conditions on the covariance matrix that guarantee convergence to the Gaussian distribution.
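The divergence condition is easy to check by simulation. In the sketch below, the particular weight sequence w_k = k^(-1/4) is my choice for illustration (its squares sum like a divergent series, so the condition holds); the standardized weighted sums of uniform deviates should then look approximately Gaussian:

```python
import math
import random

def standardized_weighted_sums(weights, trials=20_000, seed=0):
    """Draw standardized weighted sums S = sum_k w_k X(k), where the
    X(k) are independent uniform deviates on (-0.5, 0.5)."""
    random.seed(seed)
    # Var of U(-0.5, 0.5) is 1/12, so Var(S) = (sum of w^2) / 12.
    sd = math.sqrt(sum(w * w for w in weights) / 12.0)
    return [sum(w * (random.random() - 0.5) for w in weights) / sd
            for _ in range(trials)]

# Slowly decaying weights w_k = k^(-1/4): the sum of squared weights is
# sum 1/sqrt(k), which diverges, so the condition above is satisfied.
n = 200
weights = [k ** -0.25 for k in range(1, n + 1)]
samples = standardized_weighted_sums(weights)

# For a standard normal, about 95% of the mass lies within +/-1.96.
inside = sum(1 for s in samples if abs(s) < 1.96) / len(samples)
print(inside)
```

Swapping in weights whose squares sum to a finite limit (say w_k = 1/k) breaks the condition, and the standardized sums no longer converge to a Gaussian.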

The traditional example of the Zipf distribution is the distribution of Internet domains ranked by traffic: a few domains like Google, Facebook, Twitter, LinkedIn, and Yahoo dominate, with Google the big king (in terms of monthly users or pageviews) based on Alexa ranks, while billions of small websites barely get any traffic, even combined. Zipf's law also applies to celestial bodies in the solar system, because the process that formed them is very similar to the way companies are created and evolve, involving mergers and acquisitions. Structures suspected to be generated by such clustering processes can be modeled using hierarchical Bayes models and simulations (MCMC) to fit model parameters to data, assess goodness of fit with the Zipf distribution, and then predict the evolution of the system -- or at least some of its core properties, such as the mean.
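Before reaching for hierarchical Bayes and MCMC, a quick sanity check on the rank-frequency signature is often enough. The sketch below (my illustration, not the article's fitting procedure) draws from a truncated Zipf law with exponent s = 1 over N ranks and verifies that the frequency of rank r falls off like 1/r:

```python
import random

# Draw from a truncated Zipf(s = 1) law over N ranks and check the
# rank-frequency signature: frequency of rank r proportional to 1/r.
random.seed(7)
N, s = 1000, 1.0
weights = [1.0 / r ** s for r in range(1, N + 1)]
sample = random.choices(range(1, N + 1), weights=weights, k=200_000)

counts = {}
for r in sample:
    counts[r] = counts.get(r, 0) + 1

# Under Zipf with s = 1, rank 1 should occur roughly 10 times as often
# as rank 10.
ratio = counts[1] / counts[10]
print(ratio)
```

A real goodness-of-fit assessment would estimate the exponent s from data (e.g., by regressing log frequency on log rank) and compare observed counts against the fitted law, which is where the MCMC machinery mentioned above comes in.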