"The problem of giving rules for producing true scientific statements has been replaced by the problem of finding efficient heuristic rules for culling the reasonable candidates for an explanation from an appropriate set of possible candidates [and finding methods for constructing the candidates]."
– B. Buchanan, quoted in Lindley Darden. Recent Work in Computational Scientific Discovery.
When analysing graph structure, it can be difficult to determine whether patterns found are due to chance, or due to structural aspects of the process that generated the data. Hypothesis tests are often used to support such analyses. These allow us to make statistical inferences about which null models are responsible for the data, and they can be used as a heuristic in searching for meaningful patterns. The minimum description length (MDL) principle [6, 4] allows us to build such hypothesis tests, based on efficient descriptions of the data. Broadly: we translate the regularity we are interested in into a code for the data, and if this code describes the data more efficiently than a code corresponding to the null model, by a sufficient margin, we may reject the null model. This is a frequentist approach to MDL, based on hypothesis testing. Bayesian approaches to MDL for model selection rather than model rejection are more common, but for the purposes of pattern analysis, a hypothesis testing approach provides a more natural fit with existing literature. We provide a brief illustration of this principle based on the running example of analysing the size of the largest clique in a graph. We illustrate how a code can be constructed to efficiently represent graphs with large cliques, and how the description length of the data under this code can be compared to the description length under a code corresponding to a null model to show that the null model is highly unlikely to have generated the data.
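The clique example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction used in the work itself: the null model is taken to be a uniform code over all graphs with n nodes and m edges, and the alternative code first states the clique size, then names the clique members, then stores the remaining edges. The function names and the significance level alpha are hypothetical. The rejection threshold relies on the no-hypercompression inequality: under the null model, the probability that any other fixed code beats the null code by k or more bits is at most 2^-k.

```python
import math


def log2_binom(n, k):
    """Return log2 of the binomial coefficient C(n, k), computed via lgamma."""
    if k < 0 or k > n:
        raise ValueError("k must satisfy 0 <= k <= n")
    nats = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return nats / math.log(2)


def null_code_length(n, m):
    """Bits needed to pick m edges uniformly among all n(n-1)/2 node pairs."""
    pairs = n * (n - 1) // 2
    return log2_binom(pairs, m)


def clique_code_length(n, m, k):
    """Bits needed when a clique on k nodes is described first (assumed code)."""
    pairs = n * (n - 1) // 2
    clique_edges = k * (k - 1) // 2
    size_bits = math.log2(n)            # clique size, uniform over 1..n
    member_bits = log2_binom(n, k)      # which k nodes form the clique
    rest_bits = log2_binom(pairs - clique_edges, m - clique_edges)
    return size_bits + member_bits + rest_bits


def rejects_null(n, m, k, alpha=0.01):
    """Reject the null if the clique code saves at least -log2(alpha) bits."""
    gain = null_code_length(n, m) - clique_code_length(n, m, k)
    return gain >= -math.log2(alpha), gain


if __name__ == "__main__":
    for k in (3, 20):
        rejected, gain = rejects_null(n=100, m=500, k=k)
        print(f"clique size {k}: gain = {gain:.1f} bits, reject null: {rejected}")
```

In this sketch a triangle costs more bits to name than it saves, so the null model stands, whereas a clique on 20 of the 100 nodes saves hundreds of bits and the null model is rejected at any conventional significance level.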
Wireless communications have undeniably shaped our everyday lives. We expect ubiquitous connectivity to the Internet, with increasing demands for higher data rates and low lag everywhere: at work, at home, on the road, even with massive crowds of Internet users around us. Despite impressive breakthroughs in almost every part of our wireless devices--from antennas and hardware to operating software--this demand is getting increasingly challenging to address. The large scale of research efforts and investment in the fifth generation (5G) of wireless communications reflects the enormity of the challenge. A valuable and seemingly unnoticed resource could be exploited to meet this goal.
"We're in the middle of a paradigm shift, a time when the choice of experiments and the execution of experiments are not really things that people do," says Bob Murphy, the head of the computational biology department at Carnegie Mellon University. Details: Experimental science is expensive. In biology, for example, pricey equipment and labor mean that scientists can't do all the experiments they would like. Instead, they have to prioritize the ones they think will give them the most information about the questions they are after, and then extrapolate to estimate the outcomes of the experiments they didn't do. Automating science makes it easier to do big experiments, allowing more people to participate -- and potentially boosting the scientific output of countries that have traditionally trailed the U.S.
What would happen if we made all of our data public--everything from wearables monitoring our biometrics, all the way to smartphones monitoring our location, our social media activity, and even our internet search history? Would such insights into our lives simply provide companies and politicians with greater power to invade our privacy and manipulate us by using our psychological profiles against us? A burgeoning new philosophy called dataism doesn't think so. In fact, this trending ideology holds that liberating the flow of data is the supreme value of the universe, and that it could be the key to unleashing the greatest scientific revolution in the history of humanity. First mentioned by David Brooks in his 2013 New York Times article "The Philosophy of Data," dataism is an ethical system that has been most heavily explored and popularized by the renowned historian Yuval Noah Harari.
Sure, you've heard of game theory. And sure, trading is like a game: you devise a strategy, learn the rules, and try to beat everyone else to the punch, finding trends before anyone else does. You've even heard that letting your emotions get the best of you is a terrible trading strategy. To start to undermine our own self-destructive habits, it helps to understand how the game is geared toward those who figure out the game behind the game: how to play to win by controlling the emotions that lead us to bad decisions. Game theory can be applied to human trading because the object of a trade is to "win" a profit.
The sun formed 4.5 billion years ago, but it's got around 6 billion years more before its fuel runs out. It will then flare up, engulfing the inner planets. And the expanding universe will continue--perhaps forever--destined to become ever colder, ever emptier. To quote Woody Allen, eternity is very long, especially toward the end. Any creatures witnessing the sun's demise won't be human--they'll be as different from us as we are from a bug.
Scientific discovery is mediated by ideas that, after being formulated in hypotheses, can be tested, validated, and quantified before they eventually lead to accepted concepts. Computer-mediated discovery in astrophysics is no exception, but antiquated code that is only intelligible to scientists who were involved in writing it is holding up scientific discovery in the field. A bold initiative is needed to modernize astrophysics code and make it transparent and useful beyond a small group of scientists. Scientific software is like a prototype in a laboratory experiment; it must stimulate experimentation. The eventual code is a description of concepts and their relationships, which are imperative for reproducibility and validating the results.
In this work, we consider hypothesis testing and anomaly detection on datasets where each observation is a weighted network. Examples of such data include brain connectivity networks from fMRI flow data, or word co-occurrence counts for populations of individuals. Current approaches to hypothesis testing for weighted networks typically require thresholding the edge weights to transform the data into binary networks. This results in a loss of information, and outcomes are sensitive to the choice of threshold levels. Our work avoids this, and we consider weighted-graph observations in two situations: 1) where each graph belongs to one of two populations, and 2) where entities belong to one of two populations, with each entity possessing multiple graphs (indexed e.g. by time). Specifically, we propose a hierarchical Bayesian hypothesis testing framework that models each population with a mixture of latent space models for weighted networks, and then tests populations of networks for differences in distribution over components. Our framework is capable of population-level, entity-specific, as well as edge-specific hypothesis testing. We apply it to synthetic data and three real-world datasets: two social media datasets involving word co-occurrences from discussions on Twitter of the political unrest in Brazil, and on Instagram concerning Attention Deficit Hyperactivity Disorder (ADHD) and depression drugs, and one medical dataset involving fMRI brain scans of human subjects. The results show that our proposed method has lower Type I error and higher statistical power compared to alternatives that need to threshold the edge weights. Moreover, they show our proposed method is better suited to deal with highly heterogeneous datasets.
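The cost of thresholding described above can be seen in a toy example. The sketch below is purely illustrative and is not part of the proposed framework: the two small weighted graphs, their edge weights, and the helper name `threshold` are all made up for the demonstration.

```python
def threshold(weights, tau):
    """Binarise a weighted graph: keep only edges whose weight is at least tau."""
    return {edge for edge, w in weights.items() if w >= tau}


# Two hypothetical weighted graphs on the same three nodes.
g1 = {("a", "b"): 0.90, ("a", "c"): 0.55, ("b", "c"): 0.20}
g2 = {("a", "b"): 0.60, ("a", "c"): 0.95, ("b", "c"): 0.45}

# At tau = 0.5 both graphs collapse to the same binary network,
# so every difference in their weights is lost.
same_at_half = threshold(g1, 0.5) == threshold(g2, 0.5)

# At tau = 0.4 the edge (b, c) survives only in g2, so the graphs now differ.
same_at_low = threshold(g1, 0.4) == threshold(g2, 0.4)

print(same_at_half, same_at_low)
```

A test run on the binarised graphs would therefore reach different conclusions depending on the threshold alone, which is the sensitivity that working directly with the weights is meant to avoid.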
In a world where patients are getting more and more involved in their own health, manual processes cause many problems. Keeping medical records, research documents, lab reports, doctor prescriptions, etc. on paper prevents the seamless understanding and sharing of important health information that ultimately affects care outcomes. Although advancements in healthcare technology have been remarkable, the information they provide is not sufficient to make improved healthcare decisions. What is required, in my opinion, is for healthcare information to be enhanced by the power of analytics and machine learning. Through advanced analytics, machine learning can help provide better information to doctors at the point of patient care.
Errol Morris feels that Thomas Kuhn saved him from a career he was not suited for--by having him thrown out of Princeton. In 1972, Kuhn was a professor of philosophy and the history of science at Princeton, and author of The Structure of Scientific Revolutions, which gave the world the term "paradigm shift." As Morris tells the story in his recent book, The Ashtray, Kuhn was antagonized by Morris' suggestions that Kuhn was a megalomaniac and The Structure of Scientific Revolutions was an assault on truth and progress. To say the least, Morris, then 24, was already the iconoclast who would go on to make some of the most original documentary films of our time. After launching the career he was suited for with Gates of Heaven in 1978, a droll affair about pet cemeteries, Morris earned international acclaim with The Thin Blue Line, which led to the reversal of a murder conviction of a prisoner who had been on death row. In 2004, Morris won an Academy Award for The Fog of War, a dissection of former Secretary of Defense Robert McNamara, a major architect of the Vietnam War. His 2017 film, Wormwood, a miniseries on Netflix, centers on the mystery surrounding a scientist who worked on a biological warfare program for the Army and, in 1953, suspiciously fell to his death from a hotel room. The Ashtray--Morris explains the title in our interview below--is as arresting and idiosyncratic as Morris' films.