"The problem of giving rules for producing true scientific statements has been replaced by the problem of finding efficient heuristic rules for culling the reasonable candidates for an explanation from an appropriate set of possible candidates [and finding methods for constructing the candidates]."
– B. Buchanan, quoted in Lindley Darden. Recent Work in Computational Scientific Discovery.
Scientific discovery is mediated by ideas that, after being formulated in hypotheses, can be tested, validated, and quantified before they eventually lead to accepted concepts. Computer-mediated discovery in astrophysics is no exception, but antiquated code that is only intelligible to scientists who were involved in writing it is holding up scientific discovery in the field. A bold initiative is needed to modernize astrophysics code and make it transparent and useful beyond a small group of scientists. Scientific software is like a prototype in a laboratory experiment; it must stimulate experimentation. The eventual code is a description of concepts and their relationships, which are essential for reproducing and validating results.
In this work, we consider hypothesis testing and anomaly detection on datasets where each observation is a weighted network. Examples of such data include brain connectivity networks from fMRI flow data, or word co-occurrence counts for populations of individuals. Current approaches to hypothesis testing for weighted networks typically require thresholding the edge weights to transform the data into binary networks. This results in a loss of information, and outcomes are sensitive to the choice of threshold levels. Our work avoids this, and we consider weighted-graph observations in two situations: 1) where each graph belongs to one of two populations, and 2) where entities belong to one of two populations, with each entity possessing multiple graphs (indexed e.g. by time). Specifically, we propose a hierarchical Bayesian hypothesis testing framework that models each population with a mixture of latent space models for weighted networks, and then tests populations of networks for differences in distribution over components. Our framework is capable of population-level, entity-specific, as well as edge-specific hypothesis testing. We apply it to synthetic data and three real-world datasets: two social media datasets involving word co-occurrences from discussions on Twitter of the political unrest in Brazil, and on Instagram concerning Attention Deficit Hyperactivity Disorder (ADHD) and depression drugs, and one medical dataset involving fMRI brain-scans of human subjects. The results show that our proposed method has lower Type I error and higher statistical power compared to alternatives that need to threshold the edge weights. Moreover, they show our proposed method is better suited to deal with highly heterogeneous datasets.
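The threshold sensitivity described above can be illustrated with a minimal sketch. It is not the proposed Bayesian framework; it only shows how binarizing two weighted networks at different cut-offs changes which edges appear to differ. The toy matrices `net_a` and `net_b`, the helper names, and the threshold values are all hypothetical.

```python
def binarize(weights, threshold):
    """Return a binary adjacency matrix: 1 where weight > threshold, else 0."""
    return [[1 if w > threshold else 0 for w in row] for row in weights]


def edge_disagreements(a, b):
    """Count upper-triangle entries where two binary networks differ."""
    n = len(a)
    return sum(a[i][j] != b[i][j] for i in range(n) for j in range(i + 1, n))


# Hypothetical toy data: two symmetric weighted networks on four nodes.
net_a = [
    [0.0, 0.9, 0.4, 0.1],
    [0.9, 0.0, 0.6, 0.3],
    [0.4, 0.6, 0.0, 0.8],
    [0.1, 0.3, 0.8, 0.0],
]
net_b = [
    [0.0, 0.7, 0.5, 0.2],
    [0.7, 0.0, 0.2, 0.3],
    [0.5, 0.2, 0.0, 0.6],
    [0.2, 0.3, 0.6, 0.0],
]

# The apparent difference between the two networks depends on the cut-off.
disagreements = {
    t: edge_disagreements(binarize(net_a, t), binarize(net_b, t))
    for t in (0.25, 0.45, 0.65, 0.85)
}
print(disagreements)
```

Both the number of disagreeing edges and the identity of those edges shift with the threshold, which is the kind of dependence a method working directly on the weights would avoid.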
In a world where patients are getting more and more involved in their own health, the problems of manual processes are many. Having medical records, research documents, lab reports, doctor prescriptions, etc. on paper restricts seamless understanding and sharing of important health information that ultimately affects care outcome. Although advancements in healthcare technology have been remarkable, the information they provide is not sufficient to make improved healthcare decisions. What is required, in my opinion, is for healthcare information to be enhanced by the power of analytics and machine learning. Through advanced analytics, machine learning can help provide better information to doctors at the point of patient care.
Errol Morris feels that Thomas Kuhn saved him from a career he was not suited for--by having him thrown out of Princeton. In 1972, Kuhn was a professor of philosophy and the history of science at Princeton, and author of The Structure of Scientific Revolutions, which gave the world the term "paradigm shift." As Morris tells the story in his recent book, The Ashtray, Kuhn was antagonized by Morris' suggestions that Kuhn was a megalomaniac and The Structure of Scientific Revolutions was an assault on truth and progress. To say the least, Morris, then 24, was already the iconoclast who would go on to make some of the most original documentary films of our time. After launching the career he was suited for with Gates of Heaven in 1978, a droll affair about pet cemeteries, Morris earned international acclaim with The Thin Blue Line, which led to the reversal of a murder conviction of a prisoner who had been on death row. In 2004, Morris won an Academy Award for The Fog of War, a dissection of former Secretary of Defense Robert McNamara, a major architect of the Vietnam War. His 2017 film, Wormwood, a miniseries on Netflix, centers on the mystery surrounding a scientist who worked on a biological warfare program for the Army and, in 1953, suspiciously fell to his death from a hotel room. The Ashtray--Morris explains the title in our interview below--is as arresting and idiosyncratic as Morris' films.
We have seen significant recent progress in pattern analysis and machine intelligence applied to images, audio and video signals, and natural language text, but not as much applied to another artifact produced by people: computer program source code. In a paper to be presented at the FEED Workshop at KDD 2018, we showcase a system that makes progress towards the semantic analysis of code. By doing so, we provide the foundation for machines to truly reason about program code and learn from it. The work, also recently demonstrated at IJCAI 2018, is conceived and led by IBM Science for Social Good fellow Evan Patterson and focuses specifically on data science software. Data science programs are a special kind of computer code, often fairly short, but full of semantically rich content that specifies a sequence of data transformation, analysis, modeling, and interpretation operations.
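As a rough illustration of why data science scripts lend themselves to this kind of analysis, the sketch below uses Python's standard `ast` module to list the operations a short script invokes, in source order. It is not the system described above, which goes well beyond syntax; the function `called_names` and the sample script are hypothetical, and the script is only parsed, never executed.

```python
import ast


def called_names(source):
    """Return the names of called functions and methods, in source order."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):   # e.g. data.dropna()
                name = func.attr
            elif isinstance(func, ast.Name):      # e.g. read_csv(...)
                name = func.id
            else:
                continue                          # skip more complex call targets
            calls.append((node.lineno, node.col_offset, name))
    # ast.walk does not guarantee source order, so sort by position.
    return [name for _, _, name in sorted(calls)]


# Hypothetical data science script; the names need not exist because
# the source is parsed, not run.
script = """
data = read_csv("measurements.csv")
clean = data.dropna()
model = fit_linear_model(clean)
print(model.summary())
"""

operations = called_names(script)
print(operations)
```

A purely syntactic listing like this recovers the sequence of transformation and modeling steps, but not what they mean; mapping such calls to shared concepts is where semantic analysis begins.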
As the saying goes, the only constant is change itself. Throughout history, there have been disruptive innovations that have completely altered the way people work, live and learn in ways that were never imagined. One of those is the invention of the printing press, which ushered in the Scientific Revolution, the Enlightenment and the Modern Age. Through one breakthrough product, people became much more worldly and literate. It also opened the door to all kinds of new professions and skills.
What once started as early analysis of singular data sources has now evolved into far more robust ways of analyzing information and the relationships between different fields and information sources. Data discovery is another area where machine learning (ML) is beginning to make inroads. Twenty years ago, data discovery was a term used to define the early analytics needed to better understand data. For instance, Evoke Software was a company that analyzed large volumes of customer data. It both used metadata to understand field content to find trends and exceptions, and also looked at raw data and used algorithms to identify field boundaries in older or less documented data sources.
If you're here, we assume that you're a developer/hacker who's intrigued by the blockchain. You're convinced that you understand how it works and now you're itching to figure out what the blockchain means for you and your developer skill set. If you need more of a primer we'd recommend starting with the bitcoin white paper and Ethereum white paper.
Our goal in this post is to:
1. Explain how blockchain development differs from existing development paradigms
2. Provide context for the opportunities and challenges in this space
3. Point you to resources that will give you the foundation to start developing in this new paradigm
Internet applications benefit from network effects because they maintain centralized silos of information. Built upon shared, open protocols (e.g.
One important assumption underlying common classification models is the stationarity of the data. However, in real-world streaming applications, the data concept indicated by the joint distribution of feature and label is not stationary but drifting over time. Concept drift detection aims to detect such drifts and adapt the model so as to mitigate any deterioration in the model's predictive performance. Unfortunately, most existing concept drift detection methods rely on a strong and over-optimistic condition that the true labels are available immediately for all already classified instances. In this paper, a novel Hierarchical Hypothesis Testing framework with Request-and-Reverify strategy is developed to detect concept drifts by requesting labels only when necessary. Two methods, namely Hierarchical Hypothesis Testing with Classification Uncertainty (HHT-CU) and Hierarchical Hypothesis Testing with Attribute-wise "Goodness-of-fit" (HHT-AG), are proposed respectively under the novel framework. In experiments with benchmark datasets, our methods demonstrate overwhelming advantages over state-of-the-art unsupervised drift detectors. More importantly, our methods even outperform DDM (the widely used supervised drift detector) when we use significantly fewer labels.
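For context on the labeling condition criticized above, here is a simplified sketch of a DDM-style supervised detector. It is not HHT-CU or HHT-AG, and it is not a reference implementation of DDM: the class name `SimpleDDM`, the 30-instance warm-up, and the synthetic error stream are assumptions made for illustration. Note that `update` needs the true label of every instance to know whether the prediction was an error.

```python
import math


class SimpleDDM:
    """Simplified DDM-style detector over a stream of 0/1 prediction errors."""

    def __init__(self, min_instances=30):
        self.min_instances = min_instances  # assumed warm-up length
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed one outcome (1 = misclassified); return the detector state."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n               # running error rate
        s = math.sqrt(p * (1 - p) / self.n)    # its standard deviation
        if self.n < self.min_instances:
            return "stable"
        if p + s < self.p_min + self.s_min:    # remember the best level seen
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3 * self.s_min:
            self.reset()                       # start over after a drift
            return "drift"
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


# Synthetic stream: about 10% errors for 100 instances, then about 50%.
stream = [1 if i % 10 == 0 else 0 for i in range(1, 101)]
stream += [1 if i % 2 == 0 else 0 for i in range(101, 201)]

detector = SimpleDDM()
first_drift = None
for position, error in enumerate(stream, start=1):
    if detector.update(error) == "drift":
        first_drift = position
        break
print(first_drift)
```

Because every call to `update` consumes a true label, a detector of this kind is only usable when labels arrive immediately, which is the condition a request-and-reverify strategy is designed to relax.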