"The problem of giving rules for producing true scientific statements has been replaced by the problem of finding efficient heuristic rules for culling the reasonable candidates for an explanation from an appropriate set of possible candidates [and finding methods for constructing the candidates]."
– B. Buchanan, quoted in Lindley Darden. Recent Work in Computational Scientific Discovery.
Hypothesis testing is a critical tool in inferential statistics for determining what the value of a population parameter could be, with the conclusion drawn from analysis of sample data. With the advent of data-driven decision making in business, science, technology, and social and political undertakings, hypothesis testing has become critically important to understand and apply in the right context. There is a plethora of statistical tests for this purpose; see this excellent article for a comprehensive overview of which test to use in which situation.
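To make the idea concrete, here is a minimal sketch of one such test -- a one-sample t-test with SciPy -- on made-up data. The sample values and the 5% significance level are illustrative assumptions, not drawn from any real study:

```python
# Minimal sketch of a one-sample t-test (illustrative data only).
# H0: the population mean is 5.0; we check whether the sample gives
# us evidence to reject that claim at the 5% significance level.
from scipy import stats

sample = [4.9, 5.1, 5.3, 4.8, 5.2, 5.0, 4.7, 5.4]
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

# Reject H0 only when the p-value falls below the chosen threshold.
reject_null = p_value < 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H0: {reject_null}")
```

With data this close to the hypothesized mean, the test fails to reject the null; the same three lines apply to any one-sample mean question once the data and hypothesized mean are swapped in.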
Explorium, a data discovery platform for machine learning models, quietly raised two funding rounds over the last year -- a $3.6 million seed round last September and a $15.5 million Series A in March. Today, it made both rounds public. The seed round was led by Emerge with participation from F2 Capital; the Series A was led by Zeev Ventures with participation from the seed investors. The total raised is $19.1 million.
Recent improvements in whole slide scanning systems, GPU computing, and deep learning make automated slide analysis well-equipped to solve new and challenging analysis tasks. These learning methods are trained on labeled data, which could be anything from annotated examples of mitosis, to labeled tissue types, to categories assigned to a full slide or set of slides from a particular patient sample. The goal is to learn a mapping from the input images to the desired output on training data. The same model can then be applied to unseen data.
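The train-then-apply workflow described above can be sketched in a few lines. The feature vectors and labels below are synthetic stand-ins for real slide tiles, and logistic regression stands in for the deep networks actually used -- a minimal sketch of the setup, not a slide-analysis pipeline:

```python
# Sketch of the supervised-learning setup: learn a mapping from
# (toy) image features to labels on training data, then apply the
# same model to held-out, unseen data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Pretend each row is a feature vector extracted from a slide tile,
# and the label marks a tissue type (0 or 1).
X = rng.normal(size=(200, 16))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# The learned mapping generalizes to tiles the model has never seen.
accuracy = model.score(X_test, y_test)
```

The essential point survives the simplification: the model only ever sees labels at training time, and everything downstream relies on the mapping transferring to new data.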
As a child of refugees, I grew up with huge gaps in my parents' narrative. In our data-rich world, archivists are finally piecing together new clues to history, using unmanned systems to reopen cold cases. The Nazis were masters at using technology to mechanize killing and erase all evidence of their crimes. Nowhere is this more apparent than at Treblinka, Poland. The death camp exterminated close to 900,000 Jews over a 15-month period before a revolt led to its dismantlement in 1943.
The members of the physics institute at Via Panisperna were in the habit of giving themselves jocular nicknames: Enrico Fermi was "The Pope," Orso Corbino was "God the Almighty," and Franco Rasetti was "The Cardinal Vicar." It was 1930, and the Italian capital boasted a miraculous collection of scientists on their way to revolutionizing atomic and nuclear physics. Not since Galileo had Italy shown such scientific prominence. The team of mavericks became known as the "Via Panisperna Boys," and was led by the now-celebrated Enrico Fermi, at the time in his 20s and already a full professor. As usually happens with such wondrous groups, it was born out of serendipity, the fortuitous confluence of talented people and visionary politicians. The latter came in the form of a Mafioso protector, Senator Corbino, who was powerful enough to keep science bureaucrats and Mussolini's quirks at bay. Thus sheltered from the real world, the Boys did science in that atmosphere of pranks, jokes, and informality present in every high-intensity scientific establishment, an intellectual ambience popularized in "Surely You're Joking, Mr. Feynman."
These tasks are modelled with a very famous class of mathematical equations -- partial differential equations (PDEs). PDEs are the class of equations that describe everything smooth and continuous in the physical world, and they make up the most common class of simulation problems in science and engineering. Solving computation-hungry PDEs takes a toll even on supercomputers. And we can no longer count on hardware tweaks (shrinking transistors) to cut the time required, now that Moore's law is running out of steam. However, there is still a glimmer of hope.
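As a toy illustration of where the cost comes from, here is a minimal explicit finite-difference solver for the 1-D heat equation. All parameters are arbitrary choices for demonstration; real simulations run in 2-D or 3-D on far finer grids, which is exactly why the cost explodes:

```python
# Explicit finite-difference solver for the 1-D heat equation
# u_t = alpha * u_xx on [0, 1] with fixed (zero) boundaries.
# Cost scales as (time steps) x (grid points), and stability forces
# dt to shrink like dx^2 -- halving dx quadruples the step count.
import numpy as np

alpha, length, T = 0.01, 1.0, 0.1
nx = 101
dx = length / (nx - 1)
dt = 0.4 * dx**2 / alpha          # step size chosen for stability
steps = int(round(T / dt))

x = np.linspace(0.0, length, nx)
u = np.sin(np.pi * x)             # initial temperature profile

for _ in range(steps):            # total work: steps * nx updates
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
```

Even this tiny problem needs thousands of point updates; a 3-D version at engineering resolution multiplies that by many orders of magnitude, which is the bottleneck the article's learned surrogates aim to bypass.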
These notes survey and explore an emerging method, which we call the low-degree method, for predicting and understanding statistical-versus-computational tradeoffs in high-dimensional inference problems. In short, the method posits that a certain quantity -- the second moment of the low-degree likelihood ratio -- gives insight into how much computational time is required to solve a given hypothesis testing problem, which can in turn be used to predict the computational hardness of a variety of statistical inference tasks. While this method originated in the study of the sum-of-squares (SoS) hierarchy of convex programs, we present a self-contained introduction that does not require knowledge of SoS. In addition to showing how to carry out predictions using the method, we include a discussion investigating both rigorous and conjectural consequences of these predictions. These notes include some new results, simplified proofs, and refined conjectures. For instance, we point out a formal connection between spectral methods and the low-degree likelihood ratio, and we give a sharp low-degree lower bound against subexponential-time algorithms for tensor PCA.
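For readers who want the central object in symbols, a compact sketch follows. The notation is the one commonly used for this method; the precise definitions and conjectures are in the notes themselves:

```latex
% L_n = dP_n/dQ_n is the likelihood ratio between the planted
% distribution P_n and the null Q_n, and L_n^{\le D} denotes its
% orthogonal projection onto polynomials of degree at most D in
% L^2(Q_n). The "second moment of the low-degree likelihood ratio" is
\[
  \left\| L_n^{\le D} \right\|^2
  \;=\;
  \mathbb{E}_{Q_n}\!\left[ \left( L_n^{\le D} \right)^2 \right].
\]
% Heuristically: if this norm remains O(1) as n \to \infty for
% D \gg \log n, the testing problem is predicted to be hard for
% polynomial-time algorithms; if it diverges, some degree-D polynomial
% of the data should succeed at distinguishing P_n from Q_n.
```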
Sure, the world is crying out that big data's biggest problem will be resources. Demand has skyrocketed, and everyone is going into a tailspin trying to meet it. Companies are frantically overspending to hire data scientists to protect themselves from any upcoming shortfall. This is nothing but a sign that the world needs our robot algorithm friends to absorb some of that demand and lend credibility to new paradigms. Who could forget Steve Ballmer's famous quote framing big data as a machine learning problem.
The newly organized research project "MELLODDY" (Machine Learning Ledger Orchestration for Drug Discovery), involving ten large pharma companies and seven technology providers, is the kind of deal that can catalyze the pharmaceutical industry's transition to a new level -- a "paradigm shift," as one might put it in terms of Thomas Kuhn's "The Structure of Scientific Revolutions." The project aims to develop a state-of-the-art collaboration platform, based on Owkin's blockchain architecture technology, that would allow collective training of artificial intelligence (AI) algorithms on data from multiple direct pharmaceutical competitors, without exposing their internal know-how or compromising their intellectual property -- for the collective benefit of everyone involved. While AI has already proved groundbreaking in many industries (robotics, finance, surveillance, cybersecurity, self-driving cars, to name just a few), drug discovery still seems like a hard case for machine learning practitioners. A major reason is the lack of quality data to train models properly -- which might seem surprising, since pharmaceutical research generates enormous amounts of data daily.
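The general idea of training a shared model without pooling raw data can be sketched in the style of federated averaging. Everything below is an assumption-laden toy illustrating that principle -- it is not MELLODDY's actual protocol, nor Owkin's architecture, and the "partners," data, and learning rate are invented for the example:

```python
# Toy federated averaging: each "partner" computes a model update on
# its own private data, and only the model parameters -- never the
# data -- are shared and aggregated into a common model.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])   # ground truth for the toy task

# Each partner holds private data that never leaves its silo.
partners = []
for _ in range(3):
    X = rng.normal(size=(100, 3))
    y = X @ true_w + 0.01 * rng.normal(size=100)
    partners.append((X, y))

w = np.zeros(3)                        # shared global model
for _round in range(200):
    local_ws = []
    for X, y in partners:              # local gradient step, data stays put
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        local_ws.append(w - 0.05 * grad)
    w = np.mean(local_ws, axis=0)      # aggregate only the parameters

# The shared model converges without any partner exposing raw data.
```

In the toy, the aggregated model recovers the ground-truth weights even though no single dataset ever crosses a silo boundary -- the same property, with cryptographic and ledger machinery layered on top, is what makes competitor collaboration conceivable.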