Scientific Discovery
Nonzero-sum Adversarial Hypothesis Testing Games
Yasodharan, Sarath, Loiseau, Patrick
We study nonzero-sum hypothesis testing games that arise in the context of adversarial classification, in both the Bayesian and the Neyman-Pearson frameworks. We first show that these games admit mixed strategy Nash equilibria, and then we examine some interesting concentration phenomena of these equilibria. Our main results are on the exponential rates of convergence of classification errors at equilibrium. These rates are analogous to the well-known Chernoff-Stein lemma and Chernoff information, which describe the error exponents in the classical binary hypothesis testing problem, but with parameters derived from the adversarial model. The results are validated through numerical experiments.
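For reference, the classical (non-adversarial) exponents that the abstract compares against can be computed directly for two discrete distributions P and Q on a finite alphabet. This is an illustration under our own assumptions, not the authors' adversarial construction. The function names and the Bernoulli example are hypothetical, both distributions are assumed to have full support, and the minimization over λ uses a plain grid search.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; the Chernoff-Stein exponent."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chernoff_information(p, q, grid=10_000):
    """C(p, q) = -min_{lam in [0, 1]} log sum_x p(x)**lam * q(x)**(1 - lam).

    Assumes p and q have full support on the same finite alphabet. The objective
    is convex in lam, so a fine grid search is adequate for illustration.
    """
    best = 0.0  # the objective equals 0 at lam = 0 and lam = 1
    for k in range(grid + 1):
        lam = k / grid
        val = math.log(sum(pi**lam * qi ** (1 - lam) for pi, qi in zip(p, q)))
        best = min(best, val)
    return -best


# Hypothetical example: H0 says P = Bernoulli(0.5), H1 says Q = Bernoulli(0.9)
P = [0.5, 0.5]
Q = [0.1, 0.9]
print(f"Stein exponent D(P||Q) = {kl_divergence(P, Q):.4f} nats")
print(f"Chernoff information C(P, Q) = {chernoff_information(P, Q):.4f} nats")
```

In the classical Neyman-Pearson setting, with the type-I error held below a fixed level, the best type-II error decays like exp(-n D(P||Q)). In the Bayesian setting, the minimum error probability decays like exp(-n C(P, Q)). By convexity of the objective, C(P, Q) never exceeds the smaller of D(P||Q) and D(Q||P).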
Demystifying hypothesis testing with simple Python examples
Hypothesis testing is a critical tool in inferential statistics for determining what the value of a population parameter could be. We usually draw this conclusion from an analysis of sample data. With the advent of data-driven decision making in business, science, technology, social, and political undertakings, it has become critically important to understand hypothesis testing and to apply it in the right context. There is a plethora of tests used in statistical analysis for this purpose. See this excellent article for a comprehensive overview of which test to use in what situation. Hypothesis testing rests on two components: (a) the null hypothesis and (b) the alternative hypothesis.
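A minimal illustration of the null/alternative setup uses only the Python standard library: a two-sided one-sample z-test of H0: μ = μ0 against H1: μ ≠ μ0. The sample values, μ0 = 10.0, the population standard deviation σ = 2.0, and the 0.05 significance level are all hypothetical choices. A z-test assumes σ is known. With an estimated σ and a small sample, a t-test would be the usual choice.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: mu == mu0 versus H1: mu != mu0, with known sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Probability, under H0, of a statistic at least this extreme in either tail
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 10 measurements, assumed population sd of 2.0
sample = [10.8, 11.2, 9.9, 12.1, 10.5, 11.7, 10.9, 11.4, 12.0, 10.6]
alpha = 0.05
z, p = one_sample_z_test(sample, mu0=10.0, sigma=2.0)
print(f"z = {z:.3f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

With these made-up numbers, the statistic is about 1.76 and the p-value is about 0.08, so at the 0.05 level we fail to reject H0. Failing to reject H0 is not the same as showing that H0 is true.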
Explorium reveals $19.1M in total funding for machine learning data discovery platform - TechCrunch
Explorium, a data discovery platform for machine learning models, received a couple of unannounced funding rounds over the last year: a $3.6 million seed round last September and a $15.5 million Series A round in March. Today, it made both of these rounds public. The seed round was led by Emerge with participation from F2 Capital. The Series A was led by Zeev Ventures with participation from the seed investors. The total raised is $19.1 million.
How Data Scientists Work Together With Domain Experts in Scientific Collaborations: To Find The Right Answer Or To Ask The Right Question?
Mao, Yaoli, Wang, Dakuo, Muller, Michael, Varshney, Kush R., Baldini, Ioana, Dugan, Casey, Mojsilović, Aleksandra
In recent years, data scientists and domain experts have increasingly worked together to tackle complex scientific questions. However, such collaborations often face challenges. In this paper, we aim to decipher this collaboration complexity through a semi-structured interview study with 22 interviewees from teams of biomedical scientists collaborating with data scientists. In the analysis, we adopt the Olsons' four-dimension framework proposed in "Distance Matters" to code the interview transcripts. Our findings suggest that, besides glitches in the collaboration readiness, technology readiness, and coupling of work dimensions, the tensions that arise while building common ground influence collaboration outcomes and then persist in the actual collaboration process. Prior work generally calls for building a high level of common ground. In contrast, we find that breakdowns of content common ground in this process, together with the strengthening of process common ground, are more beneficial for scientific discovery. We discuss why this is the case, offer design suggestions, and conclude the paper with limitations and future directions.
Pathology - Pixel Scientia Labs - Quantifying Images For Scientific Discovery
Recent improvements in whole slide scanning systems, GPU computing, and deep learning make automated slide analysis well-equipped to solve new and challenging analysis tasks. These learning methods are trained on labeled data. The labels could be anything from annotated examples of mitosis, to tissue types, to a category for a full slide or set of slides from a particular patient sample. The goal is then to learn a mapping from the input images to the desired output on the training data. The same model can then be applied to unseen data.
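The train-then-apply workflow described above can be sketched with a deliberately tiny stand-in for a deep network: a nearest-centroid classifier over hand-made feature vectors. Everything here is hypothetical, including the two features per image tile, the "tumor"/"stroma" labels, and the function names. A real pathology pipeline would typically learn features from pixels with a deep network rather than average two numbers.

```python
import math
from collections import defaultdict


def fit_centroids(features, labels):
    """Training: learn one mean feature vector (centroid) per label."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, x):
    """Apply the learned mapping: assign x to the label of the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Hypothetical 2-D features per image tile (e.g. stain intensity, nuclei density)
train_x = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
train_y = ["tumor", "tumor", "stroma", "stroma"]
model = fit_centroids(train_x, train_y)

# The same model is then applied to tiles it never saw during training
unseen = [[0.85, 0.75], [0.15, 0.2]]
print([predict(model, x) for x in unseen])
```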
Summer travel diary: Reopening cold cases with robotic data discoveries
As the child of refugees, I inherited a family narrative with huge gaps of information. In our data-rich world, archivists are finally piecing together new clues about history, using unmanned systems to reopen cold cases. The Nazis were masters at using technology to mechanize killing and to erase all evidence of their crimes. Nowhere is this more apparent than at Treblinka, Poland. The death camp exterminated close to 900,000 Jews over a 15-month period before a revolt led to its dismantling in 1943.
A New Paradigm For Partial Differential Equations With Machine Learning
These tasks are modelled by a famous class of mathematical equations: partial differential equations (PDEs). PDEs describe a vast range of smooth, continuous phenomena in the physical world, and they are the most common class of simulation problems in science and engineering. Solving computationally demanding PDEs strains even supercomputers. Nor can we simply rely on hardware improvements, such as shrinking transistors, to cut the run time, because the steady scaling described by Moore's law cannot continue indefinitely. However, there is still a glimmer of hope.
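To make the computational cost concrete, here is a classical (not machine-learning) solver for one of the simplest PDEs. It solves the 1D heat equation u_t = α·u_xx on [0, 1], with u = 0 at both ends and u(x, 0) = sin(πx). The solver uses the explicit forward-time, centered-space (FTCS) finite-difference scheme. The grid sizes, the stability ratio r, and the function name are illustrative assumptions. Stability requires α·Δt/Δx² ≤ 1/2, so halving Δx forces roughly four times as many time steps. This hints at why fine-resolution simulations become expensive.

```python
import math


def solve_heat_1d(alpha=1.0, nx=51, t_end=0.1, r=0.4):
    """Explicit FTCS solve of u_t = alpha * u_xx on [0, 1], u(0) = u(1) = 0.

    r = alpha * dt / dx**2 is the stability ratio; FTCS needs r <= 0.5.
    Returns the final grid values, the spacing dx, and the number of time steps.
    """
    dx = 1.0 / (nx - 1)
    dt = r * dx * dx / alpha
    steps = math.ceil(t_end / dt)
    dt = t_end / steps  # shrink dt slightly so we land exactly on t_end
    r = alpha * dt / (dx * dx)  # recomputed ratio is <= the requested one
    u = [math.sin(math.pi * i * dx) for i in range(nx)]
    for _ in range(steps):
        # Each interior point moves toward the average of its two neighbours
        interior = [
            u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1]) for i in range(1, nx - 1)
        ]
        u = [0.0] + interior + [0.0]  # Dirichlet boundary conditions
    return u, dx, steps


u, dx, steps = solve_heat_1d()
# Exact solution for this initial condition (alpha = 1): exp(-pi^2 t) sin(pi x)
exact = [math.exp(-math.pi**2 * 0.1) * math.sin(math.pi * i * dx) for i in range(len(u))]
max_err = max(abs(a - b) for a, b in zip(u, exact))
_, _, fine_steps = solve_heat_1d(nx=101)
print(f"{steps} steps, max error {max_err:.2e}; halving dx needs {fine_steps} steps")
```

Because sin(πx) decays as exp(-απ²t) under the heat equation, the numerical result can be checked against the exact solution. Refining the grid in 2D or 3D multiplies this cost further.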
Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio
Kunisky, Dmitriy, Wein, Alexander S., Bandeira, Afonso S.
These notes survey and explore an emerging method, which we call the low-degree method, for predicting and understanding statistical-versus-computational tradeoffs in high-dimensional inference problems. In short, the method posits that a certain quantity, the second moment of the low-degree likelihood ratio, gives insight into how much computational time is required to solve a given hypothesis testing problem. This insight can in turn be used to predict the computational hardness of a variety of statistical inference tasks. While this method originated in the study of the sum-of-squares (SoS) hierarchy of convex programs, we present a self-contained introduction that does not require knowledge of SoS. In addition to showing how to carry out predictions using the method, we include a discussion investigating both rigorous and conjectural consequences of these predictions. These notes include some new results, simplified proofs, and refined conjectures. For instance, we point out a formal connection between spectral methods and the low-degree likelihood ratio, and we give a sharp low-degree lower bound against subexponential-time algorithms for tensor PCA.
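The quantity at the heart of the method can be computed exactly in a toy case. Consider the additive Gaussian noise model Y = X + Z with Z ~ N(0, I_n). Expanding the likelihood ratio in Hermite polynomials gives ||L^{<=D}||^2 = E[sum_{d=0}^{D} <X, X'>^d / d!], where X and X' are independent draws of the signal. The sketch below evaluates this for a signal prior chosen purely for illustration, X = beta * x with x uniform on {-1, +1}^n. The prior, the function name, and the parameter values are our assumptions, not an example taken from the notes. Roughly speaking, the low-degree heuristic reads a norm that stays bounded as n grows as evidence of hardness for degree-D methods. A diverging norm suggests that such methods may succeed.

```python
import math


def low_degree_norm_sq(n, beta, D):
    """Squared norm of the degree-<=D likelihood ratio for a toy additive Gaussian model.

    Model (illustrative assumption): Y = X + Z, Z ~ N(0, I_n), X = beta * x with
    x uniform on {-1, +1}^n. Uses ||L^{<=D}||^2 = E[sum_{d<=D} <X, X'>^d / d!].
    Here <X, X'> = beta^2 * S with S = 2B - n and B ~ Binomial(n, 1/2), so the
    expectation over the independent copies X, X' is an exact finite sum.
    """
    total = 0.0
    for b in range(n + 1):
        prob = math.comb(n, b) / 2**n  # P(B = b)
        t = beta**2 * (2 * b - n)  # value of <X, X'> when B = b
        total += prob * sum(t**d / math.factorial(d) for d in range(D + 1))
    return total


for D in range(5):
    print(D, round(low_degree_norm_sq(n=100, beta=0.1, D=D), 4))
```

Odd-degree terms vanish because S is symmetric about 0, so the value is nondecreasing in D. Since E[S^2] = n, the D = 2 value equals 1 + beta^4 * n / 2.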
The Silent Rockstar of BigData: Machine Learning - AnalyticsWeek
Sure, the world is crying out loud that big data's biggest problem will be resources. Demand has skyrocketed, and everyone is going into a tailspin trying to meet it. Companies are frantically overspending to hire data scientists and protect themselves from any upcoming shortfall. This is nothing but a sign that the world needs our robot algorithm friends to absorb some of that demand and lend credibility to new paradigms. Who could forget Steve Ballmer's famous quote framing Big Data as a machine learning problem?