Scientific Discovery
Why Philip Pullman Is Obsessed with Panpsychism - Facts So Romantic
Philip Pullman is once again having a moment, thanks to the new blockbuster adaptation of His Dark Materials by the BBC and HBO. His fantasy classic, filled with witches, talking bears and "daemons" (people's alter egos that take animal form), is rendered in glorious steampunk detail. Pullman has also returned to the fictional world of his heroine, Lyra Belacqua, with a new trilogy, The Book of Dust, which probes more deeply into the central question of his earlier books: What is the nature of consciousness? Pullman loves to write about big ideas, and recent scientific discoveries about dark matter and the Higgs boson have inspired certain plot elements in his novels. The biggest mystery in these books, an enigmatic substance called Dust, comes right out of current debates among scientists and philosophers about the origins of consciousness and the provocative theory of panpsychism.
Robust Hypothesis Testing Using Wasserstein Uncertainty Sets
Gao, Rui, Xie, Liyan, Xie, Yao, Xu, Huan
We develop a novel, computationally efficient, and general framework for robust hypothesis testing. The framework features a new way to construct uncertainty sets under the null and the alternative distributions: the sets are centered around the empirical distributions and defined via the Wasserstein metric, so our approach is data-driven and free of distributional assumptions. We develop a convex safe approximation of the minimax formulation and show that this approximation yields a nearly optimal detector among the family of all possible tests. By exploiting the structure of the least favorable distribution, we also develop a tractable reformulation of the approximation whose complexity is independent of the dimension of the observation space and can, in general, be nearly independent of the sample size. A real-data example using human activity data demonstrates the excellent performance of the new robust detector.
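To make the idea of a Wasserstein uncertainty set concrete, here is a minimal one-dimensional sketch, assuming equal-size samples and the 1-Wasserstein distance; it is not the authors' detector or their convex reformulation. In one dimension, the 1-Wasserstein distance between two empirical distributions with the same number of points reduces to the mean absolute difference between their sorted samples, so checking whether a candidate sample lies inside a ball of radius `radius` around the observed data takes only a sort. The function names and radius values are illustrative assumptions.

```python
def wasserstein1_empirical(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical distributions.

    In one dimension the optimal coupling matches order statistics, so the
    distance is the mean absolute gap between the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def in_uncertainty_set(candidate, empirical, radius):
    """True if the candidate sample's empirical distribution lies inside the
    Wasserstein ball of the given radius centered on the observed sample."""
    return wasserstein1_empirical(candidate, empirical) <= radius
```

For example, `in_uncertainty_set([0, 1, 2], [1, 2, 3], radius=1.0)` is `True`, because shifting every point by 1 costs exactly 1 on average, while a radius of 0.5 excludes that candidate.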
Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Javanmard, Adel, Montanari, Andrea
Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn makes it extremely challenging to quantify the uncertainty associated with a given parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance such as confidence intervals or p-values. We consider here a broad class of regression problems and propose an efficient algorithm for constructing confidence intervals and p-values.
Adaptive Active Hypothesis Testing under Limited Information
We consider the problem of active sequential hypothesis testing, where a Bayesian decision maker must infer the true hypothesis from a set of hypotheses. The decision maker may choose from a set of actions, where the outcome of an action is corrupted by independent noise. In this paper we consider a special case where the decision maker has limited knowledge about the distribution of observations for each action, in that only a binary value is observed. Our objective is to infer the true hypothesis with low error while minimizing the number of actions sampled. Our main results include the derivation of a lower bound on the sample size for our system under limited knowledge, and the design of an active learning policy that matches this lower bound and outperforms similar known algorithms.
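The setting can be illustrated with a small simulation built on assumptions of our own: each hypothesis `h` is taken to predict a known probability `probs[h][a]` (strictly between 0 and 1) that action `a` returns a 1, the posterior is updated by Bayes' rule after every binary observation, and actions are picked greedily to minimize the expected posterior entropy. The greedy information-gain rule, the stopping threshold `delta`, and all function names are generic, hypothetical stand-ins chosen for illustration, not the policy derived in the paper.

```python
import math
import random


def entropy(post):
    """Shannon entropy (in nats) of a posterior over hypotheses."""
    return -sum(p * math.log(p) for p in post if p > 0)


def update(post, probs, action, outcome):
    """Bayes update after observing a binary outcome of `action`.

    Assumption: probs[h][a] = P(outcome == 1 | hypothesis h, action a).
    """
    likes = [probs[h][action] if outcome == 1 else 1 - probs[h][action]
             for h in range(len(post))]
    unnorm = [p * lk for p, lk in zip(post, likes)]
    z = sum(unnorm)  # marginal probability of the observed outcome
    return [u / z for u in unnorm]


def expected_entropy(post, probs, action):
    """Posterior entropy expected after taking `action` once."""
    p1 = sum(p * probs[h][action] for h, p in enumerate(post))
    total = 0.0
    for outcome, p_out in ((1, p1), (0, 1 - p1)):
        if p_out > 0:
            total += p_out * entropy(update(post, probs, action, outcome))
    return total


def choose_action(post, probs):
    """Greedy choice: the action with the lowest expected posterior entropy."""
    return min(range(len(probs[0])), key=lambda a: expected_entropy(post, probs, a))


def run(probs, true_h, delta=0.01, max_steps=1000, seed=0):
    """Simulate until one hypothesis has posterior >= 1 - delta.

    Returns (inferred hypothesis, number of actions sampled).
    """
    rng = random.Random(seed)
    post = [1 / len(probs)] * len(probs)  # uniform prior
    step = 0
    for step in range(1, max_steps + 1):
        a = choose_action(post, probs)
        outcome = 1 if rng.random() < probs[true_h][a] else 0
        post = update(post, probs, a, outcome)
        if max(post) >= 1 - delta:
            break
    return post.index(max(post)), step
```

For instance, with three hypotheses and three actions where action `a` returns 1 with probability 0.9 when `h == a` and 0.1 otherwise, `run(probs, true_h=1)` returns the inferred hypothesis and the number of actions sampled before the posterior exceeded `1 - delta`.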
Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Alzheimer's Disease
Zhou, Hao, Ithapu, Vamsi K., Ravi, Sathya Narayanan, Singh, Vikas, Wahba, Grace, Johnson, Sterling C.
Our goal is to perform a statistical test checking whether $P_{\rm source} = P_{\rm target}$ while removing the distortions induced by the transformations. This problem is closely related to concepts underlying numerous domain adaptation algorithms. In our case, it is motivated by the need to combine clinical and imaging-based biomarkers from multiple sites and/or batches, a setting where the problem is fairly common and impedes analyses with much larger sample sizes. We develop a framework that addresses this problem using ideas from hypothesis testing on the transformed measurements, wherein the distortions need to be estimated in tandem with the testing. We derive a simple algorithm, study its convergence and consistency properties in detail, and also provide lower-bound strategies based on recent work in continuous optimization. On a dataset of individuals at risk for neurological disease, our results are competitive with alternative procedures that are twice as expensive and, in some cases, operationally infeasible to implement.
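As a point of reference, the basic primitive behind checking $P_{\rm source} = P_{\rm target}$ is a two-sample test. The sketch below is a generic permutation test on the absolute difference of means, assuming scalar measurements; it does not model or estimate the transformation-induced distortions that the paper handles in tandem with testing, so it is only a baseline illustration. The function name and the default number of permutations are our own choices.

```python
import random
from statistics import mean


def permutation_test(source, target, n_perm=2000, seed=0):
    """Two-sample permutation test of equal means (two-sided).

    Returns a p-value: the fraction of random relabelings whose absolute mean
    difference is at least as large as the observed one (with the +1 correction
    so the p-value is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(source) - mean(target))
    pooled = list(source) + list(target)
    n = len(source)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of "source"/"target" labels
        stat = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if stat >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

A small p-value suggests the two samples differ in mean; with multi-site data, systematic site effects would also show up here, which is precisely why the paper estimates the distortions alongside the test.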
A Novel Kuhnian Ontology for Epistemic Classification of STM Scholarly Articles
Saqr, Khalid M., Elsharawy, Abdelrahman
Thomas Kuhn proposed his paradigmatic view of scientific discovery five decades ago. The concept of a paradigm has not only explained the progress of science but has also become the central epistemic concept among STM scientists. Here, we adopt the principles of Kuhnian philosophy to construct a novel ontology that aims to classify and evaluate the impact of STM scholarly articles. First, we explain how the Kuhnian cycle of science describes research at different epistemic stages. Second, we show how the Kuhnian cycle could be reconstructed into modular ontologies that classify scholarly articles according to their contribution to paradigm-centred knowledge. The proposed ontology and its scenarios are discussed. To the best of the authors' knowledge, this is the first attempt at creating an ontology for describing scholarly articles based on the Kuhnian paradigmatic view of science.
Beyond trust: Why we need a paradigm shift in data-sharing
In parallel with the progressing digitalization of almost every area of life, artificial intelligence (AI) and analytics capabilities grew tremendously, enabling companies to transform random data trails into meaningful insights that helped them greatly improve business processes. Targeted marketing, location-based searches and personalized promotions became the name of the game. This eventually led to the ability to combine data from various sources into large datasets, and to mine them for granular user profiles of unprecedented detail in order to establish correlations between disparate aspects of consumer behaviour, making individual health risks and electoral choices ever more predictable – for those who held the data.
What is a Lakehouse? - The Databricks Blog
Over the past few years at Databricks, we've seen a new data management paradigm emerge independently across many customers and use cases: the lakehouse. In this post we describe this new paradigm and its advantages over previous approaches. Data warehouses have a long history in decision support and business intelligence applications. Since its inception in the late 1980s, data warehouse technology has continued to evolve, and massively parallel processing (MPP) architectures led to systems that were able to handle larger data sizes. But while warehouses were great for structured data, many modern enterprises have to deal with unstructured data, semi-structured data, and data with high variety, velocity, and volume.
UK Introduces New Fast-Track Visa to Attract Scientists
British Prime Minister Boris Johnson introduced a new fast-track visa to attract more of the world's best scientists to the U.K. in hopes of creating a global science "superpower." Johnson paired the announcement of the Global Talent route program with a pledge of 300 million pounds ($392 million) for research into advanced mathematics. The money will help fund researchers and doctoral students whose work in math underpins myriad developments such as safer air travel, smartphone technology and artificial intelligence. The new visa route will have no cap on the number of people able to come to the U.K. under the program. "The UK has a proud history of scientific discovery, but to lead the field and face the challenges of the future we need to continue to invest in talent and cutting edge research," Johnson said in a statement.
AlphaFold: Using AI for scientific discovery
The recipes for those proteins, called genes, are encoded in our DNA. An error in the genetic recipe may result in a malformed protein, which could result in disease or death for an organism. Many diseases, therefore, are fundamentally linked to proteins. But just because you know the genetic recipe for a protein doesn't mean you automatically know its shape. Proteins are composed of chains of amino acids (also referred to as amino acid residues). But DNA only contains information about the sequence of amino acids, not how they fold into shape.
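The point that DNA specifies only the amino acid sequence can be seen by translating a coding sequence with the standard genetic code: the output is a one-dimensional string of residues with no information about 3D structure. This sketch assumes a DNA coding strand read in frame from the first base, uses one-letter amino acid codes, and stops at the first stop codon; it ignores transcription, splicing, and alternative genetic codes. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code as a 64-letter string, with codons ordered by first,
# then second, then third base in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA coding strand into a one-letter amino acid sequence.

    Reads codons from the first base, stops at the first stop codon, and
    ignores a trailing incomplete codon. Non-ACGT characters raise KeyError.
    """
    dna = dna.upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)
```

For example, `translate("ATGGCCTTTAAATAA")` returns `"MAFK"`: a sequence of residues and nothing more, which is why predicting how that chain folds is a separate problem.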