Robust Hypothesis Testing Using Wasserstein Uncertainty Sets

Neural Information Processing Systems

We develop a novel, computationally efficient, and general framework for robust hypothesis testing. The framework features a new way to construct uncertainty sets under the null and the alternative distributions: sets centered around the empirical distribution and defined via the Wasserstein metric, so our approach is data-driven and free of distributional assumptions. We develop a convex safe approximation of the minimax formulation and show that this approximation yields a nearly optimal detector among the family of all possible tests. By exploiting the structure of the least favorable distribution, we also develop a tractable reformulation of the approximation whose complexity is independent of the dimension of the observation space and can, in general, be nearly independent of the sample size. A real-data example using human activity data demonstrates the excellent performance of the new robust detector.
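
For concreteness, here is a minimal sketch of the kind of minimax formulation described above; the symbols ($\widehat{Q}_1, \widehat{Q}_2$ for the two empirical distributions, radii $\theta_1, \theta_2$, and test $\phi$) are assumed notation for illustration, not necessarily the paper's:
\[
  \mathcal{P}_k = \bigl\{ P : W(P, \widehat{Q}_k) \le \theta_k \bigr\}, \qquad k = 1, 2,
\]
\[
  \inf_{\phi}\; \sup_{P_1 \in \mathcal{P}_1,\; P_2 \in \mathcal{P}_2} \Bigl( P_1\{\phi(\omega) = 2\} + P_2\{\phi(\omega) = 1\} \Bigr),
\]
i.e., each uncertainty set is a Wasserstein ball around the empirical distribution of one sample, and the detector $\phi$ is chosen to minimize the worst-case sum of the two error probabilities over those balls.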


Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

Neural Information Processing Systems

Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance, such as confidence intervals or p-values. We consider here a broad class of regression problems and propose an efficient algorithm for constructing confidence intervals and p-values.
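
As a hedged illustration of the style of construction used in this line of work (the debiasing matrix $M$ and the symbols below are assumed notation, not necessarily the paper's exact formulation): for the linear model $y = X\theta_0 + w$ with a regularized estimate $\widehat{\theta}$, one can form a debiased estimate and per-coordinate confidence intervals
\[
  \widehat{\theta}^{\,u} = \widehat{\theta} + \frac{1}{n} M X^{\top}\bigl(y - X\widehat{\theta}\bigr),
\]
\[
  \widehat{\theta}^{\,u}_i \,\pm\, z_{1-\alpha/2}\, \widehat{\sigma}\, \sqrt{\bigl[M \widehat{\Sigma} M^{\top}\bigr]_{ii} / n}, \qquad \widehat{\Sigma} = \frac{1}{n} X^{\top} X,
\]
with p-values obtained from the corresponding Gaussian approximation of $\widehat{\theta}^{\,u}_i$.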


Adaptive Active Hypothesis Testing under Limited Information

Neural Information Processing Systems

We consider the problem of active sequential hypothesis testing, where a Bayesian decision maker must infer the true hypothesis from a set of hypotheses. The decision maker may choose from a set of actions, where the outcome of an action is corrupted by independent noise. In this paper we consider a special case in which the decision maker has limited knowledge about the distribution of observations for each action, in that only a binary value is observed. Our objective is to infer the true hypothesis with low error while minimizing the number of actions sampled. Our main results include the derivation of a lower bound on sample size for our system under limited knowledge and the design of an active learning policy that matches this lower bound and outperforms similar known algorithms.
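
A minimal Python sketch of an active testing loop of this flavor; the outcome probabilities q, the greedy action-selection rule, and the stopping threshold are all illustrative assumptions, not the paper's policy:

import numpy as np

# Minimal sketch of Bayesian active hypothesis testing with binary outcomes.
# All quantities below (q, threshold, the greedy rule) are illustrative
# assumptions, not the paper's exact model or policy.

rng = np.random.default_rng(0)

H, A = 3, 4                          # number of hypotheses and actions
# q[a, h]: probability that action a returns outcome 1 when hypothesis h is true
q = rng.uniform(0.2, 0.8, size=(A, H))
true_h = 1                           # ground truth, hidden from the learner

belief = np.full(H, 1.0 / H)         # uniform prior over hypotheses
threshold, samples = 0.99, 0

while belief.max() < threshold and samples < 10_000:
    # Greedy rule: pick the action that best separates the current MAP
    # hypothesis from the alternatives (one of many possible heuristics).
    h_map = belief.argmax()
    a = int(np.argmax(np.abs(q[:, [h_map]] - q).sum(axis=1)))
    outcome = rng.random() < q[a, true_h]          # noisy binary observation
    likelihood = q[a] if outcome else 1.0 - q[a]   # P(outcome | each hypothesis)
    belief *= likelihood                           # Bayes update...
    belief /= belief.sum()                         # ...and renormalize
    samples += 1

print(f"declared hypothesis {belief.argmax()} after {samples} samples")

The paper's contribution concerns the sample-size lower bound and a policy that matches it; the greedy rule above is only a placeholder to make the belief-update loop concrete.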


Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Alzheimer's Disease

Neural Information Processing Systems

Our goal is to perform a statistical test checking whether $P_{\rm source} = P_{\rm target}$ while removing the distortions induced by the transformations. This problem is closely related to concepts underlying numerous domain adaptation algorithms, and in our case it is motivated by the need to combine clinical and imaging-based biomarkers from multiple sites and/or batches, where this problem is fairly common and an impediment to conducting analyses with much larger sample sizes. We develop a framework that addresses this problem using ideas from hypothesis testing on the transformed measurements, wherein the distortions must be estimated {\it in tandem} with the testing. We derive a simple algorithm, study its convergence and consistency properties in detail, and also provide lower-bound strategies based on recent work in continuous optimization. On a dataset of individuals at risk for neurological disease, our results are competitive with alternative procedures that are twice as expensive and in some cases operationally infeasible to implement.
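
One way to write the test the abstract describes, in assumed notation (the transformation class $\mathcal{G}$ and the discrepancy $d$ are illustrative, not the paper's exact choices):
\[
  H_0 : \;\exists\, g \in \mathcal{G} \ \text{ such that } \ g_{\#} P_{\rm source} = P_{\rm target},
\]
with the statistic computed from samples as
\[
  \widehat{T} \;=\; \min_{g \in \mathcal{G}} \; \widehat{d}\bigl(g_{\#} \widehat{P}_{\rm source},\, \widehat{P}_{\rm target}\bigr),
\]
where $g_{\#}P$ denotes the pushforward of $P$ under $g$; $H_0$ is rejected when $\widehat{T}$ is large, and the minimization over $g$ is precisely the step of estimating the distortions in tandem with the testing.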


Beyond trust: Why we need a paradigm shift in data-sharing

#artificialintelligence

In parallel with the ongoing digitalization of almost every area of life, artificial intelligence (AI) and analytics capabilities grew tremendously, enabling companies to transform random data trails into meaningful insights that helped them greatly improve business processes. Targeted marketing, location-based searches, and personalized promotions became the name of the game. This eventually led to the ability to combine data from various sources into large datasets and to mine them for granular user profiles of unprecedented detail in order to establish correlations between disparate aspects of consumer behaviour, making individual health risks and electoral choices ever more predictable for those who held the data.


UK Introduces New Fast-Track Visa to Attract Scientists

#artificialintelligence

British Prime Minister Boris Johnson introduced a new fast-track visa to attract more of the world's best scientists to the U.K. in hopes of creating a global science "superpower." Johnson paired the announcement of the Global Talent route program with a pledge of 300 million pounds ($392 million) for research into advanced mathematics. The money will help fund researchers and doctoral students whose work in math underpins myriad developments such as safer air travel, smartphone technology, and artificial intelligence. The new visa route will have no cap on the number of people able to come to the U.K. under the program. "The UK has a proud history of scientific discovery, but to lead the field and face the challenges of the future we need to continue to invest in talent and cutting edge research," Johnson said in a statement.


AlphaFold: Using AI for scientific discovery

#artificialintelligence

The recipes for those proteins, called genes, are encoded in our DNA. An error in the genetic recipe may result in a malformed protein, which could lead to disease or death for an organism. Many diseases, therefore, are fundamentally linked to proteins. But just because you know the genetic recipe for a protein doesn't mean you automatically know its shape. Proteins are composed of chains of amino acids (also referred to as amino acid residues).


Key Trends in AI-Driven Fintech: The New Paradigm

#artificialintelligence

Technology is fundamentally reshaping the operating model of financial institutions and the attributes necessary to build a successful business. AI is weakening various components of incumbent financial institutions, creating an opportunity for entirely new operating models and category dynamics focused much more on the scale and sophistication of product, technology, and data than on the scale or complexity of capital. Unlike past 'AI springs', the science and practice of AI is poised to continue an unprecedented multi-decade run of progress. A clear vision of the future financial landscape is critical for good governance and strategic decisions. AI systems will eventually underwrite credit and insurance across the world.


Nonzero-sum Adversarial Hypothesis Testing Games

Neural Information Processing Systems

We study nonzero-sum hypothesis testing games that arise in the context of adversarial classification, in both the Bayesian and the Neyman-Pearson frameworks. We first show that these games admit mixed-strategy Nash equilibria, and then we examine some interesting concentration phenomena of these equilibria. Our main results concern the exponential rates of convergence of classification errors at equilibrium, which are analogous to the well-known Chernoff-Stein lemma and Chernoff information that describe the error exponents in the classical binary hypothesis testing problem, but with parameters derived from the adversarial model. The results are validated through numerical experiments.
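
For reference, the classical benchmarks the abstract alludes to can be stated as follows (standard results, for testing $H_0: P_0$ against $H_1: P_1$ from $n$ i.i.d. observations over a finite alphabet). The Chernoff-Stein lemma says that, with the type-I error held below any fixed level, the optimal type-II error $\beta_n$ satisfies
\[
  \lim_{n \to \infty} -\frac{1}{n} \log \beta_n = D(P_0 \,\|\, P_1),
\]
while in the Bayesian setting the best achievable exponent for the average error probability is the Chernoff information
\[
  C(P_0, P_1) = -\min_{0 \le \lambda \le 1} \log \sum_{x} P_0(x)^{1-\lambda} P_1(x)^{\lambda}.
\]
The paper derives analogous exponents whose parameters come from the adversarial game rather than from the fixed pair $(P_0, P_1)$.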

