Goto

Collaborating Authors

 Scientific Discovery


1,000-year-old medieval sword emerges from Dutch river after chance discovery: 'Barely corroded'

FOX News

SOLVA Archaeology Service in Belgium announced the recent discovery of ancient Roman artifacts and remains, including a well-preserved dog, in Velzeke. A remarkable medieval sword with rare symbols was recently put on display in a Dutch museum, over a year after it was found by construction workers unexpectedly. The discovery of the sword was announced by the Netherlands' National Museum of Antiquities (RMO) in Leiden on June 24. The artifact, named the Linschoten Sword, was found in March 2024 during "maintenance dredging activities," the museum said in a press release. Construction workers were struck by a "long piece of iron" while cleaning a small river known as the Korte Linschoten, the statement noted.


Ultra-rare first edition book from Galileo heading to auction

Popular Science

Breakthroughs, discoveries, and DIY tips sent every weekday. A small library's worth of rare medieval and Renaissance books are heading to auction on July 9. The expansive lot includes a portable Magna Carta, an early scientific encyclopedia, a surgical codex, and one of the oldest surviving Sephardic Torah scrolls. But according to Christies's Auction House, one manuscript is the first of its kind to go up for sale in over a century: a copy of the first pseudonymous astronomical text co-written by Galileo Galilei. The evening of October 9, 1604, offered an unexpected and ultimately revolutionary moment for astronomy.


Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Alzheimer's Disease

Neural Information Processing Systems

We only observe their transformed versions h(\mathbf{x_s i}) and g(\mathbf{x_t i}), for some known function class h(\cdot) and g(\cdot) . Our goal is to perform a statistical test checking if P_{\rm source} P_{\rm target} while removing the distortions induced by the transformations. This problem is closely related to concepts underlying numerous domain adaptation algorithms, and in our case, is motivated by the need to combine clinical and imaging based biomarkers from multiple sites and/or batches, where this problem is fairly common and an impediment in the conduct of analyses with much larger sample sizes. We develop a framework that addresses this problem using ideas from hypothesis testing on the transformed measurements, where in the distortions need to be estimated {\it in tandem} with the testing. We derive a simple algorithm and study its convergence and consistency properties in detail, and we also provide lower-bound strategies based on recent work in continuous optimization.


Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Alzheimer's Disease

Neural Information Processing Systems

This problem is closely related to domain adaptation, and in our case, is motivated by the need to combine clinical and imaging based biomarkers from multiple sites and/or batches - a fairly common impediment in conducting analyses with much larger sample sizes. We address this problem using ideas from hypothesis testing on the transformed measurements, wherein the distortions need to be estimated in tandem with the testing. We derive a simple algorithm and study its convergence and consistency properties in detail, and provide lower-bound strategies based on recent work in continuous optimization. On a dataset of individuals at risk for Alzheimer's disease, our framework is competitive with alternative procedures that are twice as expensive and in some cases operationally infeasible to implement.


Statistical Inference for Pairwise Graphical Models Using Score Matching

Neural Information Processing Systems

Probabilistic graphical models have been widely used to model complex systems and aid scientific discoveries. As a result, there is a large body of literature focused on consistent model selection. However, scientists are often interested in understanding uncertainty associated with the estimated parameters, which current literature has not addressed thoroughly. In this paper, we propose a novel estimator for edge parameters for pairwise graphical models based on Hyvรคrinen scoring rule. Hyvรคrinen scoring rule is especially useful in cases where the normalizing constant cannot be obtained efficiently in a closed form.


Nonzero-sum Adversarial Hypothesis Testing Games

Neural Information Processing Systems

We study nonzero-sum hypothesis testing games that arise in the context of adversarial classification, in both the Bayesian as well as the Neyman-Pearson frameworks. We first show that these games admit mixed strategy Nash equilibria, and then we examine some interesting concentration phenomena of these equilibria. Our main results are on the exponential rates of convergence of classification errors at equilibrium, which are analogous to the well-known Chernoff-Stein lemma and Chernoff information that describe the error exponents in the classical binary hypothesis testing problem, but with parameters derived from the adversarial model. The results are validated through numerical experiments.


ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time

Neural Information Processing Systems

Humans have the remarkable ability to recognize and acquire novel visual concepts in a zero-shot manner. Given a high-level, symbolic description of a novel concept in terms of previously learned visual concepts and their relations, humans can recognize novel concepts without seeing any examples. Moreover, they can acquire new concepts by parsing and communicating symbolic structures using learned visual concepts and relations. Endowing these capabilities in machines is pivotal in improving their generalization capability at inference time. In this work, we introduce Zero-shot Concept Recognition and Acquisition (ZeroC), a neuro-symbolic architecture that can recognize and acquire novel concepts in a zero-shot way.


A unified framework for bandit multiple testing

Neural Information Processing Systems

In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that we wish to test, and the goal is to design adaptive algorithms that correctly identify large set of interesting arms (true discoveries), while only mistakenly identifying a few uninteresting ones (false discoveries). One common metric in non-bandit multiple testing is the false discovery rate (FDR). We propose a unified, modular framework for bandit FDR control that emphasizes the decoupling of exploration and summarization of evidence. We utilize the powerful martingale-based concept of "e-processes" to ensure FDR control for arbitrary composite nulls, exploration rules and stopping times in generic problem settings. In particular, valid FDR control holds even if the reward distributions of the arms could be dependent, multiple arms may be queried simultaneously, and multiple (cooperating or competing) agents may be querying arms, covering combinatorial semi-bandit type settings as well. Prior work has considered in great detail the setting where each arm's reward distribution is independent and sub-Gaussian, and a single arm is queried at each step. Our framework recovers matching sample complexity guarantees in this special case, and performs comparably or better in practice. For other settings, sample complexities will depend on the finer details of the problem (composite nulls being tested, exploration algorithm, data dependence structure, stopping rule) and we do not explore these; our contribution is to show that the FDR guarantee is clean and entirely agnostic to these details.


Greedy Approximation Algorithms for Active Sequential Hypothesis Testing

Neural Information Processing Systems

In the problem of active sequential hypothesis testing (ASHT), a learner seeks to identify the true hypothesis from among a known set of hypotheses. The learner is given a set of actions and knows the random distribution of the outcome of any action under any true hypothesis. Given a target error >0, the goal is to sequentially select the fewest number of actions so as to identify the true hypothesis with probability at least 1. Motivated by applications in which the number of hypotheses or actions is massive (e.g., genomics-based cancer detection), we propose efficient (greedy, in fact) algorithms and provide the first approximation guarantees for ASHT, under two types of adaptivity. Both of our guarantees are independent of the number of actions and logarithmic in the number of hypotheses. We numerically evaluate the performance of our algorithms using both synthetic and real-world DNA mutation data, demonstrating that our algorithms outperform previously proposed heuristic policies by large margins.