Critical Threshold


Hypothesis Testing for Generalized Thurstone Models

Makur, Anuran, Singh, Japneet

arXiv.org Machine Learning

In this work, we develop a hypothesis testing framework to determine whether pairwise comparison data is generated by an underlying \emph{generalized Thurstone model} $\mathcal{T}_F$ for a given choice function $F$. While prior work has predominantly focused on parameter estimation and uncertainty quantification for such models, we address the fundamental problem of minimax hypothesis testing for $\mathcal{T}_F$ models. We formulate this testing problem by introducing a notion of separation distance between general pairwise comparison models and the class of $\mathcal{T}_F$ models. We then derive upper and lower bounds on the critical threshold for testing that depend on the topology of the observation graph. For the special case of complete observation graphs, this threshold scales as $\Theta((nk)^{-1/2})$, where $n$ is the number of agents and $k$ is the number of comparisons per pair. Furthermore, we propose a hypothesis test based on our separation distance, construct confidence intervals, establish time-uniform bounds on the probabilities of type I and II errors using reverse martingale techniques, and derive minimax lower bounds using information-theoretic methods. Finally, we validate our results through experiments on synthetic and real-world datasets.
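
The generalized Thurstone setup can be made concrete with a small simulation. Below is a minimal sketch (not the paper's code) that samples $k$ comparisons per pair under $\mathcal{T}_F$, where $\mathbb{P}(i \text{ beats } j) = F(w_i - w_j)$; the skill vector and function names are illustrative assumptions.

```python
# A minimal sketch (not the paper's implementation) of sampling from a
# generalized Thurstone model T_F with P(i beats j) = F(w_i - w_j).
# The skill vector w and all names below are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def sample_comparisons(w, k, F=norm.cdf, seed=None):
    """Draw k Bernoulli comparisons for every pair (i, j) on a complete
    observation graph; wins[i, j] counts how often i beat j."""
    rng = np.random.default_rng(seed)
    n = len(w)
    wins = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = F(w[i] - w[j])          # choice function applied to skill gap
            wins[i, j] = rng.binomial(k, p_ij)
            wins[j, i] = k - wins[i, j]
    return wins

# F = norm.cdf is Thurstone's Case V; a logistic CDF recovers the BTL model.
wins = sample_comparisons(w=np.array([0.8, 0.2, -0.3, -0.7]), k=50, seed=0)
```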


When Data Falls Short: Grokking Below the Critical Threshold

Singh, Vaibhav, Belilovsky, Eugene, Aljundi, Rahaf

arXiv.org Artificial Intelligence

In this paper, we investigate the phenomenon of grokking, where models exhibit delayed generalization following overfitting on training data. We focus on data-scarce regimes where the number of training samples falls below the critical threshold, making grokking unobservable, and on practical scenarios involving distribution shift. We first show that Knowledge Distillation (KD) from a model that has already grokked on a distribution (p1) can induce and accelerate grokking on a different distribution (p2), even when the available data lies below the critical threshold. This highlights the value of KD for deployed models that must adapt to new distributions under limited data. We then study training on the joint distribution (p1, p2) and demonstrate that while standard supervised training fails when either distribution has insufficient data, distilling from models grokked on the individual distributions enables generalization. Finally, we examine a continual pretraining setup, where a grokked model transitions from p1 to p2, and find that KD both accelerates generalization and mitigates catastrophic forgetting, achieving strong performance even with only 10% of the data. Together, our results provide new insights into the mechanics of grokking under knowledge transfer and underscore the central role of KD in enabling generalization in low-data and evolving distribution settings.
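
As a concrete reference point for the knowledge-transfer mechanism the abstract relies on, here is a minimal sketch of a standard Hinton-style distillation loss; the paper's exact objective is not given in the abstract, so the temperature and mixing weight below are assumptions.

```python
# A minimal sketch of Hinton-style knowledge distillation; the teacher
# stands in for a model that has already grokked on p1. The temperature T
# and mixing weight alpha are assumptions, not the paper's settings.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Convex combination of hard-label cross-entropy and KL divergence
    to the teacher's temperature-softened output distribution."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (T * T)  # T^2 keeps soft-target gradients on the same scale
    return alpha * soft + (1 - alpha) * hard
```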


Siamese Neural Network for Label-Efficient Critical Phenomena Prediction in 3D Percolation Models

Wang, Shanshan, Xu, Dian, Shen, Jianmin, Gao, Feng, Li, Wei, Deng, Weibing

arXiv.org Artificial Intelligence

Percolation theory serves as a cornerstone for studying phase transitions and critical phenomena, with broad implications in statistical physics, materials science, and complex networks. However, most machine learning frameworks for percolation analysis have focused on two-dimensional systems, oversimplifying the spatial correlations and morphological complexity of real-world three-dimensional materials. To bridge this gap and improve label efficiency and scalability in 3D systems, we propose a Siamese Neural Network (SNN) that leverages features of the largest cluster as discriminative input. Our method achieves high predictive accuracy for both site and bond percolation thresholds and critical exponents in three dimensions, with sub-1% error margins using significantly fewer labeled samples than traditional approaches. This work establishes a robust and data-efficient framework for modeling high-dimensional critical phenomena, with potential applications in materials discovery and complex network analysis.
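
For readers unfamiliar with the architecture, the sketch below shows the generic shape of a Siamese network with a shared encoder; the layer sizes, input dimensionality, and similarity head are illustrative assumptions, not the authors' configuration for largest-cluster features.

```python
# Illustrative only: the generic shape of a Siamese network. Both inputs
# pass through one weight-shared encoder; the head scores similarity from
# the element-wise difference of the two embeddings.
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self, in_dim=64, embed_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(        # shared by both branches
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )
        self.head = nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())

    def forward(self, x1, x2):
        z1, z2 = self.encoder(x1), self.encoder(x2)
        return self.head(torch.abs(z1 - z2)).squeeze(-1)
```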


Minimax Hypothesis Testing for the Bradley-Terry-Luce Model

Makur, Anuran, Singh, Japneet

arXiv.org Artificial Intelligence

The Bradley-Terry-Luce (BTL) model is one of the most widely used models for ranking a collection of items or agents based on pairwise comparisons among them. In this work, our objective is to formulate a hypothesis test that determines whether a given pairwise comparison dataset, with $k$ comparisons per pair of agents, originates from an underlying BTL model. We formalize this testing problem in the minimax sense and define the critical threshold of the problem. We then establish upper bounds on the critical threshold for general induced observation graphs (satisfying mild assumptions) and develop lower bounds for complete induced graphs. In particular, our test statistic for the upper bounds is based on a new approximation we derive for the separation distance between general pairwise comparison models and the class of BTL models. To further assess the performance of our statistical test, we prove upper bounds on the type I and type II probabilities of error. Much of our analysis is conducted within the context of a fixed observation graph structure, where the graph possesses certain "nice" properties, such as expansion and bounded principal ratio. Finally, we conduct several experiments on synthetic and real-world datasets to validate some of our theoretical results. Moreover, we propose an approach based on permutation testing to determine the threshold of our test in a data-driven manner in these experiments. In recent years, the availability and analysis of pairwise comparison data have increased significantly across diverse domains. Pairwise comparison data consists of information gathered in the form of comparisons made among a given set of items or agents. Many real-world applications, including sports tournaments, consumer preference surveys, and political voting, generate data in the form of pairwise comparisons. Such datasets serve a range of purposes, such as ranking items [2]-[12], analyzing team performance over time [13], studying market or sports competitiveness [14], [15], and even fine-tuning large language models using reinforcement learning from human feedback [16], [17]. A popular modeling assumption when performing such learning and inference tasks with pairwise comparison data is to assume that the data conforms to an underlying Bradley-Terry-Luce (BTL) model [2]-[6], under which each agent $i$ has a latent skill parameter $w_i \in \mathbb{R}$ and $\mathbb{P}(i \text{ is preferred over } j) = \frac{e^{w_i}}{e^{w_i} + e^{w_j}}$. The BTL model is known to be a natural consequence of the assumption of independence of irrelevant alternatives (IIA), which is widely used in economics and social choice theory [3].
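
The displayed BTL probability, and the data-driven permutation thresholding mentioned in the abstract, can be sketched as follows; the `statistic` argument is a placeholder for the paper's separation-distance-based test statistic (not reproduced here), and all helper names are hypothetical.

```python
# A sketch of the BTL win probability plus a generic permutation-test
# threshold; `statistic` is a placeholder for the paper's separation-
# distance-based statistic, and all names here are hypothetical.
import numpy as np

def btl_prob(w_i, w_j):
    """BTL: P(i preferred over j) = e^{w_i} / (e^{w_i} + e^{w_j})."""
    return 1.0 / (1.0 + np.exp(-(w_i - w_j)))   # numerically stable logistic form

def permutation_threshold(statistic, outcomes, level=0.05, B=1000, seed=None):
    """Data-driven threshold: the (1 - level) quantile of the statistic
    recomputed over B random permutations of the comparison outcomes."""
    rng = np.random.default_rng(seed)
    null_stats = [statistic(rng.permutation(outcomes)) for _ in range(B)]
    return np.quantile(null_stats, 1 - level)
```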


Climate change boosts risk of explosive wildfire growth in California by 25%, study says

Los Angeles Times

Climate change has ratcheted up the risk of explosive wildfire growth in California by 25% and will continue to drive extreme fire behavior for decades to come, even if planet-warming emissions are reduced, a new study has found. "Emissions reductions have a minimal impact on wildfire danger in the near term -- the next several decades," said study author Patrick T. Brown, co-director of the climate and energy team at the Breakthrough Institute, a Berkeley-based think tank. "So it's important to look at more direct on-the-ground solutions to the problem like fuel reduction." Although previous studies have looked at the impact of climate change on broader metrics like annual area burned, as well as on conditions that are conducive to wildfires, like aridity, the research published Wednesday in Nature drills down into how rising temperatures affected individual fires, and how they might continue to do so in the future. The researchers analyzed nearly 18,000 fires that ignited in California between 2003 and 2020.


Cooperative Simultaneous Tracking and Jamming for Disabling a Rogue Drone

Papaioannou, Savvas, Kolios, Panayiotis, Panayiotou, Christos G., Polycarpou, Marios M.

arXiv.org Artificial Intelligence

This work investigates the problem of simultaneous tracking and jamming of a rogue drone in 3D space with a team of cooperative unmanned aerial vehicles (UAVs). We propose a decentralized estimation, decision and control framework in which a team of UAVs cooperate in order to a) optimally choose their mobility control actions that result in accurate target tracking and b) select the desired transmit power levels which cause uninterrupted radio jamming and thus ultimately disrupt the operation of the rogue drone. The proposed decision and control framework allows the UAVs to reconfigure themselves in 3D space such that the cooperative simultaneous tracking and jamming (CSTJ) objective is achieved, while at the same time ensuring that the unwanted inter-UAV jamming interference caused during CSTJ is kept below a specified critical threshold. Finally, we formulate this problem under challenging conditions, i.e., uncertain dynamics, noisy measurements and false alarms. Extensive simulation experiments illustrate the performance of the proposed approach.
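
Purely as an illustration of the interference-constrained power selection the abstract describes (and not the authors' decentralized framework), the sketch below greedily assigns each UAV the largest transmit power that keeps aggregate inter-UAV interference below a critical threshold, under an assumed inverse-square path-loss model.

```python
# Illustrative only (not the authors' decentralized framework): greedy
# transmit-power selection that respects a critical interference threshold,
# assuming a simple inverse-square path-loss model.
import numpy as np

def interference_at(j, powers, pos):
    """Aggregate interference received by UAV j from all other UAVs."""
    d2 = np.sum((pos - pos[j]) ** 2, axis=1)
    d2[j] = np.inf                            # no self-interference
    return np.sum(powers / np.maximum(d2, 1e-9))

def select_powers(pos, power_levels, threshold):
    """Each UAV takes the largest power level that keeps the interference
    at every other UAV below the critical threshold."""
    n = len(pos)
    powers = np.zeros(n)
    for i in range(n):
        for p in sorted(power_levels, reverse=True):
            powers[i] = p
            if all(interference_at(j, powers, pos) <= threshold
                   for j in range(n) if j != i):
                break
            powers[i] = 0.0                   # infeasible; try a lower level
    return powers
```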


The Lack of Women Data Scientists Hurts Artificial Intelligence - Ms. Magazine

#artificialintelligence

New advancements in data science often spark dire predictions about how powerful new technologies will transform the world. Yet, as writer Stephen Shankland reminds us, technologies like OpenAI's new ChatGPT (short for Chat Generative Pre-trained Transformer) are created by humans. ChatGPT is a chatbot that is "trained with human assistance to deliver more useful, better dialog." The people assisting that training--those who create the models and assemble the data used to train chatbots--make a difference in the technologies that will go on to shape our lives. Computer scientist Joy Buolamwini, an early critic of racial bias in facial recognition software, said technology should "be more attuned to the people who use it and the people it's used on."


The Math of the Amazing Sandpile - Issue 107: The Edge

Nautilus

One country going Communist was supposed to topple the next, and then the next, and the next. The "domino theory" metaphor drove much of United States foreign policy in the middle of the 20th century. But it had the wrong name. From a physical point of view, it should have been called the "sandpile theory." Real-world political phase transitions tend to happen not in neat sequences, but in sudden coordinated fits, like the Arab Spring, or the collapse of the Eastern Bloc.


A Hybrid Deep Learning Model for Predictive Flood Warning and Situation Awareness using Channel Network Sensors Data

Dong, Shangjia, Yu, Tianbo, Farahmand, Hamed, Mostafavi, Ali

arXiv.org Machine Learning

The objective of this study is to create and test a hybrid deep learning model, FastGRNN-FCN (Fast, Accurate, Stable and Tiny Gated Recurrent Neural Network-Fully Convolutional Network), for urban flood prediction and situation awareness using channel network sensors data. The study used Harris County, Texas as the testbed, and obtained channel sensor data from three historical flood events (the 2016 Tax Day flood, the 2016 Memorial Day flood, and the 2017 Hurricane Harvey flood) for training and validating the hybrid deep learning model. The flood data are divided into multivariate time series and used as the model input. Each input comprises nine variables, including information on the studied channel sensor and its predecessor and successor sensors in the channel network. The precision-recall curve and F-measure are used to identify the optimal set of model parameters. The optimal model, with a weight of 1 and a critical threshold of 0.59, is obtained through one hundred iterations based on examining different weights and thresholds. The test accuracy and F-measure eventually reach 97.8% and 0.792, respectively. The model is then tested in predicting the 2019 Imelda flood in Houston, and the results show an excellent match with the empirical flood. The results show that the model enables accurate prediction of the spatial-temporal flood propagation and recession, and provides emergency response officials with a predictive flood warning tool for prioritizing flood response and resource allocation strategies.
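
The threshold-selection step the abstract describes (sweeping candidate cutoffs along the precision-recall curve and keeping the one that maximizes the F-measure) can be sketched as follows; function and variable names are illustrative, not the authors' code.

```python
# A sketch of the threshold-selection step: pick the probability cutoff
# that maximizes the F-measure along the precision-recall curve. Names
# are illustrative; the paper reports an optimal threshold of 0.59.
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_threshold(y_true, y_score, beta=1.0):
    """Return the cutoff maximizing the F_beta score."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    p, r = precision[:-1], recall[:-1]        # align with `thresholds`
    f_beta = (1 + beta**2) * p * r / np.maximum(beta**2 * p + r, 1e-12)
    return thresholds[np.argmax(f_beta)]
```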


Network driven sampling; a critical threshold for design effects

Rohe, Karl

arXiv.org Machine Learning

Web crawling, snowball sampling, and respondent-driven sampling (RDS) are three types of network sampling techniques used to contact individuals in hard-to-reach populations. This paper studies these procedures as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree (instead of a chain) allows for the sampled units to refer multiple future units into the sample. In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is some value $DE$, then constructing an estimator from the novel design makes the variance of the estimator $DE$ times greater than it would be under a simple random sample with the same sample size $n$. Under certain assumptions on the referral tree, the design effect of network sampling has a critical threshold that is a function of the referral rate $m$ and the clustering structure in the social network, represented by the second eigenvalue of the Markov transition matrix, $\lambda_2$. If $m < 1/\lambda_2^2$, then the design effect is finite (i.e., the standard estimator is $\sqrt{n}$-consistent). However, if $m > 1/\lambda_2^2$, then the design effect grows with $n$ (i.e., the standard estimator is no longer $\sqrt{n}$-consistent). Past this critical threshold, the standard error of the estimator converges at the slower rate of $n^{\log_m \lambda_2}$. The Markov model allows for nodes to be resampled; computational results show that the findings hold under without-replacement sampling. To construct confidence intervals that adapt to the correct level of uncertainty, a novel resampling procedure is proposed. Computational experiments compare this procedure to previous techniques.
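
The critical-threshold condition in the abstract is easy to check numerically for a given transition matrix. The sketch below computes $1/\lambda_2^2$ for a toy random-walk transition matrix; the graph is illustrative, not from the paper.

```python
# A sketch of the critical-threshold condition m < 1 / lambda_2^2; the
# 4-node graph is a toy example, not data from the paper.
import numpy as np

def critical_referral_rate(P):
    """Return 1 / lambda_2^2 for a Markov transition matrix P, where
    lambda_2 is the second-largest eigenvalue modulus."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 / moduli[1] ** 2

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)          # simple random walk on the graph
m_star = critical_referral_rate(P)
# Referral rates m < m_star keep the design effect finite; m > m_star
# makes it grow with the sample size n.
```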