On the Optimality of Tracking Fisher Information in Adaptive Testing with Stochastic Binary Responses
Kim, Sanghwa, Ahn, Dohyun, Min, Seungki
Adaptive testing and sequential estimation problems have recently gained substantial attention due to their foundational role in modern artificial intelligence and interactive systems. Prominent applications include online preference learning, where systems dynamically adapt to user feedback to refine personalized recommendations, and reinforcement learning from human feedback (RLHF), which aims to align AI agents with human values by adaptively querying users. In these contexts, the main focus is to efficiently extract maximal information from human responses, which are inherently stochastic and limited in quantity. Among various types of such problems, this work particularly considers a fundamental yet illustrative case involving stochastic binary responses. Here, a decision-maker sequentially selects questions of varying difficulty from a continuous pool to pose to a candidate and aims to efficiently estimate the candidate's ability (represented by an unknown continuous parameter) by utilizing the binary feedback (e.g., correct/incorrect) collected, which depends probabilistically on the candidate's ability and the question's difficulty. This setup is arguably the simplest scenario that captures the essence of continuous parameter estimation under uncertainty, making it an ideal benchmark for developing fundamental theoretical insights and practical algorithms. Variants of this fundamental adaptive estimation problem have been studied in several communities.
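To make the setting concrete, here is a minimal sketch of the Fisher-information-tracking heuristic under an assumed Rasch-type logistic response model (the model choice, the discretized pool, and all numbers are illustrative assumptions, not details taken from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fisher_information(theta, difficulty):
    """Fisher information of one binary response under a Rasch-type model
    P(correct) = sigmoid(theta - difficulty); it equals p * (1 - p)."""
    p = sigmoid(theta - difficulty)
    return p * (1.0 - p)

def next_difficulty(theta_hat, pool):
    """Greedily pick the question maximizing Fisher information at the
    current ability estimate."""
    return pool[np.argmax(fisher_information(theta_hat, pool))]

# A continuous difficulty pool, discretized here purely for illustration.
pool = np.linspace(-3.0, 3.0, 61)
theta_hat = 0.7  # current ability estimate
d_star = next_difficulty(theta_hat, pool)
```

Under this model the information $p(1-p)$ peaks when the success probability is $1/2$, i.e. when the posed difficulty matches the current ability estimate, which is the intuition behind tracking Fisher information.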
A more efficient method for large-sample model-free feature screening via multi-armed bandits
Ouyang, Xiaxue, Kang, Xinlai, Li, Mengyu, Dou, Zhenxing, Yu, Jun, Meng, Cheng
We consider model-free feature screening in large-scale ultrahigh-dimensional data analysis. Existing feature screening methods often face substantial computational challenges when dealing with large sample sizes. To alleviate the computational burden, we propose a rank-based model-free sure independence screening method (CR-SIS) and its efficient variant, BanditCR-SIS. The CR-SIS method, based on Chatterjee's rank correlation, is as straightforward to implement as the sure independence screening (SIS) method based on Pearson correlation introduced by Fan and Lv (2008), but it is significantly more powerful in detecting nonlinear relationships between variables. Motivated by the multi-armed bandit (MAB) problem, we reformulate the feature screening procedure to significantly reduce the computational complexity of CR-SIS. For a predictor matrix of size $n \times p$, the computational cost of CR-SIS is $O(n\log(n)p)$, while BanditCR-SIS reduces this to $O(\sqrt{n}\log(n)p + n\log(n))$. Theoretically, we establish the sure screening property for both CR-SIS and BanditCR-SIS under mild regularity conditions. Furthermore, we demonstrate the effectiveness of our methods through extensive experimental studies on both synthetic and real-world datasets. The results highlight their superior performance compared to classical screening methods, while requiring significantly less computational time.
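As an illustration of the screening statistic, here is a minimal NumPy sketch of Chatterjee's rank correlation (no-ties version); the sorting step gives the $O(n\log n)$ cost per feature mentioned above, while the bandit acceleration is not sketched here:

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation (no-ties version), O(n log n) per call.
    Values near 1 indicate y is (nearly) a function of x; near 0, independence."""
    n = len(x)
    order = np.argsort(x)                         # sort the pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1  # ranks of y in x-order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n * n - 1)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
xi_nonlinear = chatterjee_xi(x, x**2)                # strong nonlinear dependence
xi_noise = chatterjee_xi(x, rng.uniform(size=2000))  # independent noise
```

Unlike Pearson correlation, which is near zero for $y = x^2$ on a symmetric domain, the statistic is close to 1 for this nonlinear relationship, which is what makes the rank-based screen more powerful.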
Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities
Carpentier, Alexandra, Giraud, Christophe, Verzelen, Nicolas
Predictions from statistical physics postulate that recovery of the communities in the Stochastic Block Model (SBM) is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that non-trivial community recovery is indeed possible in SBM above the KS threshold, as long as the number $K$ of communities remains smaller than $\sqrt{n}$, where $n$ is the number of nodes in the observed graph. Failure of low-degree polynomials below the KS threshold was also proven when $K=o(\sqrt{n})$. When $K\geq \sqrt{n}$, Chin et al. (2025) recently proved that, in a sparse regime, community recovery in polynomial time is possible below the KS threshold by counting non-backtracking paths. This breakthrough result led them to postulate a new threshold for the many-communities regime $K\geq \sqrt{n}$. In this work, we provide evidence confirming their conjecture for $K\geq \sqrt{n}$: (1) we prove that, for any density of the graph, low-degree polynomials fail to recover communities below the threshold postulated by Chin et al. (2025); (2) we prove that community recovery is possible in polynomial time above the postulated threshold, not only in the sparse regime of Chin et al., but also in some (but not all) moderately sparse regimes, by essentially counting clique occurrences in the observed graph.
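To fix notation, here is a small NumPy sketch of sampling a symmetric SBM in the many-communities regime $K \geq \sqrt{n}$ (all parameter values are illustrative; no recovery algorithm is attempted):

```python
import numpy as np

def sample_sbm(n, K, p_in, p_out, rng):
    """Sample a symmetric SBM adjacency matrix: n nodes, K communities,
    edge probability p_in within a community and p_out across communities."""
    labels = rng.integers(0, K, size=n)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # independent upper-triangle coin flips
    return (upper | upper.T).astype(int), labels

rng = np.random.default_rng(1)
n, K = 400, 25  # K = 25 > sqrt(400) = 20: the many-communities regime
A, labels = sample_sbm(n, K, 0.20, 0.02, rng)
```

Community recovery then means estimating `labels` from `A` alone; the thresholds discussed above delineate when polynomial-time estimators can beat random guessing.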
Human Machine Co-Adaptation Model and Its Convergence Analysis
Su, Steven W., Li, Yaqi, Guo, Kairui, Duffield, Rob
The key to robot-assisted rehabilitation lies in the design of the human-machine interface, which must accommodate the needs of both patients and machines. Current interface designs primarily focus on machine control algorithms, often requiring patients to spend considerable time adapting. In this paper, we introduce a novel approach based on the Cooperative Adaptive Markov Decision Process (CAMDP) model to address the fundamental aspects of the interactive learning process, offering theoretical insights and practical guidance. We establish sufficient conditions for the convergence of CAMDPs and ensure the uniqueness of Nash equilibrium points. Leveraging these conditions, we guarantee the system's convergence to a unique Nash equilibrium point. Furthermore, we explore scenarios with multiple Nash equilibrium points, devising strategies to adjust both Value Evaluation and Policy Improvement algorithms to enhance the likelihood of converging to the global minimal Nash equilibrium point. Through numerical experiments, we illustrate the effectiveness of the proposed conditions and algorithms, demonstrating their applicability and robustness in practical settings. The proposed conditions for convergence and the identification of a unique optimal Nash equilibrium contribute to the development of more effective adaptive systems for human users in robot-assisted rehabilitation.
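The convergence-to-Nash claim can be illustrated, very loosely, by a toy alternating best-response loop on a two-player game with hypothetical payoffs; this is only a stand-in for the Value Evaluation / Policy Improvement interplay, not the paper's CAMDP model:

```python
import numpy as np

# Hypothetical coordination game: each side picks an action in {0, 1};
# both incur lower cost when their actions match.
cost_h = np.array([[0.0, 2.0],
                   [3.0, 1.0]])  # human cost, rows = human action, cols = machine action
cost_m = cost_h.T                # machine cost, rows = machine action, cols = human action

a_h, a_m = 1, 0                  # arbitrary mismatched starting point
for _ in range(10):
    a_h = int(np.argmin(cost_h[:, a_m]))  # human best-responds to the machine
    a_m = int(np.argmin(cost_m[:, a_h]))  # machine best-responds to the human
# The loop settles at a pair from which neither side can improve alone,
# i.e. a Nash equilibrium of this toy game.
```

The sufficient conditions in the paper play the role that the coordination structure plays here: they rule out the cycling that best-response dynamics can exhibit in general games.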
Comparing regularisation paths of (conjugate) gradient estimators in ridge regression
Hucker, Laura, Reiß, Markus, Stark, Thomas
We consider standard gradient descent, gradient flow and conjugate gradients as iterative algorithms for minimizing a penalized ridge criterion in linear regression. While it is well known that conjugate gradients exhibit fast numerical convergence, the statistical properties of their iterates are more difficult to assess due to inherent nonlinearities and dependencies. On the other hand, standard gradient flow is a linear method with well known regularizing properties when stopped early. By an explicit non-standard error decomposition we are able to bound the prediction error for conjugate gradient iterates by a corresponding prediction error of gradient flow at transformed iteration indices. This way, the risk along the entire regularisation path of conjugate gradient iterations can be compared to that for regularisation paths of standard linear methods like gradient flow and ridge regression. In particular, the oracle conjugate gradient iterate shares the optimality properties of the gradient flow and ridge regression oracles up to a constant factor. Numerical examples show the similarity of the regularisation paths in practice.
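A small NumPy sketch comparing the conjugate-gradient iterates with the ordinary least-squares limit (plain CG on the normal equations; the problem sizes and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Least squares <=> normal equations (X^T X) b = X^T y.
A, b = X.T @ X, X.T @ y

def cg_path(A, b, iters):
    """Plain conjugate gradients on A x = b, recording every iterate."""
    x = np.zeros_like(b)
    r = b - A @ x
    d = r.copy()
    path = [x.copy()]
    for _ in range(iters):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)
        x = x + alpha * d
        r_new = r - alpha * Ad
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
        path.append(x.copy())
    return path

path = cg_path(A, b, 10)
ols = np.linalg.solve(A, b)
# Early iterates are heavily shrunk (small norm, like a large ridge penalty);
# later iterates approach the unpenalized OLS solution.
```

The growing iterate norms along `path` mirror a ridge path with decreasing penalty, which is the regularisation-path comparison the paper makes rigorous.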
Reviews: Learning Signed Determinantal Point Processes through the Principal Minor Assignment Problem
The authors' response was in many respects quite comprehensive, so I am inclined to slightly revise my score. As I said, I think the results presented in the paper seem interesting and novel; however, I still feel that the motivation for signed DPPs is not sufficiently studied. The example of coffee, tea and mugs is nice, but there is just not enough concrete evidence in the current discussion suggesting that the signed DPP would even do the right thing in this simple case (I'm not saying that it wouldn't, just that it was not scientifically established in any way). The authors first define the generalized DPP and then discuss the challenges that the non-symmetric DPP poses for the task of learning a kernel matrix from i.i.d. samples when using the method of moments from prior work [23]. Then, under various assumptions on the nonsymmetric kernel matrix, a learning algorithm is proposed which runs in polynomial time (the analysis follows the ideas of [23], but addresses the challenges posed by the non-symmetric nature of the kernel).
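For readers unfamiliar with the setup: under a DPP with marginal kernel $K$, subset inclusion probabilities are principal minors, $P(S \subseteq Y) = \det(K_S)$, so learning $K$ from samples amounts to recovering a matrix from its principal minors. A small sketch with a hypothetical signed kernel (chosen for illustration, and not verified here to define a valid DPP) shows the attractive co-occurrence a symmetric kernel cannot produce:

```python
import numpy as np

# A nonsymmetric (signed) kernel, purely for illustration.
K = np.array([[ 0.6,  0.3, 0.0],
              [-0.3,  0.5, 0.2],
              [ 0.0, -0.2, 0.4]])

def inclusion_prob(K, S):
    """P(S subset of Y) = det(K_S), the principal minor of K indexed by S."""
    idx = np.array(S)
    return np.linalg.det(K[np.ix_(idx, idx)])

p_1 = inclusion_prob(K, [0])        # 0.6
p_2 = inclusion_prob(K, [1])        # 0.5
p_12 = inclusion_prob(K, [0, 1])    # 0.6*0.5 - 0.3*(-0.3) = 0.39
```

For a symmetric kernel, $\det(K_S) = p_1 p_2 - K_{12}^2 \le p_1 p_2$, so items can only repel; the signed off-diagonal pair here makes $p_{12} > p_1 p_2$, the kind of attraction (e.g. coffee and mugs) that motivates signed DPPs in the first place.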
1 Introduction and Related Work
Example-based explanations are widely used in the effort to improve the interpretability of highly complex distributions. However, prototypes alone are rarely sufficient to represent the gist of the complexity. In order for users to construct better mental models and understand complex data distributions, we also need criticism to explain what is not captured by prototypes. Motivated by the Bayesian model criticism framework, we develop MMD-critic which efficiently learns prototypes and criticism, designed to aid human interpretability. A human subject pilot study shows that the MMD-critic selects prototypes and criticism that are useful to facilitate human understanding and reasoning. We also evaluate the prototypes selected by MMD-critic via a nearest prototype classifier, showing competitive performance compared to baselines.
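A simplified sketch of the prototype-selection step, phrased as greedy minimization of the squared MMD between the data and the prototype set with an RBF kernel (the bandwidth, data, and the criticism-selection step are omitted or assumed for illustration):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_prototypes(X, m, gamma=0.5):
    """Greedily pick m prototypes minimizing MMD^2 between the empirical
    data distribution and the prototype set (up to a constant term)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    colmeans = K.mean(axis=0)  # average kernel similarity of each point to the data
    S = []
    for _ in range(m):
        best, best_gain = None, -np.inf
        for j in range(n):
            if j in S:
                continue
            cand = S + [j]
            # MMD^2(data, cand) up to a constant:
            #   mean_{s,t in cand} k(s, t) - 2 * mean_{s in cand} colmeans[s]
            obj = K[np.ix_(cand, cand)].mean() - 2 * colmeans[cand].mean()
            if -obj > best_gain:
                best, best_gain = j, -obj
        S.append(best)
    return S

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
protos = greedy_prototypes(X, 2)
```

With two well-separated clusters, the objective pushes the two prototypes into different clusters, since a second prototype near the first adds kernel mass without covering new data.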
Monotone Learning
Bousquet, Olivier, Daniely, Amit, Kaplan, Haim, Mansour, Yishay, Moran, Shay, Stemmer, Uri
The amount of training data is one of the key factors that determine the generalization capacity of learning algorithms. Intuitively, one expects the error rate to decrease as the amount of training data increases. Perhaps surprisingly, natural attempts to formalize this intuition give rise to interesting and challenging mathematical questions. For example, in their classical book on pattern recognition, Devroye, Gyorfi, and Lugosi (1996) ask whether there exists a monotone Bayes-consistent algorithm. This question remained open for over 25 years, until recently Pestov (2021) resolved it for binary classification, using an intricate construction of a monotone Bayes-consistent algorithm. We derive a general result in multiclass classification, showing that every learning algorithm A can be transformed to a monotone one with similar performance. Further, the transformation is efficient and only uses a black-box oracle access to A. This demonstrates that one can provably avoid non-monotonic behaviour without compromising performance, thus answering questions asked by Devroye et al. (1996), Viering, Mey, and Loog (2019), Viering and Loog (2021), and by Mhammedi (2021). Our transformation readily implies monotone learners in a variety of contexts: for example, it extends Pestov's result to classification tasks with an arbitrary number of labels. This is in contrast with Pestov's work, which is tailored to binary classification. In addition, we provide uniform bounds on the error of the monotone algorithm. This makes our transformation applicable in distribution-free settings. For example, in PAC learning it implies that every learnable class admits a monotone PAC learner. This resolves questions by Viering, Mey, and Loog (2019); Viering and Loog (2021); Mhammedi (2021).
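The black-box flavor of such a transformation can be conveyed with a toy wrapper: train the base learner on growing prefixes and keep the best model seen so far on held-out data. This naive sketch only illustrates the idea of wrapping an arbitrary learner; the paper's actual transformation is a different, more careful construction with provable guarantees:

```python
import numpy as np

def monotone_wrapper(fit, X, y, sizes, X_val, y_val):
    """Run learner `fit` on growing training prefixes and keep the best
    model seen so far, judged on held-out data; the recorded validation
    error curve is non-increasing by construction."""
    best_model, best_err, errs = None, np.inf, []
    for m in sizes:
        model = fit(X[:m], y[:m])
        err = np.mean(model(X_val) != y_val)
        if err < best_err:
            best_model, best_err = model, err
        errs.append(best_err)
    return best_model, errs

def fit_centroid(X, y):
    # Hypothetical base learner: 1-D nearest-centroid classifier.
    c0, c1 = X[y == 0].mean(), X[y == 1].mean()
    return lambda Z: (np.abs(Z - c1) < np.abs(Z - c0)).astype(int)

rng = np.random.default_rng(4)
X = np.concatenate([rng.normal(-1, 1, 100), rng.normal(1, 1, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])
perm = rng.permutation(200)
X, y = X[perm], y[perm]
_, errs = monotone_wrapper(fit_centroid, X[:150], y[:150], [10, 50, 150],
                           X[150:], y[150:])
```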
New Hard-thresholding Rules based on Data Splitting in High-dimensional Imbalanced Classification
Mojiri, Arezou, Khalili, Abbas, Hamadani, Ali Zeinal
In binary classification, imbalance refers to situations in which one class is heavily under-represented. This issue arises either from the data collection process or because one class is indeed rare in the population. Imbalanced classification frequently occurs in applications such as biology, medicine, engineering, and the social sciences. In this paper, for the first time, we theoretically study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions. We show that due to data scarcity in one class, referred to as the minority class, and the high dimensionality of the feature space, the LDA ignores the minority class, yielding a maximum misclassification rate. We then propose a new construction of hard-thresholding rules based on a data splitting technique that reduces the large difference between the misclassification rates. We show that the proposed method is asymptotically optimal. We further study two well-known sparse versions of the LDA in imbalanced cases. We evaluate the finite-sample performance of different methods using simulations and by analyzing two real data sets. The results show that our method either outperforms its competitors or has comparable performance based on a much smaller subset of selected features, while being computationally more efficient.
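A toy NumPy illustration of the hard-thresholding idea in this setting: a thresholded mean-difference rule that keeps only coordinates with a large estimated shift, so the noise accumulated over many dimensions does not drown the scarce minority class. This is not the paper's exact data-splitting construction, and all parameter values are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n_maj, n_min = 500, 200, 10     # high dimension, heavy imbalance
mu = np.zeros(p)
mu[:5] = 2.0                       # sparse signal: only 5 informative features
X0 = rng.standard_normal((n_maj, p))        # majority class
X1 = mu + rng.standard_normal((n_min, p))   # minority class

diff = X1.mean(0) - X0.mean(0)
# Hard-threshold the mean-difference vector.
tau = 1.0
w = np.where(np.abs(diff) > tau, diff, 0.0)
midpoint = (X1.mean(0) + X0.mean(0)) / 2

def score(Z):
    """Classify to the minority class when the score is positive."""
    return (Z - midpoint) @ w

# Misclassification rates on fresh samples from each class.
err_min = np.mean(score(mu + rng.standard_normal((500, p))) <= 0)
err_maj = np.mean(score(rng.standard_normal((500, p))) > 0)
```

With thresholding, both error rates stay small; without it, the $p - 5$ noise coordinates of `diff` (estimated from only 10 minority samples) would dominate the rule and push the minority-class error toward its maximum, which is the failure mode the abstract describes.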