identification problem
Maximizing and Satisficing in Multi-armed Bandits with Graph Information
Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision making and search under uncertainty. In modern applications however, one is often faced with a tremendously large number of options and even obtaining one observation per option may be too costly rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to similarity relationships amongst the options that can be leveraged. In this paper, we consider the pure exploration problem in stochastic multi-armed bandits where the similarities between the arms is captured by a graph and the rewards may be represented as a smooth signal on this graph. In particular, we consider the problem of finding the arm with the maximum reward (i.e., the maximizing problem) or one that has sufficiently high reward (i.e., the satisficing problem) under this model. We propose novel algorithms GRUB (GRaph based UcB) and ฮถ-GRUB for these problems and provide theoretical characterization of their performance which specifically elicits the benefit of the graph side information. We also prove a lower bound on the data requirement that shows a large class of problems where these algorithms are near-optimal. We complement our theory with experimental results that show the benefit of capitalizing on such side information.
Threshold-Based Optimal Arm Selection in Monotonic Bandits: Regret Lower Bounds and Algorithms
Varude, Chanakya, Chaudhary, Jay, Kaushik, Siddharth, Chaporkar, Prasanna
In multi-armed bandit problems, the typical goal is to identify the arm with the highest reward. This paper explores a threshold-based bandit problem, aiming to select an arm based on its relation to a prescribed threshold \(ฯ\). We study variants where the optimal arm is the first above \(ฯ\), the \(k^{th}\) arm above or below it, or the closest to it, under a monotonic structure of arm means. We derive asymptotic regret lower bounds, showing dependence only on arms adjacent to \(ฯ\). Motivated by applications in communication networks (CQI allocation), clinical dosing, energy management, recommendation systems, and more. We propose algorithms with optimality validated through Monte Carlo simulations. Our work extends classical bandit theory with threshold constraints for efficient decision-making.
Efficient identification of linear, parameter-varying, and nonlinear systems with noise models
Bemporad, Alberto, Tรณth, Roland
However, if one goes beyond the well-established linear time-invariant (L TI) system identification framework [17], considering, e.g., linear time-varying (L TV) [22], linear parameter-varying (LPV) [29], or various nonlinear (NL) models [11], a core question of the resulting identification problems is how to efficiently estimate from data the functional relations involved in these models. While classical beyond-LTI methods used fixed basis-function parameteriza-tions to express nonlinearities, with the recent progress of computational capabilities and machine learning/deep learning methods, a wide-range of learning approaches has been introduced, from Gaussian process (GP) regression [24] and support vector machines [27] to artificial neural networks (ANN) based training (see [23] for a recent overview), including state-space (SS) recurrent networks such as LSTMs [13, 19], SS-ANNs [1-3, 10, 20, 25, 28], etc. In particular, the latter class of methods successfully combines deep learning with state-space model identification concepts, achieving extreme accuracy levels on well established benchmarks, see [1]. However, in the quest of solving the underlying function estimation problem with improved accuracy and reliability, many aspects of the classical identification theory have been sacrificed. While deep-learning methods can achieve remarkable results, the relating training time when no state-measurements are available is often too high compared to classical identification methods, requiring tens of hours, sometimes days, with the most sophisticated stochastic gradient methods, like Adam [15], to train mid-sized models on training sequences of reasonable length. Using such methods in an identification cycle, which requires iterations on model structure selection, hyperparameter optimization on validation data, and experiment design, is simply impractical. Next to the computational challenges, many of the developed machine learning-based identification methods do not consider measurement or process noise affecting the estimation problem, or are based on simplistic noise settings / isolated scenarios, with a few exceptions, e.g., [1, 25]. The general understanding of process and noise models, and the related concept of one/ n -step-ahead predictors and connected modelling choices, model structures, and the overall estimation concept, which are well-developed for the L TI case, is largely missing in the general NL context. Besides theoretical limitations, in practice it is not clear to the user how to co-estimate linear and nonlinear elements, select a combination of process and noise models, and scale up the identification process from linear to nonlinear modeling within the same framework.
Functional multi-armed bandit and the best function identification problems
Dorn, Yuriy, Katrutsa, Aleksandr, Latypov, Ilgam, Soboleva, Anastasiia
Bandit optimization usually refers to the class of online optimization problems with limited feedback, namely, a decision maker uses only the objective value at the current point to make a new decision and does not have access to the gradient of the objective function. While this name accurately captures the limitation in feedback, it is somehow misleading since it does not have any connection with the multi-armed bandits (MAB) problem class. We propose two new classes of problems: the functional multi-armed bandit problem (FMAB) and the best function identification problem. They are modifications of a multi-armed bandit problem and the best arm identification problem, respectively, where each arm represents an unknown black-box function. These problem classes are a surprisingly good fit for modeling real-world problems such as competitive LLM training. To solve the problems from these classes, we propose a new reduction scheme to construct UCB-type algorithms, namely, the F-LCB algorithm, based on algorithms for nonlinear optimization with known convergence rates. We provide the regret upper bounds for this reduction scheme based on the base algorithms' convergence rates. We add numerical experiments that demonstrate the performance of the proposed scheme.
Optimal Best Arm Identification with Post-Action Context
Shahverdikondori, Mohammad, Abouei, Amir Mohammad, Rezaeimoghadam, Alireza, Kiyavash, Negar
We introduce the problem of best arm identification (BAI) with post-action context, a new BAI problem in a stochastic multi-armed bandit environment and the fixed-confidence setting. The problem addresses the scenarios in which the learner receives a $\textit{post-action context}$ in addition to the reward after playing each action. This post-action context provides additional information that can significantly facilitate the decision process. We analyze two different types of the post-action context: (i) $\textit{non-separator}$, where the reward depends on both the action and the context, and (ii) $\textit{separator}$, where the reward depends solely on the context. For both cases, we derive instance-dependent lower bounds on the sample complexity and propose algorithms that asymptotically achieve the optimal sample complexity. For the non-separator setting, we do so by demonstrating that the Track-and-Stop algorithm can be extended to this setting. For the separator setting, we propose a novel sampling rule called $\textit{G-tracking}$, which uses the geometry of the context space to directly track the contexts rather than the actions. Finally, our empirical results showcase the advantage of our approaches compared to the state of the art.
Physically-Consistent Parameter Identification of Robots in Contact
Khorshidi, Shahram, Dawood, Murad, Nederkorn, Benno, Bennewitz, Maren, Khadiv, Majid
Accurate inertial parameter identification is crucial for the simulation and control of robots encountering intermittent contact with the environment. Classically, robots' inertial parameters are obtained from CAD models that are not precise (and sometimes not available, e.g., Spot from Boston Dynamics), hence requiring identification. To do that, existing methods require access to contact force measurement, a modality not present in modern quadruped and humanoid robots. This paper presents an alternative technique that utilizes joint current/torque measurements -- a standard sensing modality in modern robots -- to identify inertial parameters without requiring direct contact force measurements. By projecting the whole-body dynamics into the null space of contact constraints, we eliminate the dependency on contact forces and reformulate the identification problem as a linear matrix inequality that can handle physical and geometrical constraints. We compare our proposed method against a common black-box identification mrethod using a deep neural network and show that incorporating physical consistency significantly improves the sample efficiency and generalizability of the model. Finally, we validate our method on the Spot quadruped robot across various locomotion tasks, showcasing its accuracy and generalizability in real-world scenarios over different gaits.
A Sequential Decision-Making Model for Perimeter Identification
Perimeter identification involves ascertaining the boundaries of a designated area or zone, requiring traffic flow monitoring, control, or optimization. Various methodologies and technologies exist for accurately defining these perimeters; however, they often necessitate specialized equipment, precise mapping, or comprehensive data for effective problem delineation. In this study, we propose a sequential decision-making framework for perimeter search, designed to operate efficiently in real-time and require only publicly accessible information. We conceptualize the perimeter search as a game between a playing agent and an artificial environment, where the agent's objective is to identify the optimal perimeter by sequentially improving the current perimeter. We detail the model for the game and discuss its adaptability in determining the definition of an optimal perimeter. Ultimately, we showcase the model's efficacy through a real-world scenario, highlighting the identification of corresponding optimal perimeters.
Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget
Tang, Dengwang, Jain, Rahul, Nayyar, Ashutosh, Nuzzo, Pierluigi
In this paper, we introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget. This is a pure exploration problem in a stochastic finite armed bandit model. Each arm is associated with a reward and multiple types of costs from unknown distributions. Unlike the unconstrained best arm identification problem, the optimal solution for the CBMAI problem may be a randomized mixture of multiple arms. The goal thus is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$. We propose a novel, parameter-free algorithm, called the Score Function-based Successive Reject (SFSR) algorithm, that combines the classical successive reject framework with a novel score-function-based rejection criteria based on linear programming theory to identify the optimal support. We provide a theoretical upper bound on the mis-identification (of the the support of the best mixed arm) probability and show that it decays exponentially in the budget $N$ and some constants that characterize the hardness of the problem instance. We also develop an information theoretic lower bound on the error probability that shows that these constants appropriately characterize the problem difficulty. We validate this empirically on a number of average and hard instances.
Minimal Evidence Group Identification for Claim Verification
Li, Xiangci, Chen, Sihao, Kapadia, Rajvi, Ouyang, Jessica, Zhang, Fan
Claim verification in real-world settings (e.g. against a large collection of candidate evidences retrieved from the web) typically requires identifying and aggregating a complete set of evidence pieces that collectively provide full support to the claim. The problem becomes particularly challenging when there exists distinct sets of evidence that could be used to verify the claim from different perspectives. In this paper, we formally define and study the problem of identifying such minimal evidence groups (MEGs) for claim verification. We show that MEG identification can be reduced from Set Cover problem, based on entailment inference of whether a given evidence group provides full/partial support to a claim. Our proposed approach achieves 18.4% and 34.8% absolute improvements on the WiCE and SciFact datasets over LLM prompting. Finally, we demonstrate the benefits of MEGs in downstream applications such as claim generation.
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms.