Online (Multinomial) Logistic Bandit: Improved Regret and Constant Computation Cost
This paper investigates the logistic bandit problem, a variant of the generalized linear bandit model in which a logistic model describes the feedback for an action. While most existing research focuses on the binary logistic bandit problem, the multinomial case, which allows more than two possible feedback values, is more practically relevant and adapts better to complex decision-making problems such as reinforcement learning. In this paper, we provide an algorithm that is both statistically and computationally efficient for the logistic bandit problem. In the binary case, our method improves upon the state-of-the-art binary logistic bandit method by reducing the per-round computation cost from $\mathcal{O}(\log T)$ to $\mathcal{O}(1)$ with respect to the time horizon $T$, while preserving the minimax optimal guarantee up to logarithmic factors. In the multinomial case, with $K+1$ potential feedback values, our algorithm achieves an $\tilde{\mathcal{O}}(K\sqrt{T})$ regret bound with $\mathcal{O}(1)$ computational cost per round. This result not only improves the $\tilde{\mathcal{O}}(K\sqrt{\kappa T})$ bound of the best-known tractable algorithm, where the large constant $\kappa$ grows exponentially with the diameter of the parameter domain, but also reduces the $\mathcal{O}(T)$ computational complexity demanded by the previous method.
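As a minimal illustration of the feedback model in this setting (a sketch, not the paper's algorithm; all names and parameter values below are hypothetical), the multinomial logistic model turns per-outcome scores into probabilities over $K+1$ feedback values via a softmax, with outcome 0 as a baseline:

```python
import numpy as np

def multinomial_logit_probs(x, Theta):
    """Multinomial logistic feedback model: outcome 0 is a baseline with
    score fixed at 0; each of the other K outcomes scores x via its own
    parameter row, and a softmax turns scores into probabilities."""
    scores = np.concatenate(([0.0], Theta @ x))  # shape (K + 1,)
    scores -= scores.max()                       # numerical stability
    w = np.exp(scores)
    return w / w.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # feature vector of the chosen action
Theta = rng.normal(size=(2, 3))   # hypothetical parameters, K = 2
p = multinomial_logit_probs(x, Theta)
feedback = rng.choice(3, p=p)     # one of K + 1 = 3 feedback values
```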
Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search
Fadeeva, Ekaterina, Goloburda, Maiya, Rubashevskii, Aleksandr, Vashurin, Roman, Shelmanov, Artem, Nakov, Preslav, Sachan, Mrinmaya, Panov, Maxim
Consistency-based methods have emerged as an effective approach to uncertainty quantification (UQ) in large language models. These methods typically rely on several generations obtained via multinomial sampling, measuring their agreement level. However, in short-form QA, multinomial sampling is prone to producing duplicates due to peaked distributions, and its stochasticity introduces considerable variance in uncertainty estimates across runs. We introduce a new family of methods that employ beam search to generate candidates for consistency-based UQ, yielding improved performance and reduced variance compared to multinomial sampling. We also provide a theoretical lower bound on the beam set probability mass under which beam search achieves a smaller error than multinomial sampling. We empirically evaluate our approach on six QA datasets and find that its consistent improvements over multinomial sampling lead to state-of-the-art UQ performance.
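A toy sketch of the duplication issue described above (illustrative only; the answer strings and probabilities are made up): multinomial sampling from a peaked answer distribution tends to repeat the mode, while a beam-style selection deterministically returns the k most probable, mutually distinct candidates:

```python
import numpy as np

# Hypothetical peaked answer distribution for a short-form QA prompt.
answers = ["Paris", "Lyon", "Marseille", "Nice", "Toulouse"]
probs = np.array([0.85, 0.08, 0.04, 0.02, 0.01])

rng = np.random.default_rng(7)
sampled = [answers[i] for i in rng.choice(len(answers), size=5, p=probs)]
distinct_sampled = len(set(sampled))  # duplicates are likely here

# Beam-style selection: deterministically keep the k most probable
# candidates, so all k are distinct and the result has zero variance.
k = 3
beam = [answers[i] for i in np.argsort(probs)[::-1][:k]]
```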
Fixed-Length Poisson MRF: Adding Dependencies to the Multinomial
We propose a novel distribution that generalizes the Multinomial distribution to enable dependencies between dimensions. Our distribution is based on the parametric form of the Poisson MRF model [Yang et al., 2012] but is fundamentally different because its domain is restricted to fixed-length vectors, as in a Multinomial where the number of trials is fixed or known. We therefore propose the Fixed-Length Poisson MRF (LPMRF) distribution. We develop methods to estimate the likelihood and the log partition function (i.e., the log normalizing constant), which had not been developed for the Poisson MRF model. In addition, we propose novel mixture and topic models that use LPMRF as a base distribution and discuss their similarities to and differences from previous topic models such as the recently proposed Admixture of Poisson MRFs [Inouye et al., 2014]. We show the effectiveness of our LPMRF distribution over Multinomial models by evaluating test-set perplexity on datasets of paper abstracts and Wikipedia articles. Qualitatively, we show that the positive dependencies discovered by LPMRF are interesting and intuitive. Finally, we show that our algorithms are fast and scale well (code available online).
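As context for the fixed-length restriction (an illustration, not the paper's estimator): independent Poisson variables conditioned on a fixed total recover exactly the dependency-free Multinomial, which is the special case LPMRF generalizes by adding MRF interaction terms. The rates below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([2.0, 1.0, 0.5])  # hypothetical Poisson rates
L = 10                             # fixed, known number of trials

def poissons_given_sum(rates, L, rng, max_tries=100_000):
    """Rejection-sample independent Poissons conditioned on summing to L.
    This conditional law is exactly Multinomial(L, rates / rates.sum());
    LPMRF keeps the fixed-sum domain but adds pairwise dependencies."""
    for _ in range(max_tries):
        x = rng.poisson(rates)
        if x.sum() == L:
            return x
    raise RuntimeError("rejection sampling failed to hit the target sum")

x = poissons_given_sum(rates, L, rng)
y = rng.multinomial(L, rates / rates.sum())  # same distribution
```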
Achieving Limited Adaptivity for Multinomial Logistic Bandits
Midigeshi, Sukruta Prakash, Goyal, Tanmay, Sinha, Gaurav
Multinomial logistic bandits have recently attracted much attention due to their ability to model problems with multiple outcomes. In this setting, each decision is associated with many possible outcomes, modeled using a multinomial logit function. Several recent works on multinomial logistic bandits have simultaneously achieved optimal regret and computational efficiency. However, motivated by real-world challenges and practicality, there is a need for algorithms with limited adaptivity, in which only $M$ policy updates are allowed. To address these challenges, we present two algorithms, B-MNL-CB and RS-MNL, that operate in the batched and rarely-switching paradigms, respectively. The batched setting chooses the $M$ policy update rounds at the start of the algorithm, while the rarely-switching setting can choose these $M$ policy update rounds adaptively. Our first algorithm, B-MNL-CB, extends the notion of distributional optimal designs to the multinomial setting and achieves $\tilde{O}(\sqrt{T})$ regret for stochastically generated contexts when given $\Omega(\log \log T)$ update rounds. Our second algorithm, RS-MNL, works with adversarially generated contexts and achieves $\tilde{O}(\sqrt{T})$ regret with $\tilde{O}(\log T)$ policy updates. Further, we conduct experiments demonstrating that our algorithms (with a fixed number of policy updates) are highly competitive with (and often outperform) several state-of-the-art baselines (which update their policy every round), showcasing their applicability in various practical scenarios.
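A generic rarely-switching trigger from the bandit literature (an illustrative sketch; not necessarily RS-MNL's exact update rule) recomputes the policy only when the design matrix's determinant has grown by a constant factor since the last update, which caps the number of policy switches at $O(\log T)$:

```python
import numpy as np

d, T, C = 3, 2000, 2.0  # dimension, horizon, determinant growth factor
rng = np.random.default_rng(0)

V = np.eye(d)           # regularized design matrix V_t = I + sum x x^T
V_last = V.copy()       # design matrix at the last policy update
updates = 0
for t in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)           # unit-norm context feature
    V += np.outer(x, x)
    if np.linalg.det(V) > C * np.linalg.det(V_last):
        updates += 1                  # trigger a policy update round
        V_last = V.copy()
# updates is O(log T): det(V_T) <= (1 + T/d)^d, so at most
# d * log_C(1 + T/d) triggers can fire over the whole horizon.
```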
BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks
The rapid growth of Large Language Models (LLMs) raises concerns about distinguishing AI-generated text from human content. Existing watermarking techniques, like \kgw, struggle with low watermark strength and stringent false-positive requirements. Our analysis reveals that current methods rely on coarse estimates of non-watermarked text, limiting watermark detectability. To address this, we propose Bipolar Watermark (\tool), which splits generated text into positive and negative poles, enhancing detection without requiring additional computational resources or knowledge of the prompt. Theoretical analysis and experimental results demonstrate \tool's effectiveness and compatibility with existing optimization techniques, providing a new optimization dimension for watermarking in LLM-generated content.
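For background, the KGW-style detector that such schemes build on counts "green-list" tokens and applies a one-proportion z-test; below is a minimal sketch of that test (the bipolar positive/negative split itself is not shown, and the counts are made up):

```python
import math

def kgw_z_score(green_count: int, total_tokens: int, gamma: float = 0.5) -> float:
    """One-proportion z-test used by KGW-style watermark detectors.
    Under the no-watermark null, each token lands in the green list with
    probability gamma, so green_count concentrates near gamma * total."""
    expected = gamma * total_tokens
    std = math.sqrt(gamma * (1.0 - gamma) * total_tokens)
    return (green_count - expected) / std

z_watermarked = kgw_z_score(green_count=140, total_tokens=200)  # ~5.66
z_plain = kgw_z_score(green_count=102, total_tokens=200)        # ~0.28
```

A high z-score flags text as watermarked; the stringent false-positive requirements mentioned above correspond to demanding a large z threshold, which is where stronger effective watermark strength helps.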