Adaptivity and Universality: Problem-dependent Universal Regret for Online Convex Optimization
Zhao, Peng, Yan, Yu-Hu, Yu, Hang, Zhou, Zhi-Hua
Universal online learning aims to achieve optimal regret guarantees without requiring prior knowledge of the curvature of online functions. Existing methods have established minimax-optimal regret bounds for universal online learning, where a single algorithm can simultaneously attain $\mathcal{O}(\sqrt{T})$ regret for convex functions, $\mathcal{O}(d \log T)$ for exp-concave functions, and $\mathcal{O}(\log T)$ for strongly convex functions, where $T$ is the number of rounds and $d$ is the dimension of the feasible domain. However, these methods still lack problem-dependent adaptivity. In particular, no universal method provides regret bounds that scale with the gradient variation $V_T$, a key quantity that plays a crucial role in applications such as stochastic optimization and fast-rate convergence in games. In this work, we introduce UniGrad, a novel approach that achieves both universality and adaptivity, with two distinct realizations: UniGrad.Correct and UniGrad.Bregman. Both methods achieve universal regret guarantees that adapt to gradient variation, simultaneously attaining $\mathcal{O}(\log V_T)$ regret for strongly convex functions and $\mathcal{O}(d \log V_T)$ regret for exp-concave functions. For convex functions, the regret bounds differ: UniGrad.Correct achieves an $\mathcal{O}(\sqrt{V_T \log V_T})$ bound while preserving the RVU property that is crucial for fast convergence in online games, whereas UniGrad.Bregman achieves the optimal $\mathcal{O}(\sqrt{V_T})$ regret bound through a novel design. Both methods employ a meta algorithm with $\mathcal{O}(\log T)$ base learners, which naturally requires $\mathcal{O}(\log T)$ gradient queries per round. To enhance computational efficiency, we introduce UniGrad++, which retains the regret guarantees while reducing the number of gradient queries to just one per round via surrogate optimization. We further discuss several implications of these results.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Promising Solution (0.47)
- Research Report > New Finding (0.46)
- Education (0.68)
- Leisure & Entertainment > Games (0.48)
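The abstract above describes a meta algorithm running over a pool of base learners. As a rough illustration of that two-layer template (not UniGrad itself, whose design is more involved), here is a minimal NumPy sketch: an exponential-weights meta learner over projected online-gradient-descent base learners with a grid of step sizes, on one-dimensional quadratic losses. All names, constants, and the loss sequence are illustrative assumptions.

```python
import numpy as np

# Online quadratic losses f_t(x) = (x - z)^2 on the domain [-1, 1].
z = 0.3
T = 200
etas = np.array([0.05, 0.1, 0.2, 0.4])  # step-size grid, one base learner each
x_base = np.zeros(len(etas))            # base-learner iterates
w = np.ones(len(etas)) / len(etas)      # meta weights (exponential weights / Hedge)
lr_meta = np.sqrt(np.log(len(etas)) / T)

losses = []
for t in range(T):
    x = w @ x_base                      # meta plays the weighted combination
    losses.append((x - z) ** 2)
    # Meta update: exponentially down-weight base learners by their own loss.
    w = w * np.exp(-lr_meta * (x_base - z) ** 2)
    w /= w.sum()
    # Base updates: projected online gradient descent, one step size each.
    x_base = np.clip(x_base - etas * 2 * (x_base - z), -1.0, 1.0)

regret = sum(losses)                    # comparator x* = z incurs zero loss
print(f"cumulative regret over {T} rounds: {regret:.3f}")
```

The point of the two-layer structure is that the meta learner tracks whichever step size happens to suit the (unknown) curvature, which is the mechanism universal methods refine.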
Foundations of Top-$k$ Decoding For Language Models
Noarov, Georgy, Mallick, Soham, Wang, Tao, Joshi, Sunay, Sun, Yan, Xie, Yangxinyu, Yu, Mengxin, Dobriban, Edgar
Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token-probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider \emph{Bregman decoders} obtained by minimizing a separable Bregman divergence (for both the \emph{primal} and \emph{dual} cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and further that the loss function is discretely convex in $k$, so that binary search provably and efficiently finds the optimal $k$. We show that top-$k$ decoding arises as a special case for the KL divergence, and identify new decoding strategies that have distinct behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).
- North America > United States > Pennsylvania (0.04)
- Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
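Plain top-$k$ decoding, the special case the paper recovers from the KL divergence, can be sketched in a few lines of NumPy (the function name and example distribution are illustrative):

```python
import numpy as np

def top_k_decode(probs, k):
    """Keep the k largest next-token probabilities and renormalize them.

    This is standard top-k truncation; the Bregman-decoder framework
    generalizes the renormalization step to other divergences.
    """
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[-k:]   # indices of the k largest entries
    out = np.zeros_like(probs)
    out[keep] = probs[keep] / probs[keep].sum()
    return out

p = np.array([0.5, 0.3, 0.1, 0.07, 0.03])
q = top_k_decode(p, 2)
print(q)   # -> [0.625 0.375 0.    0.    0.   ]
```

Note that KL renormalization rescales the kept entries linearly; the new decoders the paper identifies instead up-weight larger probabilities non-linearly.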
pyBregMan: A Python library for Bregman Manifolds
Nielsen, Frank, Soen, Alexander
A Bregman manifold is a synonym for a dually flat space in information geometry, which admits a Bregman divergence as its canonical divergence. Bregman manifolds are induced by smooth strictly convex functions, such as the cumulant or partition functions of regular exponential families, the negative entropy of mixture families, or the characteristic functions of regular cones, to list just a few such convex Bregman generators. We describe the design of pyBregMan, a library which implements generic operations on Bregman manifolds and instantiates several common Bregman manifolds used in the information sciences. At the core of the library is the notion of Legendre-Fenchel duality, which induces a canonical pair of dual potential functions and dual Bregman divergences. The library also implements the Fisher-Rao manifolds of categorical/multinomial distributions and of multivariate normal distributions. To demonstrate how the pyBregMan kernel manipulates these Bregman and Fisher-Rao manifolds, the library also provides several core algorithms for various applications in statistics, machine learning, information fusion, and so on.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- (3 more...)
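Independently of pyBregMan's actual API (not reproduced here), the central object, the Bregman divergence $B_F(p, q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle$ for a convex generator $F$, is easy to sketch in NumPy; with the negative Shannon entropy as generator it recovers the KL divergence:

```python
import numpy as np

def bregman_divergence(F, gradF, p, q):
    """Generic Bregman divergence B_F(p, q) = F(p) - F(q) - <gradF(q), p - q>."""
    return F(p) - F(q) - gradF(q) @ (p - q)

# Generator: negative Shannon entropy. Its Bregman divergence, restricted to
# probability vectors, is the Kullback-Leibler divergence.
neg_entropy = lambda x: np.sum(x * np.log(x))
grad_neg_entropy = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
kl = np.sum(p * np.log(p / q))
assert np.isclose(bregman_divergence(neg_entropy, grad_neg_entropy, p, q), kl)
```

Other generators (e.g. the squared Euclidean norm, giving the squared distance) plug into the same two-argument template, which is exactly the genericity a Bregman-manifold library exploits.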
Compositional Generalization without Trees using Multiset Tagging and Latent Permutations
Lindemann, Matthias, Koller, Alexander, Titov, Ivan
Seq2seq models have been shown to struggle with compositional generalization in semantic parsing, i.e. generalizing to unseen compositions of phenomena that the model handles correctly in isolation. We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens. Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations. We formulate predicting a permutation as solving a regularized linear program and we backpropagate through the solver. In contrast to prior work, our approach does not place a priori restrictions on possible permutations, making it very expressive. Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks that require generalization to longer examples. We also outperform non-tree-based models on structural generalization on the COGS benchmark. For the first time, we show that a model without an inductive bias provided by trees achieves high accuracy on generalization to deeper recursion.
- North America > United States > Washington > King County > Seattle (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (11 more...)
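The abstract above formulates permutation prediction as solving a regularized linear program and backpropagating through the solver. One standard instance of that idea (not necessarily the authors' exact formulation) is an entropy-regularized relaxation over doubly-stochastic matrices solved by Sinkhorn iterations, sketched here with an illustrative score matrix:

```python
import numpy as np

def sinkhorn(scores, n_iters=50, tau=0.1):
    """Relax a permutation into a doubly-stochastic matrix.

    Entropy-regularized linear program over the Birkhoff polytope,
    solved by alternating row/column normalization; every step is
    differentiable, so gradients can flow through the solver.
    """
    P = np.exp(scores / tau)
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)  # normalize rows
        P = P / P.sum(axis=0, keepdims=True)  # normalize columns
    return P

scores = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.2, 0.3, 1.8]])
P = sinkhorn(scores)
print(P.round(2))   # near the permutation 0 -> 1, 1 -> 0, 2 -> 2
```

Because no hard constraint restricts which permutations are representable, this kind of relaxation matches the paper's point that the model places no a priori restrictions on possible reorderings.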
A Differentiable Relaxation of Graph Segmentation and Alignment for AMR Parsing
Lyu, Chunchuan, Cohen, Shay B., Titov, Ivan
Abstract Meaning Representations (AMR) are a broad-coverage semantic formalism which represents sentence meaning as a directed acyclic graph. To train most AMR parsers, one needs to segment the graph into subgraphs and align each such subgraph to a word in a sentence; this is normally done at preprocessing, relying on hand-crafted rules. In contrast, we treat both alignment and segmentation as latent variables in our model and induce them as part of end-to-end training. As marginalizing over the structured latent variables is infeasible, we use the variational autoencoding framework. To ensure end-to-end differentiable optimization, we introduce a differentiable relaxation of the segmentation and alignment problems. We observe that inducing segmentation yields substantial gains over using a "greedy" segmentation heuristic. The performance of our method also approaches that of a model that relies on the segmentation rules of Lyu and Titov (2018), which were hand-crafted to handle individual AMR constructions.
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (8 more...)
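The paper's relaxation of segmentation and alignment is its own construction; a common generic device for differentiating through a discrete alignment choice, shown purely for illustration (the scores and temperature below are made up), is the Gumbel-softmax relaxation:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    """Differentiable relaxation of sampling a one-hot alignment choice.

    Adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled
    softmax; as tau -> 0 the output approaches a hard one-hot sample.
    """
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = y - y.max()                    # for numerical stability
    e = np.exp(y)
    return e / e.sum()

# Alignment scores of one AMR subgraph against 4 candidate words.
logits = np.array([0.2, 2.5, 0.1, 0.4])
a = gumbel_softmax(logits, tau=0.5)
print(a.round(3))   # soft (relaxed) alignment distribution over the 4 words
```

The relaxed sample stays on the probability simplex, so it can be fed downstream and trained end-to-end, which is the role the differentiable relaxation plays in the model above.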
Universal Basic Income: Why Elon Musk Thinks It May Be The Future
Universal basic income (UBI), an unconditional allowance afforded to all citizens for the bare essentials of life, is an old idea that's garnered support from members of both the left and right. Notable supporters have been as disparate as civil rights activist Martin Luther King, Jr. and libertarian economist Milton Friedman. The Nixon Administration even attempted to pass a basic income guarantee through Congress and failed only narrowly, over a disagreement about how large the stipend should be. Now, the debate over universal basic income is being renewed by industry leaders and billionaires including Mark Zuckerberg, Richard Branson, and Elon Musk. As automation approaches, the world is faced with the problem of displacement.
- North America > United States (0.73)
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.05)
Basic income makes more sense than ever in the Trump era
By that time, economists predict robotics and artificial intelligence will have begun their unstoppable march into American factories. People will start losing their jobs en masse, and it'll be up to President Trump and his cabinet to devise an economic escape plan. A growing band of advocates argue it doesn't have to turn out this way if Trump embraces a radical form of wealth distribution known as basic income. Trump's preparations (or lack thereof) for a future of robotic automation will begin with how he addresses the current concerns of middle America, where manufacturing jobs are already dying and people are desperate for change. Plenty of people just want work that generates enough money to keep their family safe and healthy.
- North America > United States (1.00)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Tax (0.72)
- Banking & Finance > Economy (0.72)
- Europe > Norway > Norwegian Sea (0.04)
- North America > United States > New Jersey (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)