AITopics | bregman

Foundations of Top-k Decoding for Language Models

Neural Information Processing Systems | Jun-21-2026, 12:09:10 GMT

Top-k decoding is a widely used method for sampling from LLMs: at each token, only the largest k next-token-probabilities are kept, and the next token is sampled after renormalizing them to sum to unity. Top-k and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-k decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-k decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We introduce Bregman decoders obtained by minimizing a separable Bregman divergence (for both the primal and dual cases) with a sparsity-inducing ℓ0-regularization; in particular, these decoders are adaptive in the sense that the sparsity parameter k is chosen depending on the underlying token distribution. Despite the combinatorial nature of the sparse Bregman objective, we show how to optimize it efficiently for a large class of divergences. We prove that (i) the optimal decoding strategies are greedy, and further that (ii) the objective is discretely convex in k, such that the optimal k can be identified in logarithmic time. We note that standard top-k decoding arises as a special case for the KL divergence, and construct new decoding strategies with substantially different behaviors (e.g., non-linearly up-weighting larger probabilities after renormalization).
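
The truncate-and-renormalize step described in the abstract can be sketched as follows. This is a minimal illustration of standard top-k sampling, not code from the paper; the function names `top_k_renormalize` and `sample_top_k` are hypothetical, and the input is assumed to be a valid probability vector with positive total mass.

```python
import random


def top_k_renormalize(probs, k):
    """Keep the k largest probabilities, zero out the rest, renormalize to sum to 1.

    Assumes `probs` is a valid probability vector (hypothetical helper).
    """
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and len(probs)")
    # Indices of the k largest entries; sorted() is stable, so ties keep the lower index.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    truncated = [0.0] * len(probs)
    for i in keep:
        truncated[i] = probs[i] / total
    return truncated


def sample_top_k(probs, k, rng=random):
    """Draw one token index from the truncated, renormalized distribution."""
    weights = top_k_renormalize(probs, k)
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]
```

For example, with probabilities `[0.5, 0.3, 0.1, 0.1]` and `k = 2`, only the first two tokens can be sampled, with weights proportional to 0.5 and 0.3.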

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Europe (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Adaptivity and Universality: Problem-dependent Universal Regret for Online Convex Optimization

Zhao, Peng, Yan, Yu-Hu, Yu, Hang, Zhou, Zhi-Hua

arXiv.org Machine Learning | Nov-26-2025

Universal online learning aims to achieve optimal regret guarantees without requiring prior knowledge of the curvature of online functions. Existing methods have established minimax-optimal regret bounds for universal online learning, where a single algorithm can simultaneously attain $\mathcal{O}(\sqrt{T})$ regret for convex functions, $\mathcal{O}(d \log T)$ for exp-concave functions, and $\mathcal{O}(\log T)$ for strongly convex functions, where $T$ is the number of rounds and $d$ is the dimension of the feasible domain. However, these methods still lack problem-dependent adaptivity. In particular, no universal method provides regret bounds that scale with the gradient variation $V_T$, a key quantity that plays a crucial role in applications such as stochastic optimization and fast-rate convergence in games. In this work, we introduce UniGrad, a novel approach that achieves both universality and adaptivity, with two distinct realizations: UniGrad.Correct and UniGrad.Bregman. Both methods achieve universal regret guarantees that adapt to gradient variation, simultaneously attaining $\mathcal{O}(\log V_T)$ regret for strongly convex functions and $\mathcal{O}(d \log V_T)$ regret for exp-concave functions. For convex functions, the regret bounds differ: UniGrad.Correct achieves an $\mathcal{O}(\sqrt{V_T \log V_T})$ bound while preserving the RVU property that is crucial for fast convergence in online games, whereas UniGrad.Bregman achieves the optimal $\mathcal{O}(\sqrt{V_T})$ regret bound through a novel design. Both methods employ a meta algorithm with $\mathcal{O}(\log T)$ base learners, which naturally requires $\mathcal{O}(\log T)$ gradient queries per round. To enhance computational efficiency, we introduce UniGrad++, which retains the regret while reducing the gradient query to just $1$ per round via surrogate optimization. We further provide various implications.
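
The abstract does not define $V_T$. In this line of work the gradient variation is commonly defined as below, where $f_t$ denotes the online function at round $t$ and $\mathcal{X}$ the feasible domain; this notation is an assumption and is not taken from the abstract.

$$V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \left\| \nabla f_t(x) - \nabla f_{t-1}(x) \right\|_2^2$$

If gradient norms are bounded by some $G$, each summand is at most $4G^2$, so $V_T \le 4G^2 (T-1)$. An $\mathcal{O}(\sqrt{V_T})$ bound is therefore never worse than $\mathcal{O}(\sqrt{T})$ up to constants, and it can be much smaller when consecutive functions change slowly.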

algorithm, base learner, convex function, (12 more...)

arXiv.org Machine Learning

2511.19937

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Promising Solution (0.47)
Research Report > New Finding (0.46)

Industry:

Education (0.68)
Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.45)

Foundations of Top-$k$ Decoding For Language Models

Noarov, Georgy, Mallick, Soham, Wang, Tao, Joshi, Sunay, Sun, Yan, Xie, Yangxinyu, Yu, Mengxin, Dobriban, Edgar

arXiv.org Artificial Intelligence | May-27-2025

Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token-probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider Bregman decoders obtained by minimizing a separable Bregman divergence (for both the primal and dual cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and further that the loss function is discretely convex in $k$, so that binary search provably and efficiently finds the optimal $k$. We show that top-$k$ decoding arises as a special case for the KL divergence, and identify new decoding strategies that have distinct behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).
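
The binary-search claim can be illustrated generically. The sketch below finds a minimizer of any discretely convex function of an integer $k$, meaning one whose increments `loss(k + 1) - loss(k)` are nondecreasing. The paper's actual objective is not reproduced here; `argmin_discretely_convex` and the toy loss in the usage note are hypothetical.

```python
def argmin_discretely_convex(loss, lo, hi):
    """Return the smallest minimizer of `loss` over the integers lo..hi.

    Assumes `loss` is discretely convex on that range, so the sign of the
    increment loss(k + 1) - loss(k) changes at most once, from negative to
    nonnegative. Uses O(log(hi - lo)) loop iterations.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if loss(mid + 1) - loss(mid) >= 0:
            # Increments are nonnegative from mid onward: a minimizer is at or before mid.
            hi = mid
        else:
            # The loss still decreases at mid: every minimizer lies strictly after mid.
            lo = mid + 1
    return lo
```

With a toy loss such as `lambda k: (k - 7) ** 2` on the range 1 to 50, the search returns 7 after a handful of evaluations instead of scanning all 50 candidates.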

bregman, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2505.19371

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

pyBregMan: A Python library for Bregman Manifolds

Nielsen, Frank, Soen, Alexander

arXiv.org Artificial Intelligence | Aug-7-2024

A Bregman manifold is a synonym for a dually flat space in information geometry which admits as a canonical divergence a Bregman divergence. Bregman manifolds are induced by smooth strictly convex functions like the cumulant or partition functions of regular exponential families, the negative entropy of mixture families, or the characteristic functions of regular cones, just to list a few such convex Bregman generators. We describe the design of pyBregMan, a library which implements generic operations on Bregman manifolds and instantiates several common Bregman manifolds used in information sciences. At the core of the library is the notion of Legendre-Fenchel duality inducing a canonical pair of dual potential functions and dual Bregman divergences. The library also implements the Fisher-Rao manifolds of categorical/multinomial distributions and multivariate normal distributions. To demonstrate the use of the pyBregMan kernel manipulating those Bregman and Fisher-Rao manifolds, the library also provides several core algorithms for various applications in statistics, machine learning, information fusion, and so on.
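
As a concrete reminder of what a Bregman generator induces, the sketch below evaluates the textbook definition $D_\phi(p, q) = \phi(p) - \phi(q) - \langle \nabla\phi(q), p - q \rangle$ in plain Python. It deliberately does not use the pyBregMan API, which the abstract does not show; all function names are hypothetical.

```python
import math


def bregman_divergence(phi, grad_phi, p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> for a convex generator phi."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner


def neg_entropy(x):
    """Negative Shannon entropy; assumes strictly positive entries."""
    return sum(xi * math.log(xi) for xi in x)


def grad_neg_entropy(x):
    return [math.log(xi) + 1.0 for xi in x]


def sq_norm(x):
    """Squared Euclidean norm."""
    return sum(xi * xi for xi in x)


def grad_sq_norm(x):
    return [2.0 * xi for xi in x]
```

On the probability simplex, the negative-entropy generator yields the KL divergence, and the squared norm yields the squared Euclidean distance, which is a quick sanity check for any implementation of the definition.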

artificial intelligence, bregman, manifold, (14 more...)

arXiv.org Artificial Intelligence

2408.04175

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Compositional Generalization without Trees using Multiset Tagging and Latent Permutations

Lindemann, Matthias, Koller, Alexander, Titov, Ivan

arXiv.org Artificial Intelligence | May-26-2023

Seq2seq models have been shown to struggle with compositional generalization in semantic parsing, i.e. generalizing to unseen compositions of phenomena that the model handles correctly in isolation. We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens. Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations. We formulate predicting a permutation as solving a regularized linear program and we backpropagate through the solver. In contrast to prior work, our approach does not place a priori restrictions on possible permutations, making it very expressive. Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks that require generalization to longer examples. We also outperform non-tree-based models on structural generalization on the COGS benchmark. For the first time, we show that a model without an inductive bias provided by trees achieves high accuracy on generalization to deeper recursion.

artificial intelligence, natural language, permutation, (18 more...)

arXiv.org Artificial Intelligence

2305.16954

Country:

North America > United States > Washington > King County > Seattle (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(11 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

A Differentiable Relaxation of Graph Segmentation and Alignment for AMR Parsing

Lyu, Chunchuan, Cohen, Shay B., Titov, Ivan

arXiv.org Artificial Intelligence | Oct-24-2022

Abstract Meaning Representations (AMR) are a broad-coverage semantic formalism which represents sentence meaning as a directed acyclic graph. To train most AMR parsers, one needs to segment the graph into subgraphs and align each such subgraph to a word in a sentence; this is normally done at preprocessing, relying on hand-crafted rules. In contrast, we treat both alignment and segmentation as latent variables in our model and induce them as part of end-to-end training. As marginalizing over the structured latent variables is infeasible, we use the variational autoencoding framework. To ensure end-to-end differentiable optimization, we introduce a differentiable relaxation of the segmentation and alignment problems. We observe that inducing segmentation yields substantial gains over using a 'greedy' segmentation heuristic. The performance of our method also approaches that of a model that relies on the segmentation rules of Lyu and Titov (2018), which were hand-crafted to handle individual AMR constructions.

machine learning, natural language, node, (17 more...)

arXiv.org Artificial Intelligence

2010.12676

Country:

Asia > Japan > Kyūshū & Okinawa > Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(8 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.67)

Universal Basic Income: Why Elon Musk Thinks It May Be The Future

International Business Times | Jan-4-2018, 18:10:50 GMT

Universal basic income (UBI), an unconditional allowance afforded to all citizens for the bare essentials of life, is an old idea that's garnered support from members of both the left and right. Notable supporters have been as disparate as civil rights activist Martin Luther King, Jr. and libertarian economist Milton Friedman. The Nixon Administration even attempted to pass a basic income guarantee through Congress and failed only narrowly due to a disagreement as to how much the stipend should be. Now, the debate over universal basic income is being renewed by industry leaders and billionaires who include Mark Zuckerberg, Richard Branson and Elon Musk, among others. As automation approaches, the world is faced with the problem of displacement.

artificial intelligence, machine learning, universal basic income, (8 more...)

International Business Times

Country: North America > United States (0.73)

Industry:

Law > Civil Rights & Constitutional Law (0.56)
Government > Regional Government > North America Government > United States Government (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Basic income makes more sense than ever in the Trump era

#artificialintelligence | Dec-12-2016, 15:10:15 GMT

By that time, economists predict robotics and artificial intelligence will have begun their unstoppable march into American factories. People will start losing their jobs en masse, and it'll be up to President Trump and his cabinet to devise an economic escape plan. A growing band of advocates argue it doesn't have to turn out this way if Trump embraces a radical form of wealth distribution known as basic income. Trump's preparations (or lack thereof) for a future of robotic automation will begin with how he addresses the current concerns of middle America, where manufacturing jobs are already dying and people are desperate for change. Plenty of people just want work that generates enough money to keep their family safe and healthy.

artificial intelligence, basic income, trump, (10 more...)

#artificialintelligence

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Tax (0.72)
Banking & Finance > Economy (0.72)

Technology: Information Technology > Artificial Intelligence > Robots (0.59)

Synchronized Auditory and Cognitive 40 Hz Attentional Streams, and the Impact of Rhythmic Expectation on Auditory Scene Analysis

Baird, Bill

Neural Information Processing Systems | Dec-31-1998

We have developed a neural network architecture that implements a theory of attention, learning, and trans-cortical communication based on adaptive synchronization of 5-15 Hz and 30-80 Hz oscillations between cortical areas.

experiment, frequency, target tone, (14 more...)

Neural Information Processing Systems

Country: