Mixability
From Stochastic Mixability to Fast Rates
Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems in which the data are generated according to some unknown distribution $\mathsf{P}$; it returns a hypothesis $f$, chosen from a fixed class $\mathcal{F}$, with small loss $\ell$. In the parametric setting, depending upon $(\ell, \mathcal{F}, \mathsf{P})$, ERM can have slow $(1/\sqrt{n})$ or fast $(1/n)$ rates of convergence of the excess risk as a function of the sample size $n$. Several results give sufficient conditions for fast rates in terms of joint properties of $\ell$, $\mathcal{F}$, and $\mathsf{P}$, such as the margin condition and the Bernstein condition. In the non-statistical setting of prediction with expert advice, there is an analogous slow- and fast-rate phenomenon, and it is entirely characterized in terms of the mixability of the loss $\ell$ (there being no role there for $\mathcal{F}$ or $\mathsf{P}$). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper gives a direct proof of fast rates for ERM in terms of the stochastic mixability of $(\ell, \mathcal{F}, \mathsf{P})$, and in doing so provides new insight into the fast-rates phenomenon.
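To make the setup concrete, the following is a minimal sketch of ERM over a finite hypothesis class under 0/1-loss; the class, loss, and data distribution are illustrative choices, not taken from the paper.

```python
# Minimal ERM sketch: draw an i.i.d. sample from an unknown distribution P,
# then return the hypothesis in a fixed finite class F with the smallest
# empirical 0/1-loss. All modelling choices here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# The class F: threshold classifiers f_t(x) = 1[x >= t].
thresholds = np.linspace(-1.0, 1.0, 21)
n = 1000  # sample size

# The unknown P: x ~ Uniform[-1, 1], y = 1[x >= 0.3] with 10% label noise.
x = rng.uniform(-1.0, 1.0, size=n)
y = (x >= 0.3).astype(int)
flip = rng.random(n) < 0.1
y[flip] = 1 - y[flip]

# Empirical 0/1-risk of each hypothesis, and the ERM choice.
emp_risk = np.array([np.mean(((x >= t).astype(int)) != y) for t in thresholds])
f_hat = thresholds[np.argmin(emp_risk)]
print(f"ERM threshold: {f_hat:+.2f}, empirical risk: {emp_risk.min():.3f}")
```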
Mixability made efficient: Fast online multiclass logistic regression
Mixability has been shown to be a powerful tool for obtaining algorithms with optimal regret. However, the resulting methods often suffer from high computational complexity, which has limited their practical applicability. For example, in the case of multiclass logistic regression, the aggregating forecaster (see Foster et al. 2018) achieves a regret of $O(\log(Bn))$, whereas Online Newton Step achieves $O(e^B\log(n))$, a double-exponential gain in $B$ (a bound on the norm of the comparator functions). However, this high statistical performance comes at the price of a prohibitive computational complexity of $O(n^{37})$. In this paper, we use quadratic surrogates to make aggregating forecasters more efficient. We show that the resulting algorithm still has high statistical performance for a large class of losses. In particular, we derive an algorithm for multiclass logistic regression with a regret bounded by $O(B\log(n))$ and a computational complexity of only $O(n^4)$.
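For concreteness, here is a baseline sketch of the online multiclass logistic regression protocol that these regret bounds refer to, implemented with plain online gradient descent on the log-loss. This is not the paper's quadratic-surrogate aggregating forecaster; the dimensions, step size, and data model are illustrative assumptions.

```python
# Online multiclass logistic regression via online gradient descent (OGD):
# a simple baseline illustrating the protocol, NOT the paper's algorithm.
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, K, n, lr = 5, 3, 2000, 0.5     # features, classes, rounds, base step size
W = np.zeros((K, d))              # learner's parameters; comparators are {W : ||W|| <= B}
W_true = rng.normal(size=(K, d))  # illustrative data-generating model

total_loss = 0.0
for t in range(n):
    x_t = rng.normal(size=d)
    y_t = rng.choice(K, p=softmax(W_true @ x_t))  # observe a label
    p = softmax(W @ x_t)                          # forecaster's prediction
    total_loss += -np.log(p[y_t])                 # multiclass logistic (log) loss
    grad = np.outer(p - np.eye(K)[y_t], x_t)      # gradient of the log-loss in W
    W -= (lr / np.sqrt(t + 1)) * grad             # OGD update

print(f"average log-loss over {n} rounds: {total_loss / n:.3f}")
```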
Reviews: Fast learning rates with heavy-tailed losses
This paper provides new results in an important area that is receiving increasing attention: fast rates when loss functions are unbounded and heavy-tailed. Existing results based on empirical process theory often rely on bounded or sub-Gaussian losses, and the heavy-tailed (hence non-sub-Gaussian) case is considerably harder. The results presented seem sound and are definitely novel. They rely on results of Sara van de Geer and collaborators on concentration inequalities for unbounded empirical processes. The material is very technical, and I would suggest moving some more of it to the appendix.
Mixability in Statistical Learning
Statistical learning and sequential prediction are two different but related formalisms to study the quality of predictions. Mapping out their relations and transferring ideas is an active area of investigation. We provide another piece of the puzzle by showing that an important concept in sequential prediction, the mixability of a loss, has a natural counterpart in the statistical setting, which we call stochastic mixability. Just as ordinary mixability characterizes fast rates for the worst-case regret in sequential prediction, stochastic mixability characterizes fast rates in statistical learning. We show that, in the special case of log-loss, stochastic mixability reduces to a well-known (but usually unnamed) martingale condition, which is used in existing convergence theorems for minimum description length and Bayesian inference. In the case of 0/1-loss, it reduces to the margin condition of Mammen and Tsybakov, and in the case that the model under consideration contains all possible predictors, it is equivalent to ordinary mixability.
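For reference, the two definitions the abstract relates can be written as follows, in the form standard in this literature (the paper's own notation may differ in details):

```latex
% Ordinary mixability: a loss \ell is \eta-mixable if, for every distribution
% \pi over predictions, there is a single prediction a_\pi such that
\ell(a_\pi, y) \le -\frac{1}{\eta} \log \mathbb{E}_{f \sim \pi}\!\left[e^{-\eta\,\ell(f, y)}\right]
  \quad \text{for all outcomes } y.

% Stochastic mixability: (\ell, \mathcal{F}, \mathsf{P}) is \eta-stochastically
% mixable if there exists f^* \in \mathcal{F} such that
\mathbb{E}_{(X,Y) \sim \mathsf{P}}\!\left[e^{\eta\,(\ell(f^*, X, Y) - \ell(f, X, Y))}\right] \le 1
  \quad \text{for all } f \in \mathcal{F}.
```

With log-loss and $\eta = 1$, the exponential in the second condition becomes a likelihood ratio, recovering the expectation-at-most-one martingale condition the abstract mentions.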