AITopics | stochastic mixability

We also show that when stochastic mixability does not hold in a certain sense (described in Section 5), then the risk minimizer is not unique in a bad way.

artificial intelligence, machine learning, stochastic mixability, (16 more...)

Neural Information Processing Systems

Country: Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Add feedback

From Stochastic Mixability to Fast Rates

Neural Information Processing SystemsDec-27-2025, 14:39:59 GMT

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution $\mathsf{P}$ and returns a hypothesis $f$ chosen from a fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting, depending upon $(\ell, \mathcal{F},\mathsf{P})$ ERM can have slow $(1/\sqrt{n})$ or fast $(1/n)$ rates of convergence of the excess risk as a function of the sample size $n$. There exist several results that give sufficient conditions for fast rates in terms of joint properties of $\ell$, $\mathcal{F}$, and $\mathsf{P}$, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss $\ell$ (there being no role there for $\mathcal{F}$ or $\mathsf{P}$). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of $(\ell,\mathcal{F}, \mathsf{P})$, and in so doing provides new insight into the fast-rates phenomenon.

mathcal, name change, stochastic mixability, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

From Stochastic Mixability to Fast Rates

Nishant A. Mehta, Robert C. Williamson

Neural Information Processing SystemsOct-2-2025, 17:03:44 GMT

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution P and returns a hypothesis f chosen from a fixed class F with small loss null. In the parametric setting, depending upon (null, F, P) ERM can have slow (1 / n) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n . There exist several results that give sufficient conditions for fast rates in terms of joint properties of null, F, and P, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss null (there being no role there for F or P). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of ( null, F, P), and in so doing provides new insight into the fast-rates phenomenon.

bernstein condition, fast rate, stochastic mixability, (15 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Add feedback

From Stochastic Mixability to Fast Rates

Neural Information Processing SystemsMay-27-2025, 22:13:12 GMT

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution \mathsf{P} and returns a hypothesis f chosen from a fixed class \mathcal{F} with small loss \ell . In the parametric setting, depending upon (\ell, \mathcal{F},\mathsf{P}) ERM can have slow (1/\sqrt{n}) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n . There exist several results that give sufficient conditions for fast rates in terms of joint properties of \ell, \mathcal{F}, and \mathsf{P}, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss \ell (there being no role there for \mathcal{F} or \mathsf{P}). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case.

mathcal, mixability, stochastic mixability, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

From Stochastic Mixability to Fast Rates

Nishant A. Mehta, Robert C. Williamson

Neural Information Processing SystemsFeb-9-2025, 09:36:43 GMT

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution P and returns a hypothesis f chosen from a fixed class F with small loss l. In the parametric setting, depending upon (l, F, P) ERM can have slow (1/ n) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n. There exist several results that give sufficient conditions for fast rates in terms of joint properties of l, F, and P, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss l (there being no role there for F or P). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of (l, F, P), and in so doing provides new insight into the fast-rates phenomenon.

artificial intelligence, machine learning, stochastic mixability, (17 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Add feedback

Mixability in Statistical Learning

Neural Information Processing SystemsMar-14-2024, 09:49:41 GMT

Statistical learning and sequential prediction are two different but related formalisms to study the quality of predictions. Mapping out their relations and transferring ideas is an active area of investigation. We provide another piece of the puzzle by showing that an important concept in sequential prediction, the mixability of a loss, has a natural counterpart in the statistical setting, which we call stochastic mixability. Just as ordinary mixability characterizes fast rates for the worst-case regret in sequential prediction, stochastic mixability characterizes fast rates in statistical learning. We show that, in the special case of log-loss, stochastic mixability reduces to a well-known (but usually unnamed) martingale condition, which is used in existing convergence theorems for minimum description length and Bayesian inference. In the case of 0/1-loss, it reduces to the margin condition of Mammen and Tsybakov, and in the case that the model under consideration contains all possible predictors, it is equivalent to ordinary mixability.

mixability, prediction, stochastic mixability, (14 more...)

Neural Information Processing Systems

Country:

Oceania > Australia (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Add feedback

From Stochastic Mixability to Fast Rates

Neural Information Processing SystemsMar-13-2024, 09:57:24 GMT

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution P and returns a hypothesis f chosen from a fixed class F with small loss l. In the parametric setting, depending upon (l, F, P) ERM can have slow (1/ n) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n. There exist several results that give sufficient conditions for fast rates in terms of joint properties of l, F, and P, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss l (there being no role there for F or P). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of (l, F, P), and in so doing provides new insight into the fast-rates phenomenon.

bernstein condition, fast rate, stochastic mixability, (15 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Add feedback

From Stochastic Mixability to Fast Rates

Mehta, Nishant A., Williamson, Robert C.

Neural Information Processing SystemsFeb-14-2020, 07:26:29 GMT

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution $\mathsf{P}$ and returns a hypothesis $f$ chosen from a fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting, depending upon $(\ell, \mathcal{F},\mathsf{P})$ ERM can have slow $(1/\sqrt{n})$ or fast $(1/n)$ rates of convergence of the excess risk as a function of the sample size $n$. There exist several results that give sufficient conditions for fast rates in terms of joint properties of $\ell$, $\mathcal{F}$, and $\mathsf{P}$, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss $\ell$ (there being no role there for $\mathcal{F}$ or $\mathsf{P}$). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case.

mathcal, mixability, stochastic mixability, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Fast rates in statistical and online learning

van Erven, Tim, Grünwald, Peter D., Mehta, Nishant A., Reid, Mark D., Williamson, Robert C.

arXiv.org Machine LearningSep-1-2015

The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition, that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting.

central condition, fast rate, mixability, (14 more...)

arXiv.org Machine Learning

1507.02592

Country:

North America > United States > New York (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(7 more...)

Genre: Research Report (0.63)

Industry: Education > Educational Setting > Online (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

From Stochastic Mixability to Fast Rates

Mehta, Nishant A., Williamson, Robert C.

Neural Information Processing SystemsDec-31-2014

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution $\mathsf{P}$ and returns a hypothesis $f$ chosen from a fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting, depending upon $(\ell, \mathcal{F},\mathsf{P})$ ERM can have slow $(1/\sqrt{n})$ or fast $(1/n)$ rates of convergence of the excess risk as a function of the sample size $n$. There exist several results that give sufficient conditions for fast rates in terms of joint properties of $\ell$, $\mathcal{F}$, and $\mathsf{P}$, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss $\ell$ (there being no role there for $\mathcal{F}$ or $\mathsf{P}$). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of $(\ell,\mathcal{F}, \mathsf{P})$, and in so doing provides new insight into the fast-rates phenomenon. The proof exploits an old result of Kemperman on the solution to the general moment problem. We also show a partial converse that suggests a characterization of fast rates for ERM in terms of stochastic mixability is possible.

artificial intelligence, machine learning, stochastic mixability, (16 more...)

Neural Information Processing Systems

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Add feedback

Filters

Collaborating Authors

stochastic mixability

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

From Stochastic Mixability to Fast Rates

From Stochastic Mixability to Fast Rates

From Stochastic Mixability to Fast Rates

From Stochastic Mixability to Fast Rates

From Stochastic Mixability to Fast Rates

Mixability in Statistical Learning

From Stochastic Mixability to Fast Rates

From Stochastic Mixability to Fast Rates

Fast rates in statistical and online learning

From Stochastic Mixability to Fast Rates