AITopics | absolute discounting

Utilizing the structure of a probabilistic model can significantly increase its learning speed. Motivated by several recent applications, in particular bigram models in language processing, we consider learning low-rank conditional probability matrices under expected KL-risk. This choice makes smoothing, that is, the careful handling of low-probability elements, paramount. We derive an iterative algorithm that extends classical non-negative matrix factorization to naturally incorporate additive smoothing and prove that it converges to the stationary points of a penalized empirical risk. We then derive sample-complexity bounds for the global minimizer of the penalized risk and show that it is within a small factor of the optimal sample complexity.

The power of absolute discounting: all-dimensional distribution estimation

Neural Information Processing Systems, Mar-17-2026, 13:30:57 GMT

Categorical models are a natural fit for many problems. When learning the distribution of categories from samples, high-dimensionality may dilute the data. Minimax optimality is too pessimistic to remedy this issue. A serendipitously discovered estimator, absolute discounting, corrects empirical frequencies by subtracting a constant from observed categories, which it then redistributes among the unobserved. It outperforms classical estimators empirically, and has been used extensively in natural language modeling. In this paper, we rigorously explain the prowess of this estimator using less pessimistic notions. We show that (1) absolute discounting recovers classical minimax KL-risk rates, (2) it is adaptive to an effective dimension rather than the true dimension, (3) it is strongly related to the Good-Turing estimator and inherits its competitive properties. We use power-law distributions as the cornerstone of these results.
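
The estimator described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `absolute_discounting`, the default discount `delta=0.5`, and the choice to split the freed mass uniformly over unobserved categories are all assumptions made for the example.

```python
from collections import Counter


def absolute_discounting(samples, k, delta=0.5):
    """Estimate a distribution over categories 0..k-1 (illustrative sketch).

    Each observed category has `delta` subtracted from its count; the mass
    freed this way is split evenly among the unobserved categories.
    Assumes 0 < delta < 1 so every observed category keeps positive mass.
    """
    n = len(samples)
    if n == 0:
        # No data: fall back to the uniform distribution (an assumption).
        return [1.0 / k] * k

    counts = Counter(samples)
    d = len(counts)  # number of distinct observed categories

    if d == k:
        # Nothing is unobserved, so there is nowhere to move the mass;
        # this sketch simply returns the empirical frequencies.
        return [counts[i] / n for i in range(k)]

    freed_mass = d * delta / n           # total mass taken from observed categories
    unseen_share = freed_mass / (k - d)  # equal share for each unobserved category

    return [
        (counts[i] - delta) / n if i in counts else unseen_share
        for i in range(k)
    ]


estimate = absolute_discounting([0, 0, 0, 1, 1, 2], k=5, delta=0.5)
```

With six samples over five categories, three of them observed, the sketch moves a total mass of 3 × 0.5 / 6 = 0.25 to the two unobserved categories, and the estimate still sums to one.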

artificial intelligence, natural language, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.61)

The power of absolute discounting: all-dimensional distribution estimation

Neural Information Processing Systems, Nov-21-2025, 14:48:03 GMT

absolute discounting, all-dimensional distribution estimation, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.61)

The power of absolute discounting: all-dimensional distribution estimation

Moein Falahatgar, Mesrob I. Ohannessian, Alon Orlitsky, Venkatadheeraj Pichapati

Neural Information Processing Systems, Nov-21-2025, 06:52:09 GMT

Neural Information Processing Systems http://nips.cc/

absolute discounting, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Wayne County > Detroit (0.04)
North America > United States > Maryland (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reviews: The power of absolute discounting: all-dimensional distribution estimation

Neural Information Processing Systems, Oct-7-2024, 17:31:54 GMT

This paper presents a theoretical examination of the optimality of absolute discounting, similar to the examination of the optimality of Good-Turing in Orlitsky and Suresh (2015). Results for minimax optimality, adaptivity and competitiveness are presented, as well as an equivalence between absolute discounting and Good-Turing in certain scenarios, which suggests a choice of discount. Experimental results demonstrate the quality of the approach, along with some interesting results on predicting terror attacks in classes of cities (e.g., cities with zero prior attacks) given prior data. The paper is very well written and crystal clear, and the results are quite interesting. This is an excellent addition to the literature on these methods.
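
To make the Good-Turing connection mentioned in the review concrete, the sketch below computes the classical Good-Turing estimate of the missing mass (the fraction of samples that are singletons) next to the total mass an absolute-discounting estimator with discount `delta` reserves for unobserved categories. The function names are hypothetical, and the comparison is illustrative only; it does not reproduce the paper's equivalence result or its suggested discount.

```python
from collections import Counter


def good_turing_missing_mass(samples):
    """Classical Good-Turing missing-mass estimate: (# singletons) / n."""
    n = len(samples)
    if n == 0:
        # With no data, all mass is unobserved (a convention for this sketch).
        return 1.0
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def discounted_mass(samples, delta):
    """Total mass absolute discounting moves to unobserved categories:
    (# distinct observed categories) * delta / n. Requires non-empty samples."""
    n = len(samples)
    return len(set(samples)) * delta / n


samples = [0, 0, 0, 1, 1, 2]
gt_mass = good_turing_missing_mass(samples)    # one singleton out of six samples
ad_mass = discounted_mass(samples, delta=0.5)  # three distinct categories, delta = 0.5
```

Here the two quantities differ (1/6 versus 1/4); choosing `delta` so that they agree is one informal way to read the review's remark that the equivalence "suggests a choice of discount".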

absolute discounting, all-dimensional distribution estimation, good turing, (11 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.38)

Technology: Information Technology > Artificial Intelligence (0.38)

The power of absolute discounting: all-dimensional distribution estimation

Moein Falahatgar, Mesrob I. Ohannessian, Alon Orlitsky, Venkatadheeraj Pichapati

Neural Information Processing Systems, Oct-3-2024, 03:08:45 GMT

Neural Information Processing Systems http://nips.cc/

absolute discounting, discounting, estimator, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Wayne County > Detroit (0.04)
North America > United States > Maryland (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

The power of absolute discounting: all-dimensional distribution estimation

Falahatgar, Moein, Ohannessian, Mesrob I., Orlitsky, Alon, Pichapati, Venkatadheeraj

Neural Information Processing Systems, Feb-14-2020, 19:12:58 GMT

Categorical models are a natural fit for many problems. When learning the distribution of categories from samples, high-dimensionality may dilute the data. Minimax optimality is too pessimistic to remedy this issue. A serendipitously discovered estimator, absolute discounting, corrects empirical frequencies by subtracting a constant from observed categories, which it then redistributes among the unobserved. It outperforms classical estimators empirically, and has been used extensively in natural language modeling. In this paper, we rigorously explain the prowess of this estimator using less pessimistic notions.

absolute discounting, all-dimensional distribution estimation, estimator, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.65)

The power of absolute discounting: all-dimensional distribution estimation

Falahatgar, Moein, Ohannessian, Mesrob I., Orlitsky, Alon, Pichapati, Venkatadheeraj

Neural Information Processing Systems, Dec-31-2017

Categorical models are a natural fit for many problems. When learning the distribution of categories from samples, high-dimensionality may dilute the data. Minimax optimality is too pessimistic to remedy this issue. A serendipitously discovered estimator, absolute discounting, corrects empirical frequencies by subtracting a constant from observed categories, which it then redistributes among the unobserved. It outperforms classical estimators empirically, and has been used extensively in natural language modeling. In this paper, we rigorously explain the prowess of this estimator using less pessimistic notions. We show that (1) absolute discounting recovers classical minimax KL-risk rates, (2) it is adaptive to an effective dimension rather than the true dimension, (3) it is strongly related to the Good-Turing estimator and inherits its competitive properties. We use power-law distributions as the cornerstone of these results. We validate the theory via synthetic data and an application to the Global Terrorism Database.

absolute discounting, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Law Enforcement & Public Safety > Terrorism (0.34)

Technology: