AITopics | Pichapati, Venkatadheeraj

The power of absolute discounting: all-dimensional distribution estimation

Falahatgar, Moein, Ohannessian, Mesrob I., Orlitsky, Alon, Pichapati, Venkatadheeraj

Neural Information Processing SystemsFeb-14-2020, 19:12:58 GMT

Categorical models are a natural fit for many problems. When learning the distribution of categories from samples, high-dimensionality may dilute the data. Minimax optimality is too pessimistic to remedy this issue. A serendipitously discovered estimator, absolute discounting, corrects empirical frequencies by subtracting a constant from observed categories, which it then redistributes among the unobserved. It outperforms classical estimators empirically, and has been used extensively in natural language modeling.

absolute discounting, all-dimensional distribution estimation, artificial intelligence, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.65)

Add feedback

AdaCliP: Adaptive Clipping for Private SGD

Pichapati, Venkatadheeraj, Suresh, Ananda Theertha, Yu, Felix X., Reddi, Sashank J., Kumar, Sanjiv

arXiv.org Machine LearningAug-20-2019

Privacy preserving machine learning algorithms are crucial for learning models over user data to protect sensitive information. Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed. At each step, these algorithms modify the gradients and add noise proportional to the sensitivity of the modified gradients. Under this framework, we propose AdaCliP, a theoretically motivated differentially private SGD algorithm that provably adds less noise compared to the previous methods, by using coordinate-wise adaptive clipping of the gradient. We empirically demonstrate that AdaCliP reduces the amount of added noise and produces models with better accuracy.

deep learning, neural network, null 2, (17 more...)

arXiv.org Machine Learning

1908.07643

Country:

Asia (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On Learning Markov Chains

HAO, Yi, Orlitsky, Alon, Pichapati, Venkatadheeraj

Neural Information Processing SystemsDec-31-2018

The problem of estimating an unknown discrete distribution from its samples is a fundamental tenet of statistical learning. Over the past decade, it attracted significant research effort and has been solved for a variety of divergence measures. Surprisingly, an equally important problem, estimating an unknown Markov chain from its samples, is still far from understood. We consider two problems related to the min-max risk (expected loss) of estimating an unknown k-state Markov chain from its n sequential samples: predicting the conditional distribution of the next sample with respect to the KL-divergence, and estimating the transition matrix with respect to a natural loss induced by KL or a more general f-divergence measure. For the first measure, we determine the min-max prediction risk to within a linear factor in the alphabet size, showing it is \Omega(k\log\log n/n) and O(k^2\log\log n/n). For the second, if the transition probabilities can be arbitrarily small, then only trivial uniform risk upper bounds can be derived. We therefore consider transition probabilities that are bounded away from zero, and resolve the problem for essentially all sufficiently smooth f-divergences, including KL-, L_2-, Chi-squared, Hellinger, and Alpha-divergences.

artificial intelligence, estimator, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Diego County (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.96)

Add feedback

On Learning Markov Chains

HAO, Yi, Orlitsky, Alon, Pichapati, Venkatadheeraj

Neural Information Processing SystemsDec-31-2018

The problem of estimating an unknown discrete distribution from its samples is a fundamental tenet of statistical learning. Over the past decade, it attracted significant research effort and has been solved for a variety of divergence measures. Surprisingly, an equally important problem, estimating an unknown Markov chain from its samples, is still far from understood. We consider two problems related to the min-max risk (expected loss) of estimating an unknown k-state Markov chain from its n sequential samples: predicting the conditional distribution of the next sample with respect to the KL-divergence, and estimating the transition matrix with respect to a natural loss induced by KL or a more general f-divergence measure. For the first measure, we determine the min-max prediction risk to within a linear factor in the alphabet size, showing it is \Omega(k\log\log n/n) and O(k^2\log\log n/n). For the second, if the transition probabilities can be arbitrarily small, then only trivial uniform risk upper bounds can be derived. We therefore consider transition probabilities that are bounded away from zero, and resolve the problem for essentially all sufficiently smooth f-divergences, including KL-, L_2-, Chi-squared, Hellinger, and Alpha-divergences.

artificial intelligence, estimator, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Diego County (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.97)

Add feedback

On Learning Markov Chains

Hao, Yi, Orlitsky, Alon, Pichapati, Venkatadheeraj

arXiv.org Machine LearningOct-27-2018

The problem of estimating an unknown discrete distribution from its samples is a fundamental tenet of statistical learning. Over the past decade, it attracted significant research effort and has been solved for a variety of divergence measures. Surprisingly, an equally important problem, estimating an unknown Markov chain from its samples, is still far from understood. We consider two problems related to the min-max risk (expected loss) of estimating an unknown $k$-state Markov chain from its $n$ sequential samples: predicting the conditional distribution of the next sample with respect to the KL-divergence, and estimating the transition matrix with respect to a natural loss induced by KL or a more general $f$-divergence measure. For the first measure, we determine the min-max prediction risk to within a linear factor in the alphabet size, showing it is $\Omega(k\log\log n\ / n)$ and $\mathcal{O}(k^2\log\log n\ / n)$. For the second, if the transition probabilities can be arbitrarily small, then only trivial uniform risk upper bounds can be derived. We therefore consider transition probabilities that are bounded away from zero, and resolve the problem for essentially all sufficiently smooth $f$-divergences, including KL-, $L_2$-, Chi-squared, Hellinger, and Alpha-divergences.

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

1810.11754

Country: North America > United States > California > San Diego County (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.92)

Add feedback

Maxing and Ranking with Few Assumptions

Falahatgar, Moein, Hao, Yi, Orlitsky, Alon, Pichapati, Venkatadheeraj, Ravindrakumar, Vaishakh

Neural Information Processing SystemsDec-31-2017

PAC maximum selection (maxing) and ranking of $n$ elements via random pairwise comparisons have diverse applications and have been studied under many models and assumptions. With just one simple natural assumption: strong stochastic transitivity, we show that maxing can be performed with linearly many comparisons yet ranking requires quadratically many. With no assumptions at all, we show that for the Borda-score metric, maximum selection can be performed with linearly many comparisons and ranking can be performed with $\mathcal{O}(n\log n)$ comparisons.

algorithm, artificial intelligence, social media, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

The power of absolute discounting: all-dimensional distribution estimation

Falahatgar, Moein, Ohannessian, Mesrob I., Orlitsky, Alon, Pichapati, Venkatadheeraj

Neural Information Processing SystemsDec-31-2017

Categorical models are a natural fit for many problems. When learning the distribution ofcategories from samples, high-dimensionality may dilute the data. Minimax optimality is too pessimistic to remedy this issue. A serendipitously discovered estimator, absolute discounting, corrects empirical frequencies by subtracting aconstant from observed categories, which it then redistributes among the unobserved. It outperforms classical estimators empirically, and has been used extensively innatural language modeling. In this paper, we rigorously explain the prowess of this estimator using less pessimistic notions. We show that (1) absolute discountingrecovers classical minimax KL-risk rates, (2) it is adaptive to an effective dimension rather than the true dimension, (3) it is strongly related to the Good-Turing estimator and inherits its competitive properties. We use powerlaw distributionsas the cornerstone of these results.

absolute discounting, artificial intelligence, renewable energy, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology: