Concrete distribution
KDGAN: Knowledge Distillation with Generative Adversarial Networks
Wang, Xiaojie, Zhang, Rui, Sun, Yu, Qi, Jianzhong
Knowledge distillation (KD) aims to train a lightweight classifier that provides accurate inference under constrained resources in multi-label learning. Instead of directly consuming feature-label pairs, the classifier is trained by a teacher, i.e., a high-capacity model whose training may be resource-hungry. The accuracy of a classifier trained this way is usually suboptimal because it is difficult to learn the true data distribution from the teacher. An alternative is to adversarially train the classifier against a discriminator in a two-player game akin to generative adversarial networks (GAN), which ensures that the classifier learns the true data distribution at the equilibrium of this game. However, such a two-player game may take an excessively long time to reach equilibrium due to high-variance gradient updates.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Data Science > Data Mining (0.94)
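For context on the distillation setup that KDGAN builds on, the sketch below shows the standard soft-target objective of vanilla KD (Hinton et al.), in which the student mimics the teacher's temperature-softened outputs. This is not the KDGAN algorithm itself, and it is written for the single-label case for simplicity; the function name and hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Vanilla knowledge-distillation objective (soft targets + hard labels).

    T: softmax temperature; alpha: weight on the distillation term.
    """
    # KL divergence between temperature-softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable to the hard-label term
    # Standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```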
Accuracy-Preserving Calibration via Statistical Modeling on Probability Simplex
Esaki, Yasushi, Nakamura, Akihiro, Kawano, Keisuke, Tokuhisa, Ryoko, Kutsuna, Takuro
Classification models based on deep neural networks (DNNs) must be calibrated so that the reliability of their predictions can be measured. Some recent calibration methods employ a probabilistic model on the probability simplex. However, these methods cannot preserve the accuracy of pre-trained models, even those with high classification accuracy. We propose an accuracy-preserving calibration method that uses the Concrete distribution as the probabilistic model on the probability simplex. We theoretically prove that a DNN model trained with the cross-entropy loss is optimal as the parameter of the Concrete distribution. We also propose an efficient method that synthetically generates samples for training probabilistic models on the probability simplex. We demonstrate that the proposed method outperforms previous methods on accuracy-preserving calibration benchmarks.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Switzerland (0.04)
- Europe > Spain (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
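To make the modeling choice above concrete: calibration methods of this family fit a density on the simplex around the network's outputs. The sketch below evaluates the log-density of the Concrete distribution as given in Maddison et al. (2017), with the location parameters taken to be a pre-trained network's logits; the function name is an illustrative assumption, and the paper's actual training procedure is not reproduced here.

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def concrete_log_density(x, log_alpha, lam):
    """Log-density of Concrete(alpha, lambda) at a point x on the simplex.

    Follows the density in Maddison et al. (2017). Here log_alpha plays
    the role of the pre-trained DNN's logits and lam is the temperature.
    """
    K = len(x)
    log_x = np.log(x)
    return (gammaln(K)                      # log((K-1)!)
            + (K - 1) * np.log(lam)
            + np.sum(log_alpha - (lam + 1.0) * log_x)
            - K * logsumexp(log_alpha - lam * log_x))
```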
Properties of the Concrete distribution
Like the better-known Dirichlet distribution, it includes the uniform distribution on the simplex, but it is otherwise distinct from the Dirichlet distribution. It has a temperature parameter, named by analogy with the Boltzmann (or Gibbs) distribution in thermodynamics. In the zero-temperature limit, samples become concentrated at the corners of the simplex and the distribution approximates a categorical distribution: the limit of a continuous distribution approximates a discrete one, hence the portmanteau "Concrete" (CONtinuous relaxation of disCRETE). There are additionally K parameters that are easily interpreted as unnormalized probabilities for the K categories in this limit; a single constraint reduces these to K − 1 independent parameters. The Concrete distribution has applications in machine learning for stochastic neural networks with discrete random variables, such as random sampling from discrete distributions and estimation of parameter gradients through backpropagation. One way of constructing the Concrete distribution is by combining Gumbel distributions through the softmax function, hence its alternative name, the Gumbel-softmax distribution. The differentiability of the softmax function, unlike the argmax function, allows for backpropagation.
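A minimal NumPy sketch of that construction follows: Gumbel noise perturbs the log-parameters, and a temperature-scaled softmax maps the result onto the simplex. The function name and example values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_concrete(logits, temperature):
    """Draw one sample from a Concrete (Gumbel-softmax) distribution.

    logits: unnormalized log-probabilities for the K categories.
    temperature: > 0; lower values push samples toward simplex corners.
    """
    # Gumbel(0, 1) noise via the inverse-CDF trick: -log(-log(U)).
    gumbel = -np.log(-np.log(rng.uniform(size=len(logits))))
    # Perturb the logits, rescale by the temperature, apply softmax.
    scaled = (logits + gumbel) / temperature
    scaled -= scaled.max()          # numerical stability
    expd = np.exp(scaled)
    return expd / expd.sum()        # a point on the probability simplex

logits = np.log([0.7, 0.2, 0.1])
print(sample_concrete(logits, temperature=0.1))  # nearly one-hot
print(sample_concrete(logits, temperature=5.0))  # well inside the simplex
```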
In-Distribution Interpretability for Challenging Modalities
Heiß, Cosmas, Levie, Ron, Resnick, Cinjon, Kutyniok, Gitta, Bruna, Joan
It is widely recognized that the predictions of deep neural networks are difficult to parse relative to those of simpler approaches. However, methods for investigating the mode of operation of such models have advanced rapidly in the past few years. Recent work introduced an intuitive framework that utilizes generative models to improve the meaningfulness of such explanations. In this work, we demonstrate the flexibility of this method by interpreting diverse and challenging modalities: music and physical simulations of urban environments.
- North America > United States > New York (0.04)
- Europe > Norway > Northern Norway > Troms > Tromsø (0.04)
DBSN: Measuring Uncertainty through Bayesian Learning of Deep Neural Network Structures
Deng, Zhijie, Luo, Yucen, Zhu, Jun, Zhang, Bo
Bayesian neural networks (BNNs) introduce uncertainty estimation to deep networks by performing Bayesian inference on network weights. However, such models pose inference challenges, and BNNs with weight uncertainty rarely achieve performance superior to that of standard models. In this paper, we investigate a new line of Bayesian deep learning by performing Bayesian reasoning on the structure of deep neural networks. Drawing inspiration from neural architecture search, we define the network structure as gating weights on the redundant operations between computational nodes, and apply stochastic variational inference techniques to learn the structure distributions of networks. Empirically, the proposed method substantially surpasses advanced deep neural networks across a range of classification and segmentation tasks. More importantly, our approach preserves the benefits of Bayesian principles, producing better uncertainty estimates than strong baselines, including MC dropout and variational BNN algorithms (e.g., noisy EK-FAC).
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.64)
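The structure parameterization described in the DBSN abstract, gating weights over redundant candidate operations, is commonly made differentiable with a relaxed categorical (Concrete/Gumbel-softmax) gate. The sketch below shows that general pattern only; the class name, the candidate operations, and the point-estimate gate logits are illustrative assumptions, not DBSN's actual variational parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedEdge(nn.Module):
    """One edge of a cell: a relaxed categorical gate over candidate ops.

    Illustrative only; DBSN learns a distribution over these gates with
    stochastic variational inference rather than a single point estimate.
    """
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
        ])
        # Logits over the candidate operations on this edge.
        self.gate_logits = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x, temperature=1.0):
        # Sample relaxed one-hot gate weights; the reparameterized sample
        # lets gradients flow back to gate_logits through backpropagation.
        w = F.gumbel_softmax(self.gate_logits, tau=temperature)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))
```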
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment
Yasuda, Yusuke, Wang, Xin, Yamagishi, Junichi
Sequence-to-sequence text-to-speech (TTS) is dominated by soft-attention-based methods. Recently, hard-attention-based methods have been proposed to prevent fatal alignment errors, but their methods for sampling discrete alignments are poorly investigated. This research investigates various combinations of sampling methods and probability distributions for alignment transition modeling in a hard-alignment-based sequence-to-sequence TTS method called SSNT-TTS. We formulate the common sampling methods for discrete variables, including greedy search, beam search, and random sampling from a Bernoulli distribution, in a more general way. Furthermore, we introduce the binary Concrete distribution to model discrete variables more appropriately. The results of a listening test show that deterministic search is preferable to stochastic search, and that the binary Concrete distribution is robust with stochastic search for natural alignment transitions.
- Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.75)
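The binary Concrete distribution mentioned above is the two-category special case: a relaxed Bernoulli whose samples live in (0, 1) and sharpen toward hard 0/1 decisions as the temperature drops, which suits a per-step "shift vs. stay" alignment transition. A minimal PyTorch sketch follows; the function name and example values are illustrative assumptions, not taken from SSNT-TTS.

```python
import torch

def sample_binary_concrete(logit, temperature):
    """Relaxed Bernoulli (binary Concrete) sample in (0, 1).

    logit: log(p / (1 - p)) of the underlying transition probability;
    as temperature -> 0 the sample approaches a hard 0/1 decision.
    """
    u = torch.rand_like(logit)
    # Logistic(0, 1) noise via log(u) - log(1 - u).
    noise = torch.log(u) - torch.log1p(-u)
    return torch.sigmoid((logit + noise) / temperature)

# E.g. a relaxed "shift vs. stay" alignment decision for one decoder step.
logit = torch.tensor([0.8])
print(sample_binary_concrete(logit, temperature=0.3))
```

PyTorch also ships this sampler as torch.distributions.RelaxedBernoulli, which implements the same reparameterized draw.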