Negative Log Likelihood Ratio Loss for Deep Neural Network Classification

Zhu, Donglai, Yao, Hengshuai, Jiang, Bei, Yu, Peng

arXiv.org Machine Learning 

Deep neural networks (DNNs) have achieved remarkable success in classification tasks such as image classification [1]. The network output can mimic the posterior probabilities of the target classes for an input observation when the nonlinear activation function in the output layer is a softmax function [2]. The learning objective is to minimize the difference between the predicted distribution and the true data-generating distribution. In information theory, the cross entropy between two probability distributions over a common set of events measures the average number of bits needed to identify an event if the coding follows a learned probability distribution rather than the true but unknown distribution [3]. Cross entropy is therefore a reasonable loss function for DNN-based classification. In practice, however, the true data-generating distribution is unknown and is replaced by the empirical distribution over a training set in which each sample is drawn independently and identically distributed (i.i.d.) from the data space [4]. Under the assumption of uniform distributions over the feature and label spaces, minimizing cross entropy is equivalent to maximum likelihood, i.e., the learning problem aims to maximize the likelihood of the correct class for each training sample [2]. Maximum likelihood is a generative training criterion by which the model learns the likelihood of the correct class for the observation. The model makes predictions by using Bayes' rule to calculate the posterior probabilities of the target classes for the observation and then selecting the most likely class.
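
The link between cross entropy and the likelihood of the correct class can be checked numerically for a single sample. The following is a minimal sketch, not the authors' implementation: the logits, the label, and the helper names (softmax, cross_entropy, negative_log_likelihood) are illustrative assumptions, and natural logarithms are used, so the values are in nats rather than bits. With a one-hot empirical label distribution, the cross entropy reduces to the negative log of the probability assigned to the correct class.

```python
import math

def softmax(logits):
    """Map raw output-layer scores to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    """H(p, q) = -sum_k p_k * log(q_k), in nats; terms with p_k = 0 are skipped."""
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)

def negative_log_likelihood(q, label):
    """Negative log of the probability the model assigns to the correct class."""
    return -math.log(q[label])

# Hypothetical three-class example (values chosen only for illustration).
logits = [2.0, 0.5, -1.0]
label = 0

q = softmax(logits)
one_hot = [1.0 if k == label else 0.0 for k in range(len(logits))]

ce = cross_entropy(one_hot, q)
nll = negative_log_likelihood(q, label)

# Prediction: pick the class with the largest predicted probability.
predicted = max(range(len(q)), key=q.__getitem__)

print(ce, nll, predicted)
```

Averaging this per-sample quantity over a training set gives the usual empirical cross-entropy loss, so minimizing it raises the probability assigned to the correct class of each training sample.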
