In this post, I will implement some of the most common loss functions for image segmentation in Keras/TensorFlow. I will only consider the case of two classes (i.e. binary segmentation). Throughout, I will assume that no final sigmoid activation, such as tf.keras.layers.Activation("sigmoid"), is applied to the model output (or that it is applied only during prediction), so the loss functions receive raw logits. Weighted cross entropy (WCE) is a variant of cross entropy (CE) in which all positive examples are weighted by some coefficient. It is used in the case of class imbalance.
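To make the weighting concrete, here is a minimal sketch of weighted cross entropy computed directly from logits in plain Python. The function names and the pos_weight parameter name are my own choices for illustration, and the numerically stable rearrangement is one common way to avoid overflow, not necessarily how any particular library implements it.

```python
import math


def softplus_neg(z):
    """Numerically stable log(1 + exp(-z))."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_cross_entropy(y_true, logits, pos_weight):
    """Mean weighted cross entropy over paired labels (0 or 1) and logits.

    Per element this equals
        -(pos_weight * y * log(p) + (1 - y) * log(1 - p)),  p = sigmoid(z),
    rewritten so that sigmoid(z) is never evaluated explicitly.
    """
    losses = []
    for y, z in zip(y_true, logits):
        # Positive examples are scaled by pos_weight, negatives by 1.
        scale = 1.0 + (pos_weight - 1.0) * y
        losses.append((1.0 - y) * z + scale * softplus_neg(z))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    labels = [1.0, 0.0, 0.0, 0.0]
    logits = [-1.0, -2.0, 0.5, -3.0]
    print(weighted_cross_entropy(labels, logits, pos_weight=1.0))  # plain CE
    print(weighted_cross_entropy(labels, logits, pos_weight=3.0))  # positives count 3x
```

With pos_weight set to 1 this reduces to ordinary binary cross entropy. In TensorFlow itself, tf.nn.weighted_cross_entropy_with_logits(labels, logits, pos_weight) provides a built-in with the same kind of weighting.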
In a neural network, each neuron has its own weights, a bias, and an activation function. The inputs are multiplied by the weights, the bias is added, and the activation function is applied to the result before it is passed to the next layer. Finally, the output layer produces the predicted value (yhat). The prediction generally differs from the actual value (y), and we call this difference the error. So we define a loss/cost function to capture the error and try to minimize it through backpropagation.
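The steps above can be sketched for a single neuron in plain Python. This is an illustrative toy under my own assumptions (a sigmoid activation, a squared-error loss, and one gradient-descent update), not a general backpropagation implementation; all names are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, followed by the activation."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(pre_activation)


def squared_error(y_hat, y):
    """Loss that captures how far the prediction is from the target."""
    return 0.5 * (y_hat - y) ** 2


def gradient_step(weights, bias, inputs, y, lr):
    """One gradient-descent update using the chain rule (backpropagation)."""
    y_hat = forward(weights, bias, inputs)
    # dL/d(pre_activation) = (y_hat - y) * sigmoid'(pre_activation)
    delta = (y_hat - y) * y_hat * (1.0 - y_hat)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


if __name__ == "__main__":
    w, b = [0.5, -0.3], 0.1
    x, y = [1.0, 2.0], 1.0
    print("loss before:", squared_error(forward(w, b, x), y))
    w, b = gradient_step(w, b, x, y, lr=1.0)
    print("loss after: ", squared_error(forward(w, b, x), y))
```

Running it shows the loss shrinking after the update, which is the whole point of propagating the error back to the weights.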
When introducing machine learning, practically oriented textbooks and online courses focus on two major loss functions, the squared error for regression tasks and cross entropy for classification tasks, usually with no justification for why these two are important. Before we dive into why we might be interested in these loss functions, let's ensure that we're on the same page and quickly recall how they are defined. To explain why these two losses achieve what we want, we first need to agree on what exactly it is that we want to achieve. Let's consider a running regression example. In this case we're trying to estimate the value of a variable, which for instance could be the number of active Twitter users worldwide in a given quarter. We assume here that there is a true answer, meaning that there is a distribution which accurately models the number of Twitter users at all times.
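As a quick reminder of the two definitions mentioned above, here is a minimal sketch in plain Python. It assumes the usual textbook forms: the mean of squared differences for regression, and the negative sum of target probability times log predicted probability for classification. The function names are hypothetical.

```python
import math


def mean_squared_error(targets, predictions):
    """Average of (prediction - target)^2 over all examples."""
    return sum((p - t) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def cross_entropy(target_probs, predicted_probs):
    """-sum_k t_k * log(p_k) for one example.

    Terms with t_k == 0 are skipped, since they contribute nothing
    and would otherwise risk evaluating log(0).
    """
    return -sum(
        t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0
    )


if __name__ == "__main__":
    print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(cross_entropy([0.0, 1.0, 0.0], [0.25, 0.5, 0.25]))
```

With a one-hot target, the cross entropy reduces to the negative log of the probability assigned to the correct class.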
Machine learning has attracted interest from various fields as a powerful tool for finding patterns in data. Supported by machine learning technology, computer programs can improve automatically through experience, which has enabled a wide spectrum of applications, from visual and speech recognition and effective web search to the study of human genomics [1, 2]. Classical machine learning techniques have also found many interesting applications in different disciplines of quantum physics [3, 4, 5, 6, 7, 8, 9, 10]. With the advancement of quantum information science and technology, there is both theoretical and practical interest in understanding quantum systems, building quantum devices, developing quantum algorithms, and ultimately, taking advantage of quantum supremacy [11, 12].