In this post we will be looking into the paper "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization" (AdaIN) by Huang et al. We are looking at this paper because it had some key advantages over the other state-of-the-art methods at the time of release. Most important of all, this method, once trained, can be used to transfer style between any arbitrary content-style image pair, even ones not seen during training, whereas the optimisation-based method proposed by Gatys et al. has to run a slow iterative optimisation for every new content-style pair. The AdaIN method is also flexible: it allows for control over the strength of the transferred style in the stylised image, and it supports extensions such as style interpolation and spatial controls.
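At its core, the AdaIN layer simply re-normalises the content feature maps so that their channel-wise mean and standard deviation match those of the style feature maps. Below is a minimal PyTorch-style sketch of that operation; the random tensors stand in for VGG encoder features, and the `alpha` blend at the end is one way the style strength can be controlled at the feature level.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y).
    Statistics are computed per sample and per channel over the spatial
    dimensions of feature maps shaped (N, C, H, W)."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

# Random stand-ins for encoder outputs of a content and a style image
content = torch.randn(1, 512, 32, 32)
style = torch.randn(1, 512, 32, 32)
stylised = adain(content, style)

# Style strength: alpha = 0 keeps the content features, alpha = 1 is full AdaIN
alpha = 0.7
blended = alpha * stylised + (1 - alpha) * content
```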
Mukhoti, Jishnu, Dokania, Puneet K., Torr, Philip H. S., Gal, Yarin
We study batch normalisation in the context of variational inference methods in Bayesian neural networks, such as mean-field or MC Dropout. We show that batch normalisation does not affect the optimum of the evidence lower bound (ELBO). Furthermore, we study the Monte Carlo Batch Normalisation (MCBN) algorithm, proposed as an approximate inference technique parallel to MC Dropout, and show that for larger batch sizes, MCBN fails to capture epistemic uncertainty. Finally, we provide insights into what is required to fix this failure, namely having to view the mini-batch size as a variational parameter in MCBN. We comment on the asymptotics of the ELBO with respect to this variational parameter, showing that as the dataset size increases towards infinity, the batch size must increase towards infinity as well for MCBN to be a valid approximate inference technique.
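For intuition, here is a rough PyTorch sketch of how MCBN-style predictions are typically produced: BatchNorm is kept in training mode at test time, and each forward pass pairs the test inputs with a freshly sampled training mini-batch, so the stochastic batch statistics induce a predictive distribution. The `model`, the `train_loader`, and the concatenation shortcut are illustrative assumptions, not the exact procedure analysed above.

```python
import torch

def mcbn_predict(model, x_test, train_loader, n_samples=50):
    """MCBN-style prediction (sketch): BatchNorm stays in training mode, and
    every forward pass uses a different training mini-batch to randomise the
    batch statistics; the spread across passes is the epistemic uncertainty."""
    model.train()  # BN layers use the (stochastic) statistics of the current batch
    preds = []
    loader_iter = iter(train_loader)
    with torch.no_grad():
        for _ in range(n_samples):
            try:
                x_train, _ = next(loader_iter)
            except StopIteration:
                loader_iter = iter(train_loader)
                x_train, _ = next(loader_iter)
            # Simplification: the test inputs also contribute to the batch statistics here
            batch = torch.cat([x_test, x_train], dim=0)
            preds.append(model(batch)[: x_test.shape[0]])
    preds = torch.stack(preds)                   # (n_samples, N_test, ...)
    return preds.mean(dim=0), preds.var(dim=0)   # predictive mean and variance
```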
The machine learning community has witnessed a surge in releases of frameworks, libraries and software. Tech pioneers such as Google, Amazon and Microsoft have publicly emphasised their intentions behind open-sourcing their technology. However, there has been a growing trend of these tech giants claiming ownership of their innovations. According to a National Bureau of Economic Research study, there were 145 US patent filings that mentioned machine learning in 2010, compared to 594 in 2016. Google, in particular, filed 99 patents related to machine learning and neural networks in 2016 alone.
The Transformer architecture [1], introduced by Vaswani et al., is based on the attention mechanism and overcomes the challenges posed by recurrence. Continuing from the last blog, 'Let's pay some Attention!', let's do a quick recap of the attention mechanism we covered there. We have some key-value pairs and a query. We compare the query to each key, and the key with the highest similarity score is assigned the highest weight.
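In the Transformer, this comparison takes the form of scaled dot-product attention: the similarity scores between the query and every key are passed through a softmax to get the weights, which are then used to average the values. A minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    """Compare the query against every key, turn the similarity scores into
    weights with a softmax, and return the weighted sum of the values."""
    d_k = query.shape[-1]
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5  # similarity of query to each key
    weights = F.softmax(scores, dim=-1)                  # highest score -> highest weight
    return weights @ value, weights

# One query attending over four key-value pairs of dimension 8
q = torch.randn(1, 1, 8)
k = torch.randn(1, 4, 8)
v = torch.randn(1, 4, 8)
output, attn_weights = scaled_dot_product_attention(q, k, v)
```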
SGD, in its base form, is not designed for batches; it updates the weights using one sample at a time. Batch gradient descent is essentially stochastic gradient descent applied to batches, with the right kind of weighting and normalisation of the gradient. In most DL frameworks both behaviours live under the same name (SGD): the gradient is computed on the loss averaged over whatever batch size you declare, so a batch size of one gives you the stochastic version and larger batches give you the batch version.
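A toy PyTorch sketch makes the point: the optimizer called SGD is the same object in every regime, and only the declared batch size, together with the mean over the batch in the loss, decides whether each update is stochastic, mini-batch, or full-batch. The linear model and data below are made up purely for illustration.

```python
import torch

# Toy linear model y = w * x with a squared-error loss
w = torch.tensor(0.0, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([2.0, 4.0, 6.0, 8.0])

opt = torch.optim.SGD([w], lr=0.01)

# The same optimizer covers every regime; only the batch size changes:
#   batch_size = 1 -> classic stochastic gradient descent
#   batch_size = 4 -> full-batch gradient descent on this toy dataset
batch_size = 2
for start in range(0, len(x), batch_size):
    xb, yb = x[start:start + batch_size], y[start:start + batch_size]
    loss = ((w * xb - yb) ** 2).mean()  # mean over the batch normalises the gradient
    opt.zero_grad()
    loss.backward()
    opt.step()
```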