The materials around us, the processes and events we witness, and the datasets we collect are all observations; behind them lie generative mechanisms in nature that we cannot observe directly but that produce these observations. Today's topic is deep latent factor models: statistical models that try to explain high-dimensional, complex data with low-dimensional (and hopefully interpretable) factors. In deep latent factor models, we assume there is a complex function that takes a few underlying factors as input and generates the complex outputs we observe. One may have various motivations for using latent factor models; interpretability is clearly one of them.
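
To make the generative assumption concrete, here is a minimal sketch in Python with NumPy. Everything in it is an illustrative assumption rather than part of any specific model: the dimensions, the standard-normal factors, and the randomly initialized two-layer network standing in for the "complex function" are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 latent factors generate 50 observed dimensions.
n_samples, n_factors, n_hidden, n_observed = 500, 2, 32, 50

# Latent factors: a few independent standard-normal variables per sample.
z = rng.standard_normal((n_samples, n_factors))

# Stand-in for the unknown "complex function": a random two-layer network.
W1 = rng.standard_normal((n_factors, n_hidden))
W2 = rng.standard_normal((n_hidden, n_observed)) / np.sqrt(n_hidden)


def generate(factors):
    """Map low-dimensional factors to high-dimensional outputs."""
    return np.tanh(factors @ W1) @ W2


# Observations: the function output plus a little Gaussian noise.
x = generate(z) + 0.1 * rng.standard_normal((n_samples, n_observed))
```

In practice only `x` would be available; fitting a latent factor model means recovering both the function and plausible values of `z` from it.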

Kim, Jangho, Park, Seonguk, Kwak, Nojun

Many researchers have sought ways of model compression to reduce the size of a deep neural network (DNN) with minimal performance degradation in order to use DNNs in embedded systems. Among the model compression methods, knowledge transfer trains a student network with the help of a stronger teacher network. In this paper, we propose a novel knowledge transfer method which uses convolutional operations to paraphrase the teacher's knowledge and to translate it for the student. This is done by two convolutional modules, called a paraphraser and a translator. The paraphraser is trained in an unsupervised manner to extract the teacher factors, which are defined as paraphrased information of the teacher network.

Maziarka, Łukasz, Nowak, Aleksandra, Wołczyk, Maciej, Bedychaj, Andrzej

One of the main arguments behind studying disentangled representations is the assumption that they can be easily reused in different tasks. At the same time, finding a joint, adaptable representation of data is one of the key challenges in the multi-task learning setting. In this paper, we take a closer look at the relationship between disentanglement and multi-task learning based on hard parameter sharing. We perform a thorough empirical study of the representations obtained by neural networks trained on automatically generated supervised tasks. Using a set of standard metrics, we show that disentanglement appears naturally during the process of multi-task neural network training.

Wang, Yuyang, Smola, Alex, Maddix, Danielle C., Gasthaus, Jan, Foster, Dean, Januschowski, Tim

Producing probabilistic forecasts for large collections of similar and/or dependent time series is a practically relevant and challenging task. Classical time series models fail to capture complex patterns in the data, and multivariate techniques struggle to scale to large problem sizes. However, their reliance on strong structural assumptions makes them data-efficient and allows them to provide uncertainty estimates. The converse is true for models based on deep neural networks, which can learn complex patterns and dependencies given enough data. In this paper, we propose a hybrid model that incorporates the benefits of both approaches. Our new method is data-driven and scalable via a latent, global, deep component. It also handles uncertainty through a local classical model. We provide both theoretical and empirical evidence for the soundness of our approach through a necessary and sufficient decomposition of exchangeable time series into a global and a local part. Our experiments demonstrate the advantages of our model in terms of data efficiency, accuracy, and computational complexity.

As discussed in a previous post on the principal component method of factor analysis, the $\hat{\Psi}$ term in the estimated covariance matrix $S$, $S \approx \hat{\Lambda}\hat{\Lambda}' + \hat{\Psi}$, was excluded and we proceeded directly to factoring $S$ and $R$. The principal factor method of factor analysis (also called the principal axis method) instead finds an initial estimate of $\hat{\Psi}$ and factors $S - \hat{\Psi}$, or $R - \hat{\Psi}$ for the correlation matrix. The principal factor method therefore begins with the eigenvalues and eigenvectors of $S - \hat{\Psi}$ or $R - \hat{\Psi}$. In the case of $S - \hat{\Psi}$, the initial communality estimate is multiplied by the variance of the respective variable. The factor loadings are then calculated from the eigenvalues and eigenvectors of the $R - \hat{\Psi}$ or $S - \hat{\Psi}$ matrix. We will perform factor analysis using the principal factor method on the rootstock data, as done previously with the principal component method, to see if the approaches differ significantly.
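
The steps described above can be sketched in Python with NumPy for the correlation-matrix case. This is a minimal illustration under stated assumptions, not the post's own implementation: the function name `principal_factor` is hypothetical, the initial communalities are assumed to be the squared multiple correlations (a common choice), no iteration is performed, and the small matrix `R_toy` is made-up data, not the rootstock data.

```python
import numpy as np


def principal_factor(R, n_factors):
    """One-pass principal factor loadings from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Assumed initial communalities: squared multiple correlations,
    # 1 - 1/r^{ii}, where r^{ii} is the i-th diagonal element of R^{-1}.
    communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    # R - Psi_hat is R with the communalities placed on the diagonal.
    reduced = R.copy()
    np.fill_diagonal(reduced, communalities)
    # Eigen-decomposition of the symmetric reduced matrix.
    eigvals, eigvecs = np.linalg.eigh(reduced)
    order = np.argsort(eigvals)[::-1][:n_factors]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: eigenvectors scaled by the square roots of the eigenvalues.
    # The reduced matrix can have negative eigenvalues, so clip at zero.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return loadings, communalities


# Made-up 4x4 correlation matrix with two weakly related pairs of variables.
R_toy = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])
loadings, communalities = principal_factor(R_toy, n_factors=2)
print(loadings)
```

For a covariance matrix $S$, the same sketch would apply after multiplying each initial communality by the variance of its variable, as noted above.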