Additionally, we show the generalization performance of our proposed method across different visual domains. Given a problem category (task), a subset for learning can be sampled (via the domain episode module in Figure 4 of the main text). By replacing class with task, a K-shot, N-task reasoning framework can be defined. We thus demonstrate analogical learning within the existing meta-learning framework for fast adaptation from the source domain to the target domain.
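The episodic sampling described above mirrors standard few-shot learning with tasks (domains) taking the role of classes. A minimal sketch of an N-task, K-shot episode sampler, under the assumption that each domain is just a pool of examples (the function and pool names here are illustrative, not from the paper):

```python
import random

def sample_episode(domain_pool, n_task=5, k_shot=1, q_query=4, seed=None):
    """Sample an N-task, K-shot episode: pick N tasks (domains), then for
    each task split K support examples and Q query examples, exactly as a
    class-based few-shot episode would with classes."""
    rng = random.Random(seed)
    tasks = rng.sample(sorted(domain_pool), n_task)
    support, query = {}, {}
    for t in tasks:
        examples = rng.sample(domain_pool[t], k_shot + q_query)
        support[t] = examples[:k_shot]   # used for fast adaptation
        query[t] = examples[k_shot:]     # used to evaluate the adapted model
    return support, query

# Toy pool: 6 visual domains with 10 examples each (hypothetical data).
pool = {f"domain{d}": [f"d{d}_x{i}" for i in range(10)] for d in range(6)}
support, query = sample_episode(pool, n_task=3, k_shot=2, q_query=3, seed=0)
```

An outer meta-learning loop would repeatedly draw such episodes, adapt on the support sets, and update on the query sets.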
Appendix for "Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation"
In our experiments, the fine-grained map, global semantic map, and multi-granularity map are of different sizes (as shown in Figure A) to save GPU memory. Object categories are predicted by the hallucination module. We use an Adam optimizer with a learning rate of 2.5e-4. Specifically, we take the 10% of the area with the highest probability in the 2D distributions P and P̂ (as described in Section 3.3) as the ground-truth and predicted locations. From Table 1, this variant performs worse than our agent.
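Selecting the top-10% highest-probability region of a 2D distribution can be sketched as follows. This is an illustrative reimplementation (the function name and list-of-lists representation are assumptions, not the paper's code); ties at the threshold may select slightly more than 10% of cells:

```python
import math

def top_fraction_mask(prob_map, fraction=0.10):
    """Return a boolean mask marking the `fraction` of cells with the
    highest probability in a 2D map (sort all values, take the k-th
    largest as the threshold, keep cells at or above it)."""
    flat = sorted((p for row in prob_map for p in row), reverse=True)
    k = max(1, math.ceil(fraction * len(flat)))
    thresh = flat[k - 1]
    return [[p >= thresh for p in row] for row in prob_map]

# Toy 10x10 "distribution" with distinct values; the top 10% is 10 cells.
prob = [[(i * 10 + j) / 4950.0 for j in range(10)] for i in range(10)]
mask = top_fraction_mask(prob, fraction=0.10)
```

The same mask, applied to both P and P̂, yields the ground-truth and predicted location regions that are then compared.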
ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks - Supplementary Material - A. Complementary Experiments
However, with deep networks, initialization can have an important effect on the final results. While designing an initialization strategy specifically for compact networks remains an unexplored research direction, our ExpandNets can be initialized in a natural manner. Note that this strategy yields an additional accuracy boost for our approach. The output of the last layer is passed through a fully-connected layer with 64 units, followed by a logit layer with either 10 or 100 units. We used standard stochastic gradient descent (SGD) with a momentum of 0.9 and a learning rate of 0.01, divided by 10 at epochs 50 and 100.
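The training schedule above (base learning rate 0.01, divided by 10 at epochs 50 and 100) is a standard step decay; a minimal sketch of the schedule as a plain function, with names chosen here for illustration:

```python
def step_lr(epoch, base_lr=0.01, milestones=(50, 100), gamma=0.1):
    """Step-decay learning rate: start at base_lr and multiply by gamma
    (i.e. divide by 10) once each milestone epoch has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Epochs 0-49 train at 0.01, 50-99 at 0.001, and 100 onward at 0.0001.
schedule = [step_lr(e) for e in (0, 49, 50, 99, 100)]
```

In a framework such as PyTorch, the same behaviour is what `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 100], gamma=0.1)` provides on top of an SGD optimizer with momentum 0.9.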
Hamiltonian Prior to Disentangle Content and Motion in Image Sequences
The ability to learn to generate artificial image sequences has diverse uses, from animation, key-frame generation, and summarisation to restoration, and has been explored in previous work over many decades (Hogg, 1983; Hurri and Hyvärinen, 2003; Cremers and Yuille, 2003; Storkey and Williams, 2003; Kannan et al., 2005). However, learning to generate arbitrary sequences is not enough; to provide useful value, the user must be able to control aspects of the sequence generation, such as the motion being enacted, or the characteristics of the agent doing an action. To enable this, we must learn to decompose image sequences into content and motion characteristics such that we can apply learnt motions to new objects or vary the types of motions being applied. Deep generative models (DGMs) such as variational autoencoders (VAEs) (Kingma and Welling, 2013) and Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) use neural networks (NNs) to transform samples from a prior distribution over lower-dimensional latent factors into samples from the data distribution itself. Recent developments (Chung et al., 2015; Srivastava et al., 2015; Hsu et al., 2017; Yingzhen and Mandt, 2018) extend VAEs to sequences using Recurrent Neural Networks (RNNs) on the representation of temporal frames.
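The DGM generative step described above (draw latent factors from a prior, then push them through a neural network) can be sketched minimally. The decoder here is a hypothetical fixed affine-plus-tanh map standing in for a trained NN, not any model from the paper:

```python
import math
import random

def sample_dgm(decoder, latent_dim=2, rng=None):
    """Illustrative DGM sampling: draw z from a standard-normal prior over
    low-dimensional latent factors, then map it to data space via a decoder."""
    rng = rng or random.Random()
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return decoder(z)

def toy_decoder(z):
    # Stand-in for a learned NN: one linear layer followed by tanh.
    weights = [[0.5, -0.3], [0.8, 0.1], [-0.2, 0.7]]
    return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in weights]

x = sample_dgm(toy_decoder, latent_dim=2, rng=random.Random(0))
```

A sequence model in this family would additionally split z into content and motion factors, with the motion factors evolving over time (e.g. via an RNN).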