Appendix for "Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation"

Neural Information Processing Systems

In our experiments, the fine-grained map, global semantic map, and multi-granularity map are of different sizes (as shown in Figure A) to save GPU memory. Object categories predicted by hallucination module. We use an Adam optimizer with a learning rate of 2.5e-4. Specifically, we consider the 10% area with the highest probability in 2D distribution P and P̂ (as described in Section 3.3) as ground-truth and predicted locations. From Table 1, this variant performs worse than our agent.
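The "10% area with the highest probability" step can be illustrated with a small sketch. It assumes the 2D distribution is given as a nested list of per-cell probabilities; the function names and the overlap measure (intersection over union) are hypothetical and are not taken from the paper, which does not say here how the two areas are compared.

```python
def top_area_mask(prob, fraction=0.10):
    """Return a 0/1 mask marking the `fraction` of cells with the highest probability.

    `prob` is a 2D grid (list of rows). Illustrative helper, not the paper's code.
    """
    cells = [(p, r, c) for r, row in enumerate(prob) for c, p in enumerate(row)]
    k = max(1, round(fraction * len(cells)))  # number of cells to keep
    # Highest probability first; the sort is stable, so ties keep scan order.
    cells.sort(key=lambda t: -t[0])
    mask = [[0] * len(row) for row in prob]
    for _, r, c in cells[:k]:
        mask[r][c] = 1
    return mask


def overlap_ratio(mask_a, mask_b):
    """Intersection over union of two 0/1 masks (an assumed comparison measure)."""
    pairs = [(a, b) for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb)]
    inter = sum(a & b for a, b in pairs)
    union = sum(a | b for a, b in pairs)
    return inter / union if union else 0.0
```

Applying `top_area_mask` to both P and P̂ gives the ground-truth and predicted areas, which can then be compared cell by cell.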



cf78a15772ec1a6aee9bbee2d2b382c3-Supplemental-Conference.pdf

Neural Information Processing Systems

Our first step is to prove the parameterization (Eq. 3) provides local attention after the Note that the weight and bias terms in the above formulation (Eq. Assume the position-based function at each head is learned to perform 'hard attention' on one of its surrounding positions, i.e., an extreme semi-dynamic attention. To demonstrate this phenomenon, we plot and compare the impacts of Φc and Φp on Φa in the middle and right of Fig. S4 and visualize learned position-based attention Φp of iRPE in Fig. S5. As seen from Tab. S17, there exist noticeable performance gaps between the models (b, f, g, h) (without Φp) and (a, d, e, i) (with Φp). Without adaptive attention (model (c)), Φp imposes stronger locality on every layer.


Unsupervised Shape Matching

Neural Information Processing Systems

Following the unsupervised literature [4, 3, 5], the siamese network Fθ is trained by imposing structural properties on the fmap C such as bijectivity and orthogonality on the shape pairs in the training set.
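As a rough illustration of what such structural penalties can look like, the sketch below uses squared Frobenius-norm forms that are common in the functional-map literature: orthogonality as ||CᵀC − I||² and bijectivity as ||C_xy C_yx − I||². The exact losses used by the authors are not given in this excerpt, so these formulas and all function names are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def sq_dist_to_identity(a):
    """Squared Frobenius norm of (a - I) for a square matrix a."""
    return sum(
        (v - (1.0 if i == j else 0.0)) ** 2
        for i, row in enumerate(a)
        for j, v in enumerate(row)
    )


def orthogonality_penalty(c):
    """Assumed form: ||C^T C - I||_F^2, zero when C is orthogonal."""
    return sq_dist_to_identity(matmul(transpose(c), c))


def bijectivity_penalty(c_xy, c_yx):
    """Assumed form: ||C_xy C_yx - I||_F^2, zero when the two maps are inverses."""
    return sq_dist_to_identity(matmul(c_xy, c_yx))
```

Both penalties vanish for a permutation-like map and grow as C departs from an orthogonal, invertible map, which is why they can serve as a training signal without ground-truth correspondences.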


fea16e782bc1b1240e4b3c797012e289-AuthorFeedback.pdf

Neural Information Processing Systems

Note that (more accurate) OvA methods require O(d) classifiers to be trained (taking many hours). Sampling a group testing matrix that (a) captures the label correlations, (b) has distinctive columns, and (c) satisfies the SAFFRON construction, is non-trivial. Weakness 3 - Experimental study: We first show that NMFGT is better (See Fig 2. & suppl.) We believe that low training times (saving many hours) and fast predictions in return for a limited loss (few points) in accuracy will be critical in many "related search" applications. Indeed, we notice a clear trade-off: as we increase runtimes, accuracy improves.
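The general idea of pooling d labels into fewer tests can be sketched as follows. This is a toy example, not NMFGT or the SAFFRON construction: the test matrix `TESTS` is hand-picked, the decoder is the simple rule "keep a label only if every test containing it is positive", and the assumption that one binary classifier per test would predict the outcomes is ours.

```python
def encode(positives, tests):
    """Pool a set of positive label indices into one bit per test (Boolean OR)."""
    return [int(any(row[j] for j in positives)) for row in tests]


def decode_comp(outcomes, tests):
    """Keep label j only if it appears in at least one test and all its tests are positive."""
    num_labels = len(tests[0])
    decoded = set()
    for j in range(num_labels):
        member_tests = [i for i, row in enumerate(tests) if row[j]]
        if member_tests and all(outcomes[i] for i in member_tests):
            decoded.add(j)
    return decoded


# Hypothetical 4 x 6 matrix: 6 labels covered by 4 tests, each label in 2 tests,
# and every column distinct.
TESTS = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

single = decode_comp(encode({3}, TESTS), TESTS)    # one positive label
double = decode_comp(encode({0, 5}, TESTS), TESTS) # two positive labels
```

With one positive label the decoder recovers it exactly. With two, every test fires and the decoder returns a superset containing false positives, which hints at why choosing a good matrix is described above as non-trivial.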


e43739bba7cdb577e9e3e4e42447f5a5-AuthorFeedback.pdf

Neural Information Processing Systems

R3 brings up an important question: whether BetaE is fully expressive and can model any given query-answer pairs on a KG. This is a challenging problem and our ongoing research concern.


6174c67b136621f3f2e4a6b1d3286f6b-Supplemental-Conference.pdf

Neural Information Processing Systems

We first discuss the broader impact of the proposed DynamicD in Sec. D presents the training dynamics for the further analysis. E also conducts qualitative experiments to verify whether our approach memorizes the real images for extremely limited data. F shows the hyper-parameter analysis. It demonstrates the importance of discriminator in the two-player competition as simply adjusting the capacity could lead to such significant improvements on a variety of settings, making training generative models more accessible to everyone.




Supplementary Material for 3D Concept Grounding on Neural Fields

Neural Information Processing Systems

To enable communication between points at lower layers, we also add pooling and expansion layers between the ResNet-blocks. The encoder is a bidirectional LSTM [1]. The decoder is a similar LSTM that generates a vector from the previous token of the output sequence. In general, the whole training process is split into 3 stages.