Goto

Collaborating Authors

 license


Claim a full MS Office suite license with no subscription before this 30 deal ends

PCWorld

When you purchase through links in our articles, we may earn a small commission. With a sale ending today, May 31 at 11:59pm Pacific, you can own a Windows lifetime license to Microsoft Office for $29.97, with no subscription required. Tired of paying monthly just to access your everyday work tools? Microsoft Office Professional 2021 gives you the core apps -- Word, Excel, PowerPoint, Outlook, and more -- in a one-time purchase for $29.97 through May 31. This version is a solid choice for freelancers, business owners, students, and anyone who wants to create polished documents, presentations, and spreadsheets without juggling subscriptions or relying on cloud-only tools.





MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing Supplementary Material

Neural Information Processing Systems

VLMEvaluation To evaluate two VLMs (Frozen in Time [1] and VideoCLIP [13]), we use a hybrid approach that leverages both prototypical networks [11] and the video-language similarity metrics learned by both models. Below, we show an ablation study where we use only the video prototype networks. We show the performance of using only language similarity in the few-shot case to demonstrate the effects of sample removal, and we also show the effects of our hybrid weighting scheme, where we weight the language embeddings five times more than the video embeddings when constructing the hybrid prototype (as opposed to equal weighting during the regular hybrid approach). Although we perform our ablation study with Frozen-in-Time, and use the same weighting scheme and prototype strategy for VideoCLIP as well. For this study, we show activity and sub-activity classification accuracy in the 5-shot case. We visualize whether a given method uses language, video, or both to create its prototype embeddings.


materials

Neural Information Processing Systems

A.1 Access instructions OpenProteinSet is hosted by the Registry of Open Data on AWS (RODA) and can be accessed at the following link: registry.opendata.aws/openfold/. A.2 Documentation and intended uses We include a datasheet [1] in Section B. Detailed documentation on the precise structure and content of the dataset is provided on the dataset's landing page. A.3 Data format All OpenProteinSet files are in standard plaintext formats (A3M for MSAs, HHSearch format for template hits, and PDB for structure files) that can be read by a wide variety of bioinformatics software. A.5 License OpenProteinSet is made available under the CCBY 4.0 license. A copy of the license is provided with the dataset.


1102a326d5f7c9e04fc3c89d0ede88c9-Supplemental.pdf

Neural Information Processing Systems

This is the distribution over datasets one obtains by first sampling a task t from Pt, and then sampling a dataset S from Pmz|t. Here p(S) corresponds to the marginal distribution over datasets S. Note that the last line above holds because E P f(,S) does not depend on t. Thus, in this section, we present a specialization of the bound for Gaussian distributions. Let P have mean ยต and covariance; thus P = N(ยต,) and analogously P,0 = N(ยต0, 0). We can then apply the analytical form for the KL-divergence between two multivariate Gaussian distributions to the bound presented in Theorem 3. The result is the following bound holding under the same assumptions as Theorem 3: L(P,Pt) 1 l We implement the above bound in code instead of the non-specialized form of the KL divergence to speed up computations and simplify gradient computations. A.3.2 Few-Shot Learning Bound with Validation Data In this section, we will assume that, in addition to the training data S Pmz|t, we have access to validation data Sva Pnz|t at meta-training time. We will show that a meta-learning generalization bound can still be obtained in this case.


Appendix and Training Specification

Neural Information Processing Systems

In all environments, we use a Transformer architecture with four layers and four self-attention heads. The total input vocabulary of the model is V (N + M +2) to account for states, actions, rewards, and rewards-to-go, but the output linear layer produces logits only over a vocabulary of size V; output tokens can be interpreted unambiguously because their offset is uniquely determined by that of the previous input. The dimension of each token embedding is 128. Dropout is applied at the end of each block with probability 0.1. We follow the learning rate scheduling of (Radford et al., 2018), increasing linearly from 0 to 2.5 10 4 over the course of 2000 updates.