question-answer pair
- North America > United States > California (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Asia > Middle East > Jordan (0.04)
- (6 more...)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks > Manufacturer (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.93)
- Africa > Nigeria (0.14)
- North America > Canada (0.14)
- Africa > Kenya (0.14)
- (22 more...)
- Research Report (0.67)
- Workflow (0.46)
- Media > News (1.00)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Health & Medicine > Epidemiology (0.68)
- Africa > Nigeria (0.14)
- Africa > Kenya (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- (38 more...)
- Media > News (1.00)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- (4 more...)
- North America > United States (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- (2 more...)
Checklist 1. For all authors (a)
Do the main claims made in the abstract and introduction accurately reflect the paper's If you ran experiments (e.g. for benchmarks)... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Y es] See A.2 (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type Did you include any new assets either in the supplemental material or as a URL? [Y es] Did you discuss whether and how consent was obtained from people whose data you're If you used crowdsourcing or conducted research with human subjects... (a) For a detailed description and intended uses, please refer to 1. A.2 Dataset Accessibility We plan to host and maintain this dataset on HuggingFace. A.4 Dataset Examples Example question-answer pairs are provided in Tables 9 10 11, . Example Question "What does the symbol mean in Equation 1?" Answer "The symbol in Equation 1 represents "follows this distribution". "Can you provide more information about what is meant by'generative process in "The generative process refers to Eq. (2), which is a conceptual equation representing Question "How does the DeepMoD method differ from what is written in/after Eq 3?" Answer "We add noise only to Question "How to do the adaptive attack based on Eq.(16)? "By Maximizing the loss in Eq (16) using an iterative method such as PGD on the end-to-end model we attempt to maximize the loss to cause misclassification while Question "How does the proposed method handle the imputed reward?" "The proposed method uses the imputed reward in the second part of Equation 1, "Table 2 is used to provide a comparison of the computational complexity of the "Optimal number of clusters affected by the number of classes or similarity between "The authors have addressed this concern by including a new experiment in Table 4 of Question "Can you clarify the values represented in Table 1?" Answer "The values in Table 1 represent the number of evasions, which shows the attack "The experiments in table 1 do not seem to favor the proposed method much; softmax Can the authors explain why this might be the case?" Answer "The proposed method reduces to empirical risk minimization with a proper loss, and However, the authors hope that addressing concerns about the method's theoretical Question "Does the first row of Table 2 correspond to the offline method?"
- Oceania > New Zealand (0.04)
- Oceania > Australia (0.04)
- North America > United States (0.04)
- (3 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (15 more...)
- Europe > Sweden > Skåne County > Malmö (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
Supplementary Material RE
D.3 Open source performance on mini test set . . . . . . . . . . . . . . . . . . . . . A.1 V ersion 2 We have fixed some bugs in the evaluation code, resulting in slight differences compared to the previous release. The issue was that 149 samples were not evaluated in the previous version, and these have now been included in the new update. A.2 V ersion 3 We have clarified certain statements and added experimental results to address the reviewer's questions. B.1 Limitations Despite these advancements, our dataset does exhibit certain limitations, largely stemming from inherited biases from the source datasets: Currently, we only address scenarios where both the question and the answer span a single time duration. Given a question, the annotated time span must be a single, continuous duration, which might be limiting for all scenes. The presence of noisy or inaccurate annotations in the source datasets, including captions and timestamps, poses a challenge. Despite our efforts, some of these errors could not be automatically filtered out. The extent of this issue is detailed in the qualitative visualization conducted by our human reviewers, as presented in supplementary. The average duration of ground truth events in our dataset is relatively long. This characteristic has the unintended consequence of hindering the models' ability to detect and analyze fine-grained actions within shorter video segments. These drawbacks highlight areas for potential improvement and indicate the necessity for ongoing refinement to ensure the creation of more accurate and unbiased video language models. B.2 Social Impact Though we provide an assessment of temporal reasoning and moment localization, the types and scene diversity are still limited. We inherit the video classes from the two source video datasets, which may not be sufficient for a comprehensive assessment of all kinds of temporal reasoning. This limitation could introduce a bias. For both curated data and video data, they do not contain any personally identifiable information. Besides, some of the video samples in the source datasets might be slightly uncomfortable depending on the viewer. For example, some videos discuss tattoos and piercings, and some of them present news about social events including demonstrations or war reports. However, we only release the data of curated question-answer and time span.
- North America > United States (0.04)
- Asia > Taiwan (0.04)
- Law (1.00)
- Information Technology (0.93)
- North America > United States > Massachusetts (0.05)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (6 more...)