


8744cf92c88433f8cb04a02e6db69a0d-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for the detailed and insightful reviews. As the reviewers noted, our work 1) contributes to "a ...". Thank you for the valuable feedback on this section -- we will incorporate it in our next revision. The intuition for the proof of Theorem 3.3 is that the optimization problem is convex over the space of probability measures. By weak regularization, we refer to the fact that λ → 0 for our Theorem 4.1 to hold. The difficulty with ReLU networks is that if the gradient flow pushes neurons towards 0, issues of differentiability arise. One potential approach to circumvent this issue is to argue that, with correct initialization, the iterates never reach 0. This is an interesting direction for future work and we thank the reviewer for this suggestion.
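For readers unfamiliar with the lifting argument, the following is a minimal sketch of the standard measure-space formulation alluded to above; the notation (f_\mu, \phi, the norm regularizer) is ours for illustration and not taken from the paper.

```latex
% Minimal sketch (our notation): lift a two-layer network to a probability
% measure \mu over hidden-unit parameters w, so the network output is linear in \mu:
f_\mu(x) = \int \phi(x; w) \, d\mu(w)
% With a convex loss \ell and a regularizer that is linear in \mu, the lifted
% objective is convex over the space of probability measures, even though the
% finite-width parameterization is non-convex:
L(\mu) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\mu(x_i), y_i\big)
         + \lambda \int \|w\| \, d\mu(w)
% "Weak regularization" then corresponds to taking \lambda \to 0 in this objective.
```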



873be0705c80679f2c71fbf4d872df59-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their constructive comments. We address the main concerns below. In our implementation, it was crucial to use the improvements from Sec. 3.4. We ran the "positive response" version of ApproPO (Algorithm 5) for 2000 outer-loop iterations (i.e., 2000 updates of λ), but needed to make at most 61 calls to the RL oracle. Note that the policy mixture returned by ApproPO is just a weighted combination of the policies from the cache. We will add this discussion to the paper and also update the plots so they are in terms of transitions rather than trajectories.
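As an aside, such a mixture can be executed by sampling one cached policy per episode in proportion to its weight. The sketch below is purely illustrative; the `cache` structure, `Policy` interface, and environment API are our own stand-ins, not ApproPO's actual code.

```python
import random

def act_with_mixture(cache, env, max_steps=1000):
    """Roll out one episode with a mixture of cached policies.

    `cache` is a list of (weight, policy) pairs whose weights sum to 1, and
    each policy maps an observation to an action. This mirrors the idea that
    the mixture returned by ApproPO is a weighted combination of the policies
    accumulated in the cache (illustrative stand-in only).
    """
    weights, policies = zip(*cache)
    # Sample a single cached policy for the whole episode, proportionally to its weight.
    policy = random.choices(policies, weights=weights, k=1)[0]

    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total_reward += reward
        if done:
            break
    return total_reward
```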



86e8f7ab32cfd12577bc2619bc635690-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their valuable comments, and are happy to see feedback such as "concepts presented ...". The reviewers agree that NOX maps are "definitely novel". We answer questions, address factual errors, and present more details to improve our manuscript. First, we address R1's questions and concerns. The differences in chairs between Tables 4 and 5 are due to different experimental settings: "Fixed Multi" models were trained with 2, 3, or 5 views, respectively. R2 questions the claims about feature-level aggregation in Tables 2 and 3. Our main claim is that adding more views ... We will add this detail to the paper.
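For context on what feature-level aggregation across views can look like, here is a minimal, purely illustrative sketch; the choice of permutation-invariant max-pooling and all names are our own simplifications, not necessarily the exact architecture evaluated in Tables 2 and 3.

```python
import numpy as np

def aggregate_view_features(per_view_features):
    """Permutation-invariant, feature-level aggregation over any number of views.

    per_view_features: array of shape (num_views, feature_dim), one encoding
    per input view. Element-wise max-pooling makes the aggregate independent
    of view order and of how many views are provided, which is what allows a
    single model to consume 2, 3, or 5 views (illustrative only).
    """
    per_view_features = np.asarray(per_view_features)
    return per_view_features.max(axis=0)

# Example: the pooled feature has the same dimensionality regardless of view count.
pooled = aggregate_view_features(np.random.randn(3, 128))
assert pooled.shape == (128,)
```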


Multiple Futures Prediction

Neural Information Processing Systems

Temporal prediction is critical for making intelligent and robust decisions in complex dynamic environments. Motion prediction needs to model the inherently uncertain future, which often contains multiple potential outcomes due to multi-agent interactions and the latent goals of others. Towards these goals, we introduce a probabilistic framework that efficiently learns latent variables to jointly model the multi-step future motions of agents in a scene. Our framework is data-driven and learns semantically meaningful latent variables to represent the multimodal future, without requiring explicit labels. Using a dynamic attention-based state encoder, we learn to encode the past as well as the future interactions among agents, efficiently scaling to any number of agents. Finally, our model can be used for planning by computing a conditional probability density over the trajectories of other agents given a hypothetical rollout of the 'self' agent. We demonstrate our algorithm by predicting vehicle trajectories on both simulated and real data, achieving state-of-the-art results on several vehicle trajectory datasets.
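To make the latent-variable idea concrete, below is a minimal illustrative sketch of mode-conditioned multimodal trajectory sampling; the discrete mode count, the simple hand-written dynamics, and all names are our own simplifications and do not reproduce the MFP architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_futures(past_traj, mode_probs, mode_dynamics, horizon=12, num_samples=5):
    """Sample multiple plausible futures for one agent.

    past_traj:     (T, 2) array of past x/y positions.
    mode_probs:    (K,) probabilities over discrete latent modes (in a learned
                   model these would come from an encoder; here they are given).
    mode_dynamics: list of K functions mapping (position, velocity) to the next
                   velocity, one per latent mode.
    Returns a list of (horizon, 2) sampled future trajectories.
    """
    pos = past_traj[-1]
    vel = past_traj[-1] - past_traj[-2]
    futures = []
    for _ in range(num_samples):
        # One latent mode is sampled per rollout, yielding a multimodal mixture.
        k = rng.choice(len(mode_probs), p=mode_probs)
        p, v, traj = pos.copy(), vel.copy(), []
        for _ in range(horizon):
            v = mode_dynamics[k](p, v)
            p = p + v
            traj.append(p.copy())
        futures.append(np.stack(traj))
    return futures

# Toy usage: a "keep straight" mode and a "turn left" mode.
straight = lambda p, v: v
turn_left = lambda p, v: np.array([[0.98, -0.2], [0.2, 0.98]]) @ v
past = np.stack([np.array([t, 0.0]) for t in range(5)])
samples = sample_futures(past, np.array([0.7, 0.3]), [straight, turn_left])
```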


Here we compare MFP directly to PRECOG [A] on their released CARLA data in Tab. 1; MFP significantly outperforms the previous SOTA reported in [A].

Neural Information Processing Systems

We thank the reviewers for valuable feedback and will make the suggested changes. We've included additional experiments to address the ... C provides additional results on non-vehicle classes (i.e. ...). We also quantitatively evaluated hypothetical inference in Tab. 2. We report new results using the minMSD metric. ... B, we created a CARLA-based RL env. We compared it with several SOTA model-free methods, demonstrating faster training and leading to a safer or more robust policy. Reviewer 6: We will release code in the near future and make the suggested clarifications.
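For reference, minMSD is typically computed as the minimum, over K sampled futures, of the mean squared deviation from the ground-truth trajectory. The small implementation below is our own illustration, with assumed array shapes and averaging convention.

```python
import numpy as np

def min_msd(samples, ground_truth):
    """Minimum mean squared deviation over a set of sampled futures.

    samples:      (K, T, 2) array of K sampled future trajectories.
    ground_truth: (T, 2) array holding the observed future trajectory.
    Returns the smallest per-sample MSD, i.e. the error of the best of the
    K samples (shapes and averaging convention are assumptions).
    """
    sq_err = np.sum((samples - ground_truth[None]) ** 2, axis=-1)  # (K, T)
    msd_per_sample = sq_err.mean(axis=-1)                          # (K,)
    return msd_per_sample.min()
```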


Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

Neural Information Processing Systems

We marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning. Our neural-symbolic visual question answering (NS-VQA) system first recovers a structural scene representation from the image and a program trace from the question. It then executes the program on the scene representation to obtain an answer. Incorporating symbolic structure as prior knowledge offers three unique advantages. First, executing programs on a symbolic space is more robust to long program traces; our model can solve complex reasoning tasks better, achieving an accuracy of 99.8% on the CLEVR dataset. Second, the model is more data- and memory-efficient: it performs well after learning from a small amount of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering. Third, symbolic program execution offers full transparency into the reasoning process; we are thus able to interpret and diagnose each execution step.
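To illustrate the kind of symbolic execution described here, the following is a minimal sketch of running a program trace over a structural scene representation; the attribute names, the tiny op set, and the scene format are our own illustrative choices, not the exact NS-VQA executor.

```python
def filter_attr(objects, attr, value):
    """Keep objects whose attribute matches the given value."""
    return [o for o in objects if o.get(attr) == value]

def execute(program, scene):
    """Execute a sequence of (op, argument) steps on a list of object dicts.

    Each step either narrows the current set of objects or produces the final
    answer, mirroring the idea of running a predicted program trace on a
    recovered structural scene representation (illustrative op set only).
    """
    current = scene
    for op, arg in program:
        if op == "filter_color":
            current = filter_attr(current, "color", arg)
        elif op == "filter_shape":
            current = filter_attr(current, "shape", arg)
        elif op == "count":
            return len(current)
        elif op == "exist":
            return len(current) > 0
        else:
            raise ValueError(f"unknown op: {op}")
    return current

# Example: "How many red cubes are there?"
scene = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "red", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]
answer = execute([("filter_color", "red"), ("filter_shape", "cube"), ("count", None)], scene)
assert answer == 1
```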